zjunlp/InstructCell-instruct

Yin Fang committed on Jan 4

Commit a2293d2 · verified · 1 Parent(s): 6985076

Update README.md

Files changed (1)

README.md +58 -1

@@ -1,3 +1,60 @@
 ---
 license: apache-2.0
----

+---
+## 🗞️ Model description
+**InstructCell** is a multi-modal AI copilot that integrates natural language with single-cell RNA sequencing data, enabling researchers to perform tasks like cell type annotation, pseudo-cell generation, and drug sensitivity prediction through intuitive text commands.
+By leveraging a specialized multi-modal architecture and our multi-modal single-cell instruction dataset, InstructCell reduces technical barriers and enhances accessibility for single-cell analysis.
+**Instruct version**: This checkpoint focuses solely on generating concise answers, without extra text.
+### 🚀 How to use
+We provide a simple example for quick reference. This demonstrates a basic **cell type annotation** workflow.
+Make sure to specify the paths for `H5AD_PATH` and `GENE_VOCAB_PATH` appropriately:
+- `H5AD_PATH`: Path to your `.h5ad` single-cell data file (e.g., `H5AD_PATH = "path/to/your/data.h5ad"`).
+- `GENE_VOCAB_PATH`: Path to your gene vocabulary file (e.g., `GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"`).
+```python
+from mmllm.module import InstructCell
+import anndata
+import numpy as np
+from utils import unify_gene_features
+# Load the pre-trained InstructCell model from HuggingFace
+model = InstructCell.from_pretrained("zjunlp/InstructCell-instruct")
+# Load the single-cell data (H5AD format) and gene vocabulary file (numpy format)
+adata = anndata.read_h5ad(H5AD_PATH)
+gene_vocab = np.load(GENE_VOCAB_PATH)
+adata = unify_gene_features(adata, gene_vocab, force_gene_symbol_uppercase=False)
+# Select a random single-cell sample and extract its gene counts and metadata
+k = np.random.randint(0, len(adata))
+gene_counts = adata[k, :].X.toarray()
+sc_metadata = adata[k, :].obs.iloc[0].to_dict()
+# Define the model prompt with placeholders for metadata and gene expression profile
+prompt = (
+    "Can you help me annotate this single cell from a {species}? "
+    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
+    "The gene expression profile is {input}. Thanks!"
+)
+# Use the model to generate predictions
+for key, value in model.predict(
+    prompt,
+    gene_counts=gene_counts,
+    sc_metadata=sc_metadata,
+    do_sample=True,
+    top_p=0.95,
+    top_k=50,
+    max_new_tokens=256,
+).items():
+    # Print each key-value pair
+    print(f"{key}: {value}")
+```
+For more detailed explanations and additional examples, please refer to the Jupyter notebook [demo.ipynb](https://github.com/zjunlp/InstructCell/blob/main/demo.ipynb).
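
The example in the README calls `.toarray()` on the selected row, which only works when `adata.X` is stored as a SciPy sparse matrix, and it relies on `sc_metadata` containing every field named in the prompt. Below is a minimal sketch of two pre-flight checks that could run before `model.predict`. Both helpers (`to_dense_row`, `missing_placeholders`) are hypothetical and not part of the InstructCell API; the sketch also assumes that `{input}` is reserved for the gene expression profile and that the remaining placeholders are filled from `sc_metadata`.

```python
import string

import numpy as np
from scipy import sparse


def to_dense_row(x):
    """Return one cell's counts as a dense NumPy array of shape (1, n_genes).

    Hypothetical helper: accepts either a SciPy sparse row or a dense array,
    so the caller does not need to know how `adata.X` is stored.
    """
    if sparse.issparse(x):
        x = x.toarray()
    return np.asarray(x).reshape(1, -1)


def missing_placeholders(prompt, metadata, reserved=("input",)):
    """List the {placeholders} in `prompt` that `metadata` cannot fill.

    Assumption: names in `reserved` (here `input`) are supplied by the model
    itself rather than by the metadata dictionary.
    """
    fields = {name for _, name, _, _ in string.Formatter().parse(prompt) if name}
    return sorted(fields - set(metadata) - set(reserved))


# Small demonstration with toy data (not real single-cell counts).
dense = to_dense_row(np.array([0, 3, 1]))
from_sparse = to_dense_row(sparse.csr_matrix([[0, 3, 1]]))

prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It is derived from {tissue}. The gene expression profile is {input}."
)
missing = missing_placeholders(prompt, {"species": "human"})
print(dense.shape, from_sparse.shape, missing)
```

In the README workflow, `gene_counts = to_dense_row(adata[k, :].X)` would stand in for the direct `.toarray()` call, and a non-empty result from `missing_placeholders(prompt, sc_metadata)` would flag `.obs` columns that need to be added or renamed before prediction.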