# ESM2 Protein Model

This is the protein component of a jointly trained NT-ESM2 model pair for DNA-protein analysis.

## Model Details

- **Model Type**: ESM2 for protein sequences
- **Training**: Jointly trained with NT DNA model
- **Architecture**: Transformer-based language model for proteins

## Usage

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model_id = "vsubasri/joint-nt-esm2-transcript-coding-protein"
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Embed an example protein sequence
protein_sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
inputs = tokenizer(protein_sequence, return_tensors="pt")
with torch.no_grad():  # no gradients are needed for inference
    outputs = model(**inputs)

# Per-token embeddings, shape (batch, tokens, hidden_size)
embeddings = outputs.last_hidden_state
```
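
Protein sequences copied from FASTA files or spreadsheets often carry whitespace, lowercase letters, or a trailing stop symbol. The sketch below shows one way to normalize such input before passing it to the tokenizer. The helper name `clean_protein_sequence` and the choice to map every non-standard symbol to `X` are illustrative assumptions, not part of this model's API; check `vocab.txt` for the symbols the tokenizer actually accepts.

```python
# Assumption: only the 20 standard amino acids are kept as-is.
# The tokenizer may accept more symbols; verify against vocab.txt.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(raw, unknown="X"):
    """Normalize a raw protein string (hypothetical helper).

    Removes all whitespace, uppercases the sequence, strips trailing
    stop symbols ('*'), and replaces any remaining non-standard
    character with `unknown`.
    """
    seq = "".join(raw.split()).upper().rstrip("*")
    return "".join(res if res in STANDARD_RESIDUES else unknown for res in seq)
```

The cleaned string can then be passed to `tokenizer(...)` in place of the raw input.
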

## Training Details

- Jointly trained with DNA sequences for cross-modal understanding
- Large model variant
- Transcript-specific protein coding sequences
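
To illustrate how a transcript's coding sequence relates to the protein sequence this model takes as input, here is a minimal sketch that translates a coding sequence with the standard genetic code. It is not a description of how the training data was built; the function name `translate_cds` and the handling of ambiguous codons (mapped to `X`) are assumptions made for the example.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds):
    """Translate a coding sequence into a protein string (hypothetical helper).

    Accepts DNA or RNA letters in any case, stops at the first stop codon,
    ignores a trailing partial codon, and maps unrecognized codons to 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        residue = CODON_TABLE.get(cds[start:start + 3], "X")
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate_cds("ATGAAAACCGTTTAA")` yields the first four residues of the example protein in the Usage section.
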

## Files

- `config.json`: Model configuration
- `model.safetensors`: Model weights
- `tokenizer_config.json`: Tokenizer configuration
- `vocab.txt`: Vocabulary file
- `special_tokens_map.json`: Special tokens mapping
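
If you work from a local copy of the repository, a quick check that all of the files listed above are present can save a confusing load error later. The following is a small sketch using only the standard library; the function name `missing_model_files` is a hypothetical helper, and the file names are taken directly from the list above.

```python
from pathlib import Path

# File names as listed in the Files section.
EXPECTED_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
)


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means the directory is complete, and its path can be passed to `from_pretrained` in place of the Hub identifier.
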

## Citation

If you use this model, please cite the original ESM2 paper and the joint training work.