	# ESM2 Protein Model

	This is the protein component of a jointly trained NT-ESM2 model pair for DNA-protein analysis.

	## Model Details

	- Model Type: ESM2 for protein sequences
	- Training: Jointly trained with an NT (Nucleotide Transformer) DNA model
	- Architecture: Transformer-based language model for proteins

	## Usage

	```python
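	import torch
	from transformers import AutoModel, AutoTokenizer

	# Load the model and tokenizer from the Hugging Face Hub
	model_id = "vsubasri/joint-nt-esm2-transcript-coding-protein"
	model = AutoModel.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model.eval()  # disable dropout for inference

	# Example usage: embed a single protein sequence
	protein_sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
	inputs = tokenizer(protein_sequence, return_tensors="pt")
	with torch.no_grad():  # gradients are not needed for inference
	    outputs = model(**inputs)

	# Per-token embeddings with shape (batch, tokens, hidden_size),
	# assuming the checkpoint loads as a standard ESM2 encoder
	embeddings = outputs.last_hidden_state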
	```
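
Sequences copied from FASTA files often contain line breaks, lowercase letters, or a trailing `*` stop marker. The sketch below normalizes a sequence before it is passed to the tokenizer. `clean_protein_sequence` is a hypothetical helper, not part of this repository, and the allowed alphabet is an assumption based on the standard ESM2 vocabulary; check `vocab.txt` if your data uses other symbols.

```python
# Assumption: the standard ESM2 residue alphabet
# (20 amino acids plus the ambiguity/rare codes X, B, U, Z, O).
ESM2_RESIDUES = set("LAGVSERTIDPKQNFYMHWCXBUZO")


def clean_protein_sequence(raw: str) -> str:
    """Normalize a protein sequence string (hypothetical helper)."""
    # Drop whitespace and line breaks, then uppercase.
    seq = "".join(raw.split()).upper()
    # A trailing "*" is a common stop-codon marker in translated sequences.
    seq = seq.rstrip("*")
    if not seq:
        raise ValueError("Empty protein sequence")
    unexpected = sorted(set(seq) - ESM2_RESIDUES)
    if unexpected:
        raise ValueError(f"Unexpected residue symbols: {unexpected}")
    return seq


print(clean_protein_sequence("mktvrqerl\nKSIVRILERS*"))
```

The returned string can be used in place of `protein_sequence` in the usage example above.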

	## Training Details

	- Jointly trained with DNA sequences for cross-modal understanding
	- Large model variant
	- Transcript-specific protein coding sequences
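
This card does not document how DNA and protein inputs were paired during training. As an illustration only, the sketch below derives a protein sequence from a transcript's coding sequence (CDS) using the standard genetic code, which is one plausible way to build a matching protein input for a given transcript. `translate_cds` is a hypothetical helper and is not taken from the training pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over "TCAG".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence into protein, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    residues = []
    for i in range(0, len(cds), 3):
        # Codons with ambiguous bases (e.g. N) map to the unknown residue X.
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


print(translate_cds("ATGAAAACCGTGTAA"))
```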

	## Files

	- `config.json`: Model configuration
	- `model.safetensors`: Model weights
	- `tokenizer_config.json`: Tokenizer configuration
	- `vocab.txt`: Vocabulary file
	- `special_tokens_map.json`: Special tokens mapping
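
If you work from a local copy of the repository, a quick check that all of the listed files are present can catch incomplete downloads before loading. This is a minimal sketch; `missing_model_files` is a hypothetical helper and the directory path is a placeholder.

```python
from pathlib import Path

# File names as listed in this model card.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
]


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Placeholder path: point this at your local download of the repository.
print(missing_model_files("./joint-nt-esm2-transcript-coding-protein"))
```

An empty list means every file named above was found.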

	## Citation

	If you use this model, please cite the original ESM2 paper and the work describing the joint NT-ESM2 training.