TITAN-BBB
The paper is under review.
[Github Repo] | [Dataset on HuggingFace] | [Cite]
Abstract
The blood-brain barrier (BBB) restricts most compounds from entering the brain, making BBB permeability prediction crucial for drug discovery. Experimental assays are costly and limited, motivating computational approaches. While machine learning has shown promise, combining chemical descriptors with deep learning embeddings remains underexplored. Here, we introduce TITAN-BBB, a multi-modal architecture that combines tabular, image, and text-based features via an attention mechanism. To evaluate it, we aggregated multiple literature sources to create the largest BBB permeability dataset to date, enabling robust training for both classification and regression tasks. TITAN-BBB achieves a balanced accuracy of 86.5% on classification and a mean absolute error of 0.436 on regression, outperforming state-of-the-art models on both tasks and demonstrating the benefits of combining deep and domain-specific representations.
Model Details
TITAN-BBB is a multi-modal method designed for molecular property (BBB permeability) prediction. The architecture combines three sources of information: embeddings from a pre-trained language model (ChemBERTa-100M-MLM), image representations extracted from a convolutional neural network (ResNet50), and classical molecular descriptors (RDKit).
TITAN-BBB consists of three stages: multi-modal feature projection, attention-based fusion, and prediction.
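To make the three stages concrete, below is a minimal sketch of how such a projection/attention-fusion/prediction pipeline can be wired up in PyTorch. The layer choices, feature dimensions, and use of standard multi-head attention are illustrative assumptions for this sketch, not the exact published implementation.
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Illustrative three-stage sketch: projection -> attention-based fusion -> prediction.
    Dimensions and layers are placeholder assumptions, not the TITAN-BBB architecture itself."""

    def __init__(self, dim_tab=200, dim_img=2048, dim_txt=768, d_model=256, num_heads=4):
        super().__init__()
        # Stage 1: project each modality into a shared embedding space
        self.proj_tab = nn.Linear(dim_tab, d_model)   # tabular RDKit descriptors
        self.proj_img = nn.Linear(dim_img, d_model)   # CNN (e.g., ResNet50) image features
        self.proj_txt = nn.Linear(dim_txt, d_model)   # language-model (e.g., ChemBERTa) embedding
        # Stage 2: attention over the three modality tokens
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Stage 3: prediction head (one logit for classification or one value for regression)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x_tab, x_img, x_txt):
        # Stack the projected modalities as a length-3 "sequence" per molecule
        tokens = torch.stack(
            [self.proj_tab(x_tab), self.proj_img(x_img), self.proj_txt(x_txt)], dim=1
        )  # (batch, 3, d_model)
        fused, weights = self.attn(tokens, tokens, tokens)  # attention-based fusion
        pooled = fused.mean(dim=1)                          # aggregate the modality tokens
        return self.head(pooled), weights                   # prediction + attention weights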
Model Usage
Note: The model can only be loaded through AutoModelForSequenceClassification.
Note: This model uses a custom architecture (Transformer + CNN + RDKit) defined in the source repository. Therefore, you must set trust_remote_code=True when loading both the model and the tokenizer.
Classification
Use the code below to obtain a score between 0 and 1 indicating whether a molecule can cross the BBB.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the classification model and tokenizer (custom architecture requires trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained('SaeedLab/TITAN-BBB', subfolder='classification', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('SaeedLab/TITAN-BBB', subfolder='classification', trust_remote_code=True)
model.eval()

# SMILES strings of the molecules to score
smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
inputs = tokenizer(smiles, task='classification')

with torch.no_grad():
    outputs = model(**inputs)

# Apply sigmoid to map the raw logits to scores between 0 and 1
print(torch.sigmoid(outputs.logits))
Regression
Use the code below to predict a molecule's blood-brain barrier permeability value.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the regression model and tokenizer (custom architecture requires trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained('SaeedLab/TITAN-BBB', subfolder='regression', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('SaeedLab/TITAN-BBB', subfolder='regression', trust_remote_code=True)
model.eval()

# SMILES strings of the molecules to predict
smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
inputs = tokenizer(smiles, task='regression')

with torch.no_grad():
    outputs = model(**inputs)

# The raw logits are the predicted permeability values
print(outputs.logits)
Model Output
Both the classification and regression models return, for each input:
- logits: the raw output scores. For classification, apply a sigmoid to obtain a score between 0 and 1; for regression, use the value directly as the prediction.
- hidden_states: the attention-based aggregation of tabular, image, and text representations.
- attentions: the attention weights considering tabular, image, and text features for each input.
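Continuing from the classification example above, the sketch below shows one way to inspect these fields. It assumes the output tensors follow the descriptions in this list (e.g., logits of shape (batch, 1) and per-modality attention weights); the exact shapes depend on the custom implementation.
# Convert logits to classification scores and inspect the multi-modal outputs
probs = torch.sigmoid(outputs.logits)        # scores in [0, 1]
fused = outputs.hidden_states                # attention-based aggregation of the three modalities
modality_weights = outputs.attentions        # attention weights over tabular, image, and text features

for smi, p in zip(smiles, probs.squeeze(-1).tolist()):
    print(f"{smi}: P(BBB+) = {p:.3f}")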
Requirements
transformers
huggingface_hub
rdkit
torch
torchvision
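These can typically be installed with pip; the command below assumes the package names match their standard PyPI distributions.
pip install transformers huggingface_hub rdkit torch torchvision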
Citation
The paper is under review. As soon as it is accepted, we will update this section.
License
This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of this model and its derivatives, which include models trained on outputs from the model or datasets created from the model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author.
Contact
For any additional questions or comments, contact Fahad Saeed ([email protected]).
