TITAN-BBB

The paper is under review.

[GitHub Repo] | [Dataset on HuggingFace] | [Cite]

Abstract

The blood-brain barrier (BBB) prevents most compounds from entering the brain, which makes BBB permeability prediction crucial for drug discovery. Experimental assays are costly and limited, motivating computational approaches. While machine learning has shown promise, combining chemical descriptors with deep learning embeddings remains underexplored. Here, we introduce TITAN-BBB, a multi-modal architecture that combines tabular, image, and text-based features via an attention mechanism. For evaluation, we aggregated multiple literature sources to create the largest BBB permeability dataset to date, enabling robust training for both classification and regression tasks. TITAN-BBB achieves a balanced accuracy of 86.5% on classification and a mean absolute error of 0.436 on regression, outperforming state-of-the-art models on both tasks and demonstrating the benefits of combining deep and domain-specific representations.

Model Details

TITAN-BBB is a multi-modal method designed to predict a molecular property: BBB permeability. It combines three sources of information: embeddings from a pre-trained language model (ChemBERTa-100M-MLM), image representations extracted by a convolutional neural network (ResNet50), and classical molecular descriptors (RDKit).
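To make the tabular and image modalities concrete, here is a minimal sketch that turns a SMILES string into an RDKit descriptor vector and a 2D depiction suitable as CNN input. It assumes the full `Descriptors.descList` and a 224x224 image; the helper names `rdkit_features` and `molecule_image` are hypothetical, and the descriptor subset, image size, and preprocessing actually used by TITAN-BBB may differ. The text modality (ChemBERTa embeddings of the SMILES) is handled by the model's own tokenizer and is not shown.

```python
# Sketch: build a tabular descriptor vector and a 2D depiction for one molecule.
# ASSUMPTION: using every RDKit 2D descriptor and a 224x224 image is illustrative;
# TITAN-BBB's actual descriptor subset and image settings may differ.
from rdkit import Chem
from rdkit.Chem import Descriptors, Draw


def rdkit_features(smiles: str) -> list[float]:
    """Hypothetical helper: compute every RDKit 2D descriptor for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    # descList is a list of (descriptor_name, function) pairs
    return [float(fn(mol)) for _, fn in Descriptors.descList]


def molecule_image(smiles: str, size: int = 224):
    """Hypothetical helper: render a 2D depiction as a PIL image (e.g. CNN input)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    return Draw.MolToImage(mol, size=(size, size))


feats = rdkit_features("NCCc1nc(-c2ccccc2)cs1")
img = molecule_image("NCCc1nc(-c2ccccc2)cs1")
print(len(feats), img.size)
```

The number of descriptors depends on the installed RDKit version, so the tabular input size should be treated as version-specific.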

TITAN-BBB consists of three stages: multi-modal feature projection, attention-based fusion, and prediction.
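The three stages can be illustrated with a simple softmax-weighted (additive) attention pooling over the three modality embeddings. This is a hypothetical PyTorch sketch, not the TITAN-BBB implementation: the class name `ModalityAttentionFusion`, the input sizes (210 RDKit descriptors, 2048-dim ResNet50 pooled features, 768-dim ChemBERTa embeddings), the 256-dim shared space, and the single scoring layer are all assumptions. It returns logits, a fused hidden vector, and per-modality weights, mirroring the outputs described under Model Output.

```python
# Hypothetical sketch of projection -> attention-based fusion -> prediction.
# ASSUMPTIONS (not from the TITAN-BBB source): input sizes, d_model=256, tanh
# projections, and one linear scoring layer shared across modalities.
import torch
import torch.nn as nn


class ModalityAttentionFusion(nn.Module):
    def __init__(self, tab_dim=210, img_dim=2048, txt_dim=768, d_model=256, n_out=1):
        super().__init__()
        # Stage 1: project each modality into a shared d_model-dimensional space
        self.proj = nn.ModuleList([
            nn.Linear(tab_dim, d_model),  # tabular (RDKit descriptors)
            nn.Linear(img_dim, d_model),  # image (CNN features)
            nn.Linear(txt_dim, d_model),  # text (language-model embedding)
        ])
        # Stage 2: score each projected modality; softmax turns scores into weights
        self.score = nn.Linear(d_model, 1)
        # Stage 3: prediction head producing one logit per molecule
        self.head = nn.Linear(d_model, n_out)

    def forward(self, tab, img, txt):
        # tokens: (batch, 3, d_model), ordered tabular, image, text
        tokens = torch.stack(
            [torch.tanh(p(x)) for p, x in zip(self.proj, (tab, img, txt))], dim=1
        )
        weights = torch.softmax(self.score(tokens).squeeze(-1), dim=1)  # (batch, 3)
        hidden = (weights.unsqueeze(-1) * tokens).sum(dim=1)  # (batch, d_model)
        logits = self.head(hidden)  # (batch, n_out)
        return logits, hidden, weights


model = ModalityAttentionFusion()
logits, hidden, weights = model(
    torch.randn(2, 210), torch.randn(2, 2048), torch.randn(2, 768)
)
print(logits.shape, hidden.shape, weights.shape)
```

Weighting whole modalities (rather than individual tokens) keeps the attention weights directly interpretable as the contribution of each information source.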


Model Usage

Note: The model can only be loaded with AutoModelForSequenceClassification.

Note: This model uses a custom architecture (Transformer + CNN + RDKit) defined in the source repository. Therefore, you must set trust_remote_code=True when loading both the model and the tokenizer.

Classification

Use the code below to compute a score between 0 and 1 indicating whether a molecule is likely to cross the BBB.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# trust_remote_code=True is required because TITAN-BBB defines a custom architecture
model = AutoModelForSequenceClassification.from_pretrained('SaeedLab/TITAN-BBB', subfolder='classification', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('SaeedLab/TITAN-BBB', subfolder='classification', trust_remote_code=True)

model.eval()  # inference mode: disables dropout, uses stored batch-norm statistics

# Input molecules as SMILES strings
smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
inputs = tokenizer(smiles, task='classification')

with torch.no_grad():
    outputs = model(**inputs)

# The sigmoid maps each logit to a score between 0 and 1
print(torch.sigmoid(outputs.logits))
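If you need hard BBB+/BBB- labels, the scores can be thresholded and evaluated with balanced accuracy, the classification metric reported in the abstract. This plain-Python sketch assumes a 0.5 threshold and label 1 for BBB-permeable molecules; the helper names, scores, and labels are illustrative only, not model outputs.

```python
# Sketch: threshold sigmoid scores into labels and compute balanced accuracy.
# ASSUMPTIONS: threshold 0.5, label 1 = BBB+ (permeable); data below is made up.
def to_labels(scores, threshold=0.5):
    """Map scores in [0, 1] to 1 (BBB+) or 0 (BBB-)."""
    return [1 if s >= threshold else 0 for s in scores]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on BBB+ molecules
    specificity = tn / (tn + fp)  # recall on BBB- molecules
    return (sensitivity + specificity) / 2


# With the model, scores could come from torch.sigmoid(outputs.logits).squeeze(-1).tolist(),
# assuming logits of shape (batch, 1).
scores = [0.91, 0.35, 0.62, 0.08]
y_true = [1, 1, 0, 0]
y_pred = to_labels(scores)
print(y_pred, balanced_accuracy(y_true, y_pred))
```

Balanced accuracy is preferred over plain accuracy when BBB+ and BBB- molecules are unevenly represented, since each class contributes equally.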

Regression

Use the code below to predict a molecule's BBB permeability value.

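import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# trust_remote_code=True is required because TITAN-BBB defines a custom architecture
model = AutoModelForSequenceClassification.from_pretrained('SaeedLab/TITAN-BBB', subfolder='regression', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('SaeedLab/TITAN-BBB', subfolder='regression', trust_remote_code=True)

model.eval()  # inference mode: disables dropout, uses stored batch-norm statistics

# Input molecules as SMILES strings
smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
inputs = tokenizer(smiles, task='regression')

with torch.no_grad():
    outputs = model(**inputs)

# For regression, the logits are the predicted permeability values
print(outputs.logits)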
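To compare regression predictions with measured permeability values, mean absolute error (the regression metric reported in the abstract) averages the absolute differences. In this sketch `mean_absolute_error` is a hypothetical helper and the numbers are invented for illustration.

```python
# Sketch: mean absolute error between measured and predicted permeability values.
# ASSUMPTION: the values below are made up, not TITAN-BBB outputs.
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all molecules."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# With the model, predictions could come from outputs.logits.squeeze(-1).tolist(),
# assuming logits of shape (batch, 1).
y_pred = [0.25, -1.0]
y_true = [0.5, -0.5]
print(mean_absolute_error(y_true, y_pred))
```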

Model Output

For each input, both the classification and regression models return:

  • logits: the raw output scores. For classification, apply a sigmoid to obtain a score between 0 and 1; for regression, use the logits directly as the prediction.
  • hidden_states: the attention-based aggregation of the tabular, image, and text representations.
  • attentions: the attention weights over the tabular, image, and text features for each input.
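To see which modality drives each prediction, the attention weights can be paired with modality names. This sketch assumes, without confirmation from the source, that `outputs.attentions` converts to a `(batch, 3)` nested list via `.tolist()` with columns ordered tabular, image, text; check the model's remote code before relying on that shape or ordering. `summarize_attention` is a hypothetical helper, and the example weights are made up.

```python
# Sketch: label per-molecule attention weights with modality names.
# ASSUMPTIONS: weights form a (batch, 3) nested list (e.g. outputs.attentions.tolist())
# ordered tabular, image, text; verify against the model source.
MODALITIES = ("tabular", "image", "text")


def summarize_attention(weights, names=MODALITIES):
    """Return (modality -> weight dict, dominant modality) for each molecule."""
    summary = []
    for row in weights:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} weights per molecule, got {len(row)}")
        named = dict(zip(names, row))
        summary.append((named, max(named, key=named.get)))
    return summary


example = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]  # made-up weights, not model output
for named, top in summarize_attention(example):
    print(named, "->", top)
```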

Requirements

huggingface_hub
rdkit
torch
torchvision
transformers

Citation

The paper is under review. As soon as it is accepted, we will update this section.

License

This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of this model and its derivatives, which include models trained on outputs from the model or datasets created from the model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author.

Contact

For any additional questions or comments, contact Fahad Saeed ([email protected]).
