SentenceTransformer based on distilbert/distilbert-base-uncased

This is a sentence-transformers model finetuned from distilbert/distilbert-base-uncased on the askubuntu-questions dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 75, 'do_lower_case': False, 'architecture': 'DistilBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
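The Pooling module above mean-pools the token embeddings, using the attention mask to ignore padding positions. A minimal NumPy sketch of that operation (toy numbers, not the actual module internals):

```python
import numpy as np

# Toy token embeddings: 1 sentence, 4 token positions, dimension 3.
# The last position is padding and must not affect the result.
token_embeddings = np.array([[[1.0, 2.0, 3.0],
                              [3.0, 2.0, 1.0],
                              [2.0, 2.0, 2.0],
                              [9.0, 9.0, 9.0]]])  # padding token
attention_mask = np.array([[1, 1, 1, 0]])

mask = attention_mask[..., None].astype(float)   # (1, 4, 1)
summed = (token_embeddings * mask).sum(axis=1)   # sum over real tokens
counts = mask.sum(axis=1)                        # number of real tokens
sentence_embedding = summed / counts
print(sentence_embedding)  # [[2. 2. 2.]]
```

With the real model, the same masking-and-averaging happens over 768-dimensional DistilBERT token embeddings.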

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/distilbert-base-uncased-askubuntu-ct")
# Run inference
sentences = [
    'installing by using wubi on windows vista 64',
    'some problems with the keyboard layout ?',
    'how do i make nautilus windows stick for drag & drop ?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.1882, 0.2154],
#         [0.1882, 1.0000, 0.1747],
#         [0.2154, 0.1747, 1.0000]])
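By default, `model.similarity` computes cosine similarity, which is why the diagonal of the matrix above is exactly 1. A from-scratch NumPy sketch of the same computation (toy 2-dimensional embeddings, not real model outputs):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length; pairwise dot products of
    # unit vectors are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

emb = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
sim = cosine_similarity_matrix(emb)
print(np.round(sim, 4))
# [[1.     0.7071 0.    ]
#  [0.7071 1.     0.7071]
#  [0.     0.7071 1.    ]]
```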

Evaluation

Metrics

Reranking

  • Datasets: askubuntu-dev and askubuntu-test
  • Evaluated with RerankingEvaluator using these parameters:
    {
        "at_k": 10
    }
    
Metric    askubuntu-dev  askubuntu-test
map       0.5172         0.5511
mrr@10    0.6605         0.6833
ndcg@10   0.5574         0.5978
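The reranking metrics above follow their standard definitions: MRR@10 rewards placing the first relevant candidate early, while NDCG@10 discounts relevant candidates logarithmically by rank. A self-contained sketch for a single query with binary relevance (toy ranking, not the evaluator's code):

```python
import math

def mrr_at_k(ranked_relevance, k=10):
    # Reciprocal rank of the first relevant item within the top k.
    for i, rel in enumerate(ranked_relevance[:k]):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

def ndcg_at_k(ranked_relevance, k=10):
    # DCG of the predicted ranking divided by the DCG of the ideal ranking.
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# One query whose candidates were ranked with the relevant items
# at positions 2 and 4 (binary relevance labels).
ranking = [0, 1, 0, 1, 0]
print(mrr_at_k(ranking))             # 0.5
print(round(ndcg_at_k(ranking), 4))  # 0.6509
```

The reported numbers are these per-query values averaged over all dev/test queries.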

Training Details

Training Dataset

askubuntu-questions

  • Dataset: askubuntu-questions at 0c9999e
  • Size: 160,425 training samples
  • Columns: text1 and text2
  • Approximate statistics based on the first 1000 samples:
    • text1: string; min: 5 tokens, mean: 14.43 tokens, max: 39 tokens
    • text2: string; min: 4 tokens, mean: 14.97 tokens, max: 42 tokens
  • Samples (text1 / text2):
    • how to get the `` your battery is broken '' message to go away ? / how to get the `` your battery is broken '' message to go away ?
    • how can i set the software center to install software for non-root users ? / limiting file access for a huge number of users
    • what are some alternatives to upgrading without using the standard upgrade system ? / how can change background of nautilus
  • Loss: ContrastiveTensionLoss
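ContrastiveTensionLoss scores a pair with the dot product of embeddings produced by two independent copies of the model, then applies binary cross-entropy with logits: identical-sentence pairs are positives, random pairs are negatives. A conceptual NumPy sketch of the objective (toy embeddings and a simplified setup; the library implementation also manages the two model copies and the positive-to-negative sampling ratio):

```python
import numpy as np

def bce_with_logits(logit, label):
    # Numerically stable binary cross-entropy on a raw dot-product score.
    return max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))

# Toy embeddings from the two model copies (dimension 4). The first pair
# is the same sentence encoded twice (label 1); the second is a random
# pair of different sentences (label 0).
emb1 = np.array([[0.5, 0.5, 0.5, 0.5],
                 [0.9, 0.1, 0.0, 0.0]])
emb2 = np.array([[0.5, 0.5, 0.5, 0.5],
                 [0.0, 0.0, 0.2, 0.8]])
labels = np.array([1.0, 0.0])

logits = (emb1 * emb2).sum(axis=1)  # dot-product pair scores
loss = np.mean([bce_with_logits(l, y) for l, y in zip(logits, labels)])
print(round(float(loss), 4))  # 0.5032
```

Minimizing this pushes the dot product of identical-sentence pairs up and that of random pairs down, which is what shapes the embedding space during training.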

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • learning_rate: 2e-06
  • num_train_epochs: 1
  • optim: rmsprop
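The non-default values above can be expressed through the trainer's arguments class; a sketch assuming the current SentenceTransformerTrainingArguments API (the output_dir is illustrative, not the path used for this run):

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/distilbert-base-uncased-askubuntu-ct",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    learning_rate=2e-6,
    num_train_epochs=1,
    optim="rmsprop",
)
```

All remaining hyperparameters keep their defaults, as listed below.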

All Hyperparameters

  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: rmsprop
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss askubuntu-dev_ndcg@10 askubuntu-test_ndcg@10
-1 -1 - 0.5147 0.5170
0.0100 100 88.9905 - -
0.0199 200 5.4677 - -
0.0299 300 2.6312 - -
0.0399 400 1.3858 - -
0.0499 500 0.9374 - -
0.0598 600 0.5592 - -
0.0698 700 0.7174 - -
0.0798 800 0.5705 - -
0.0898 900 0.4788 - -
0.0997 1000 0.3194 0.5407 -
0.1097 1100 0.2518 - -
0.1197 1200 0.2657 - -
0.1296 1300 0.2614 - -
0.1396 1400 0.2060 - -
0.1496 1500 0.1802 - -
0.1596 1600 0.2680 - -
0.1695 1700 0.2539 - -
0.1795 1800 0.2850 - -
0.1895 1900 0.2270 - -
0.1995 2000 0.2129 0.5506 -
0.2094 2100 0.1698 - -
0.2194 2200 0.2380 - -
0.2294 2300 0.1907 - -
0.2394 2400 0.3914 - -
0.2493 2500 0.1575 - -
0.2593 2600 0.1907 - -
0.2693 2700 0.1080 - -
0.2792 2800 0.1505 - -
0.2892 2900 0.1195 - -
0.2992 3000 0.0943 0.5573 -
0.3092 3100 0.1538 - -
0.3191 3200 0.1044 - -
0.3291 3300 0.2145 - -
0.3391 3400 0.2781 - -
0.3491 3500 0.1988 - -
0.3590 3600 0.2708 - -
0.3690 3700 0.1731 - -
0.3790 3800 0.2764 - -
0.3889 3900 0.1160 - -
0.3989 4000 0.2061 0.5542 -
0.4089 4100 0.1619 - -
0.4189 4200 0.1711 - -
0.4288 4300 0.1330 - -
0.4388 4400 0.1505 - -
0.4488 4500 0.1210 - -
0.4588 4600 0.1164 - -
0.4687 4700 0.1653 - -
0.4787 4800 0.1489 - -
0.4887 4900 0.0486 - -
0.4987 5000 0.1202 0.5589 -
0.5086 5100 0.1503 - -
0.5186 5200 0.0976 - -
0.5286 5300 0.0675 - -
0.5385 5400 0.0918 - -
0.5485 5500 0.2239 - -
0.5585 5600 0.1034 - -
0.5685 5700 0.1660 - -
0.5784 5800 0.1669 - -
0.5884 5900 0.0716 - -
0.5984 6000 0.3106 0.5616 -
0.6084 6100 0.1240 - -
0.6183 6200 0.1670 - -
0.6283 6300 0.2198 - -
0.6383 6400 0.1169 - -
0.6482 6500 0.1376 - -
0.6582 6600 0.2339 - -
0.6682 6700 0.1729 - -
0.6782 6800 0.0491 - -
0.6881 6900 0.1400 - -
0.6981 7000 0.0688 0.5660 -
0.7081 7100 0.2194 - -
0.7181 7200 0.1351 - -
0.7280 7300 0.0832 - -
0.7380 7400 0.1015 - -
0.7480 7500 0.0390 - -
0.7580 7600 0.2088 - -
0.7679 7700 0.0888 - -
0.7779 7800 0.2217 - -
0.7879 7900 0.1913 - -
0.7978 8000 0.0557 0.5582 -
0.8078 8100 0.0986 - -
0.8178 8200 0.1408 - -
0.8278 8300 0.0744 - -
0.8377 8400 0.1375 - -
0.8477 8500 0.0746 - -
0.8577 8600 0.0734 - -
0.8677 8700 0.0827 - -
0.8776 8800 0.1275 - -
0.8876 8900 0.1072 - -
0.8976 9000 0.1975 0.5577 -
0.9075 9100 0.0408 - -
0.9175 9200 0.0584 - -
0.9275 9300 0.2589 - -
0.9375 9400 0.0503 - -
0.9474 9500 0.1529 - -
0.9574 9600 0.0840 - -
0.9674 9700 0.2059 - -
0.9774 9800 0.0634 - -
0.9873 9900 0.0837 - -
0.9973 10000 0.1010 0.5574 -
-1 -1 - 0.5574 0.5978

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 5.0.1.dev0
  • PyTorch: 2.10.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveTensionLoss

@inproceedings{carlsson2021semantic,
    title={Semantic Re-tuning with Contrastive Tension},
    author={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\"a}{\"a} Hellqvist and Magnus Sahlgren},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=Ov_sMNau-PF}
}

Model size: 66.4M parameters (F32, Safetensors)