SentenceTransformer based on distilbert/distilbert-base-uncased

This is a sentence-transformers model finetuned from distilbert/distilbert-base-uncased on the askubuntu-questions dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 75, 'do_lower_case': False, 'architecture': 'DistilBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
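The Pooling module above mean-pools the token embeddings, using the attention mask to ignore padding positions. A minimal NumPy sketch of that operation (toy numbers, not the actual module internals):

```python
import numpy as np

# Toy token embeddings: 1 sentence, 4 token positions, dimension 3.
# The last position is padding and must not affect the result.
token_embeddings = np.array([[[1.0, 2.0, 3.0],
                              [3.0, 2.0, 1.0],
                              [2.0, 2.0, 2.0],
                              [9.0, 9.0, 9.0]]])  # padding token
attention_mask = np.array([[1, 1, 1, 0]])

mask = attention_mask[..., None].astype(float)   # (1, 4, 1)
summed = (token_embeddings * mask).sum(axis=1)   # sum over real tokens
counts = mask.sum(axis=1)                        # number of real tokens
sentence_embedding = summed / counts
print(sentence_embedding)  # [[2. 2. 2.]]
```

With the real model, the same masking-and-averaging happens over 768-dimensional DistilBERT token embeddings.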

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/distilbert-base-uncased-askubuntu-ct")
# Run inference
sentences = [
    'installing by using wubi on windows vista 64',
    'some problems with the keyboard layout ?',
    'how do i make nautilus windows stick for drag & drop ?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.1882, 0.2154],
#         [0.1882, 1.0000, 0.1747],
#         [0.2154, 0.1747, 1.0000]])
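By default, `model.similarity` computes cosine similarity, which is why the diagonal of the matrix above is exactly 1. A from-scratch NumPy sketch of the same computation (toy 2-dimensional embeddings, not real model outputs):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length; pairwise dot products of
    # unit vectors are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

emb = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
sim = cosine_similarity_matrix(emb)
print(np.round(sim, 4))
# [[1.     0.7071 0.    ]
#  [0.7071 1.     0.7071]
#  [0.     0.7071 1.    ]]
```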

Evaluation

Metrics

Reranking

  • Datasets: askubuntu-dev and askubuntu-test
  • Evaluated with RerankingEvaluator using these parameters:
    {
        "at_k": 10
    }
    
Metric    askubuntu-dev  askubuntu-test
map       0.5172         0.5511
mrr@10    0.6605         0.6833
ndcg@10   0.5574         0.5978
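The reranking metrics above follow their standard definitions: MRR@10 rewards placing the first relevant candidate early, while NDCG@10 discounts relevant candidates logarithmically by rank. A self-contained sketch for a single query with binary relevance (toy ranking, not the evaluator's code):

```python
import math

def mrr_at_k(ranked_relevance, k=10):
    # Reciprocal rank of the first relevant item within the top k.
    for i, rel in enumerate(ranked_relevance[:k]):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

def ndcg_at_k(ranked_relevance, k=10):
    # DCG of the predicted ranking divided by the DCG of the ideal ranking.
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# One query whose candidates were ranked with the relevant items
# at positions 2 and 4 (binary relevance labels).
ranking = [0, 1, 0, 1, 0]
print(mrr_at_k(ranking))             # 0.5
print(round(ndcg_at_k(ranking), 4))  # 0.6509
```

The reported numbers are these per-query values averaged over all dev/test queries.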

Training Details

Training Dataset

askubuntu-questions

  • Dataset: askubuntu-questions at 0c9999e
  • Size: 160,425 training samples
  • Columns: text1 and text2
  • Approximate statistics based on the first 1000 samples:
    • text1: string; min: 5 tokens, mean: 14.43 tokens, max: 39 tokens
    • text2: string; min: 4 tokens, mean: 14.97 tokens, max: 42 tokens
  • Samples (text1 / text2):
    • how to get the `` your battery is broken '' message to go away ? / how to get the `` your battery is broken '' message to go away ?
    • how can i set the software center to install software for non-root users ? / limiting file access for a huge number of users
    • what are some alternatives to upgrading without using the standard upgrade system ? / how can change background of nautilus
  • Loss: ContrastiveTensionLoss
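ContrastiveTensionLoss scores a pair with the dot product of embeddings produced by two independent copies of the model, then applies binary cross-entropy with logits: identical-sentence pairs are positives, random pairs are negatives. A conceptual NumPy sketch of the objective (toy embeddings and a simplified setup; the library implementation also manages the two model copies and the positive-to-negative sampling ratio):

```python
import numpy as np

def bce_with_logits(logit, label):
    # Numerically stable binary cross-entropy on a raw dot-product score.
    return max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))

# Toy embeddings from the two model copies (dimension 4). The first pair
# is the same sentence encoded twice (label 1); the second is a random
# pair of different sentences (label 0).
emb1 = np.array([[0.5, 0.5, 0.5, 0.5],
                 [0.9, 0.1, 0.0, 0.0]])
emb2 = np.array([[0.5, 0.5, 0.5, 0.5],
                 [0.0, 0.0, 0.2, 0.8]])
labels = np.array([1.0, 0.0])

logits = (emb1 * emb2).sum(axis=1)  # dot-product pair scores
loss = np.mean([bce_with_logits(l, y) for l, y in zip(logits, labels)])
print(round(float(loss), 4))  # 0.5032
```

Minimizing this pushes the dot product of identical-sentence pairs up and that of random pairs down, which is what shapes the embedding space during training.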

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • learning_rate: 2e-06
  • num_train_epochs: 1
  • optim: rmsprop
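The non-default values above can be expressed through the trainer's arguments class; a sketch assuming the current SentenceTransformerTrainingArguments API (the output_dir is illustrative, not the path used for this run):

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/distilbert-base-uncased-askubuntu-ct",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    learning_rate=2e-6,
    num_train_epochs=1,
    optim="rmsprop",
)
```

All remaining hyperparameters keep their defaults, as listed below.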

All Hyperparameters

  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: rmsprop
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss askubuntu-dev_ndcg@10 askubuntu-test_ndcg@10
-1 -1 - 0.5147 0.5170
0.0100 100 88.9905 - -
0.0199 200 5.4677 - -
0.0299 300 2.6312 - -
0.0399 400 1.3858 - -
0.0499 500 0.9374 - -
0.0598 600 0.5592 - -
0.0698 700 0.7174 - -
0.0798 800 0.5705 - -
0.0898 900 0.4788 - -
0.0997 1000 0.3194 0.5407 -
0.1097 1100 0.2518 - -
0.1197 1200 0.2657 - -
0.1296 1300 0.2614 - -
0.1396 1400 0.2060 - -
0.1496 1500 0.1802 - -
0.1596 1600 0.2680 - -
0.1695 1700 0.2539 - -
0.1795 1800 0.2850 - -
0.1895 1900 0.2270 - -
0.1995 2000 0.2129 0.5506 -
0.2094 2100 0.1698 - -
0.2194 2200 0.2380 - -
0.2294 2300 0.1907 - -
0.2394 2400 0.3914 - -
0.2493 2500 0.1575 - -
0.2593 2600 0.1907 - -
0.2693 2700 0.1080 - -
0.2792 2800 0.1505 - -
0.2892 2900 0.1195 - -
0.2992 3000 0.0943 0.5573 -
0.3092 3100 0.1538 - -
0.3191 3200 0.1044 - -
0.3291 3300 0.2145 - -
0.3391 3400 0.2781 - -
0.3491 3500 0.1988 - -
0.3590 3600 0.2708 - -
0.3690 3700 0.1731 - -
0.3790 3800 0.2764 - -
0.3889 3900 0.1160 - -
0.3989 4000 0.2061 0.5542 -
0.4089 4100 0.1619 - -
0.4189 4200 0.1711 - -
0.4288 4300 0.1330 - -
0.4388 4400 0.1505 - -
0.4488 4500 0.1210 - -
0.4588 4600 0.1164 - -
0.4687 4700 0.1653 - -
0.4787 4800 0.1489 - -
0.4887 4900 0.0486 - -
0.4987 5000 0.1202 0.5589 -
0.5086 5100 0.1503 - -
0.5186 5200 0.0976 - -
0.5286 5300 0.0675 - -
0.5385 5400 0.0918 - -
0.5485 5500 0.2239 - -
0.5585 5600 0.1034 - -
0.5685 5700 0.1660 - -
0.5784 5800 0.1669 - -
0.5884 5900 0.0716 - -
0.5984 6000 0.3106 0.5616 -
0.6084 6100 0.1240 - -
0.6183 6200 0.1670 - -
0.6283 6300 0.2198 - -
0.6383 6400 0.1169 - -
0.6482 6500 0.1376 - -
0.6582 6600 0.2339 - -
0.6682 6700 0.1729 - -
0.6782 6800 0.0491 - -
0.6881 6900 0.1400 - -
0.6981 7000 0.0688 0.5660 -
0.7081 7100 0.2194 - -
0.7181 7200 0.1351 - -
0.7280 7300 0.0832 - -
0.7380 7400 0.1015 - -
0.7480 7500 0.0390 - -
0.7580 7600 0.2088 - -
0.7679 7700 0.0888 - -
0.7779 7800 0.2217 - -
0.7879 7900 0.1913 - -
0.7978 8000 0.0557 0.5582 -
0.8078 8100 0.0986 - -
0.8178 8200 0.1408 - -
0.8278 8300 0.0744 - -
0.8377 8400 0.1375 - -
0.8477 8500 0.0746 - -
0.8577 8600 0.0734 - -
0.8677 8700 0.0827 - -
0.8776 8800 0.1275 - -
0.8876 8900 0.1072 - -
0.8976 9000 0.1975 0.5577 -
0.9075 9100 0.0408 - -
0.9175 9200 0.0584 - -
0.9275 9300 0.2589 - -
0.9375 9400 0.0503 - -
0.9474 9500 0.1529 - -
0.9574 9600 0.0840 - -
0.9674 9700 0.2059 - -
0.9774 9800 0.0634 - -
0.9873 9900 0.0837 - -
0.9973 10000 0.1010 0.5574 -
-1 -1 - 0.5574 0.5978

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 5.0.1.dev0
  • PyTorch: 2.10.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveTensionLoss

@inproceedings{carlsson2021semantic,
    title={Semantic Re-tuning with Contrastive Tension},
    author={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\"a}{\"a} Hellqvist and Magnus Sahlgren},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=Ov_sMNau-PF}
}

Model size: 66.4M parameters (F32, Safetensors)