---
language: []
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:10000
- loss:SoftmaxLoss
base_model: google-bert/bert-base-uncased
datasets: []
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: >-
A man selling donuts to a customer during a world exhibition event held in
the city of Angeles
sentences:
- The man is doing tricks.
- A woman drinks her coffee in a small cafe.
- The building is made of logs.
- source_sentence: A group of people prepare hot air balloons for takeoff.
sentences:
- There are hot air balloons on the ground and air.
- A man is in an art museum.
- People watch another person do a trick.
- source_sentence: Three workers are trimming down trees.
sentences:
- The goalie is sleeping at home.
- There are three workers
- The girl has brown hair.
- source_sentence: >-
Two brown-haired men wearing short-sleeved shirts and shorts are climbing
stairs.
sentences:
- The men have blonde hair.
- A bicyclist passes an esthetically beautiful building on a sunny day
- Two men are dancing.
- source_sentence: A man is sitting in on the side of the street with brass pots.
sentences:
- a younger boy looks at his father
- Children are at the beach.
- a man does not have brass pots
pipeline_tag: sentence-similarity
co2_eq_emissions:
emissions: 147.28843774992524
energy_consumed: 0.2758298255748315
source: codecarbon
training_type: fine-tuning
on_cloud: false
cpu_model: AMD EPYC 7H12 64-Core Processor
ram_total_size: 229.14864349365234
hours_used: 0.351
hardware_used: 8 x NVIDIA GeForce RTX 3090
model-index:
- name: SentenceTransformer based on google-bert/bert-base-uncased
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts dev
type: sts-dev
metrics:
- type: pearson_cosine
value: 0.47725003430658275
name: Pearson Cosine
- type: spearman_cosine
value: 0.5475746919034576
name: Spearman Cosine
- type: pearson_manhattan
value: 0.5043805022296893
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.5420702830995872
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.5083739540394052
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.544209699690841
name: Spearman Euclidean
- type: pearson_dot
value: 0.4458579859528435
name: Pearson Dot
- type: spearman_dot
value: 0.4698642508787034
name: Spearman Dot
- type: pearson_max
value: 0.5083739540394052
name: Pearson Max
- type: spearman_max
value: 0.5475746919034576
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test
type: sts-test
metrics:
- type: pearson_cosine
value: 0.5320947494943107
name: Pearson Cosine
- type: spearman_cosine
value: 0.5317279446221387
name: Spearman Cosine
- type: pearson_manhattan
value: 0.5575308236485216
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.5554390408837996
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.55587770863865
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.5535804159700501
name: Spearman Euclidean
- type: pearson_dot
value: 0.2787697886285483
name: Pearson Dot
- type: spearman_dot
value: 0.2710358104528421
name: Spearman Dot
- type: pearson_max
value: 0.5575308236485216
name: Pearson Max
- type: spearman_max
value: 0.5554390408837996
name: Spearman Max
- type: pearson_cosine
value: 0.4493844540252116
name: Pearson Cosine
- type: spearman_cosine
value: 0.4694611677633312
name: Spearman Cosine
- type: pearson_manhattan
value: 0.4773641092320219
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.4763054309792941
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.4796801942910325
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.47774521406648734
name: Spearman Euclidean
- type: pearson_dot
value: 0.4081600817978359
name: Pearson Dot
- type: spearman_dot
value: 0.3898881150281674
name: Spearman Dot
- type: pearson_max
value: 0.4796801942910325
name: Pearson Max
- type: spearman_max
value: 0.47774521406648734
name: Spearman Max
---

# SentenceTransformer based on google-bert/bert-base-uncased

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
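For illustration, here is a rough sketch of what the mean-pooling module computes, written against the plain `transformers` API. Loading the transformer weights directly from this repository with `AutoModel` is an assumption of the sketch; the masked-average formulation itself is the standard one for `pooling_mode_mean_tokens`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the fine-tuned BertModel and tokenizer load directly from the repo root
tokenizer = AutoTokenizer.from_pretrained("jilangdi/bert-base-uncased-nli-v1")
bert = AutoModel.from_pretrained("jilangdi/bert-base-uncased-nli-v1")

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average over the real tokens only
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

batch = tokenizer(
    ["Three workers are trimming down trees."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    output = bert(**batch)
embedding = mean_pool(output.last_hidden_state, batch["attention_mask"])
print(embedding.shape)  # torch.Size([1, 768])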
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jilangdi/bert-base-uncased-nli-v1")
# Run inference
sentences = [
    'A man is sitting in on the side of the street with brass pots.',
    'a man does not have brass pots',
    'Children are at the beach.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
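Since this model's similarity function is cosine similarity, `model.similarity` can also rank fresh candidate sentences against a single query. A small sketch, with sentences invented for illustration:

```python
# Score one query against new candidates (example sentences are made up)
query_embedding = model.encode("A man is selling brass pots on the street.")
candidate_embeddings = model.encode([
    "A street vendor sits beside his pots.",
    "Children are at the beach.",
])
# model.similarity defaults to cosine similarity for this model
scores = model.similarity(query_embedding, candidate_embeddings)
print(scores.shape)
# [1, 2]
```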
## Evaluation

### Metrics

#### Semantic Similarity

- Dataset: `sts-dev`
- Evaluated with `EmbeddingSimilarityEvaluator`
| Metric | Value |
|---|---|
| pearson_cosine | 0.4773 |
| spearman_cosine | 0.5476 |
| pearson_manhattan | 0.5044 |
| spearman_manhattan | 0.5421 |
| pearson_euclidean | 0.5084 |
| spearman_euclidean | 0.5442 |
| pearson_dot | 0.4459 |
| spearman_dot | 0.4699 |
| pearson_max | 0.5084 |
| spearman_max | 0.5476 |
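These scores come from `EmbeddingSimilarityEvaluator`, which correlates embedding similarities with gold similarity scores. A sketch of how a comparable STS evaluation can be run; loading the pairs from the `sentence-transformers/stsb` dataset and its `sentence1`/`sentence2`/`score` columns is this example's assumption, not something stated by the card:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("jilangdi/bert-base-uncased-nli-v1")

# Assumed data source: STS Benchmark validation pairs with 0-1 gold scores
stsb = load_dataset("sentence-transformers/stsb", split="validation")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    main_similarity=SimilarityFunction.COSINE,
    name="sts-dev",
)
results = evaluator(model)  # dict of pearson/spearman values like the table above
print(results)
```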
#### Semantic Similarity

- Dataset: `sts-test`
- Evaluated with `EmbeddingSimilarityEvaluator`
| Metric | Value |
|---|---|
| pearson_cosine | 0.5321 |
| spearman_cosine | 0.5317 |
| pearson_manhattan | 0.5575 |
| spearman_manhattan | 0.5554 |
| pearson_euclidean | 0.5559 |
| spearman_euclidean | 0.5536 |
| pearson_dot | 0.2788 |
| spearman_dot | 0.2710 |
| pearson_max | 0.5575 |
| spearman_max | 0.5554 |
#### Semantic Similarity

- Dataset: `sts-test`
- Evaluated with `EmbeddingSimilarityEvaluator`
| Metric | Value |
|---|---|
| pearson_cosine | 0.4494 |
| spearman_cosine | 0.4695 |
| pearson_manhattan | 0.4774 |
| spearman_manhattan | 0.4763 |
| pearson_euclidean | 0.4797 |
| spearman_euclidean | 0.4777 |
| pearson_dot | 0.4082 |
| spearman_dot | 0.3899 |
| pearson_max | 0.4797 |
| spearman_max | 0.4777 |
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 10,000 training samples
- Columns: `premise`, `hypothesis`, and `label`
- Approximate statistics based on the first 1000 samples:

  |  | premise | hypothesis | label |
  |:--|:--|:--|:--|
  | type | string | string | int |
  | details | min: 6 tokens, mean: 17.38 tokens, max: 52 tokens | min: 4 tokens, mean: 10.7 tokens, max: 31 tokens | 0: ~33.40%, 1: ~33.30%, 2: ~33.30% |

- Samples:

  | premise | hypothesis | label |
  |:--|:--|:--|
  | A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 1 |
  | A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 2 |
  | A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 0 |

- Loss: `SoftmaxLoss` (see the sketch below)
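`SoftmaxLoss` trains a small classifier over pairs of sentence embeddings, which is the classic NLI recipe from Sentence-BERT. A minimal construction sketch using the standard `sentence_transformers.losses` API; the dataset wiring and trainer are omitted here:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("google-bert/bert-base-uncased")

# SoftmaxLoss classifies the concatenation (u, v, |u - v|) of the premise and
# hypothesis embeddings; num_labels=3 matches the 0/1/2 label column above
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)
```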
### Evaluation Dataset

#### Unnamed Dataset

- Size: 1,000 evaluation samples
- Columns: `premise`, `hypothesis`, and `label`
- Approximate statistics based on the first 1000 samples:

  |  | premise | hypothesis | label |
  |:--|:--|:--|:--|
  | type | string | string | int |
  | details | min: 6 tokens, mean: 18.44 tokens, max: 57 tokens | min: 5 tokens, mean: 10.57 tokens, max: 25 tokens | 0: ~33.10%, 1: ~33.30%, 2: ~33.60% |

- Samples:

  | premise | hypothesis | label |
  |:--|:--|:--|
  | Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1 |
  | Two women are embracing while holding to go packages. | Two woman are holding packages. | 0 |
  | Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 2 |

- Loss: `SoftmaxLoss`
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
- `fp16`: True
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
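For reference, a sketch of how the non-default values above translate into `SentenceTransformerTrainingArguments` from Sentence Transformers 3.x; the `output_dir` is a hypothetical placeholder, not a path from this repository:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Only the non-default hyperparameters listed above are set explicitly
args = SentenceTransformerTrainingArguments(
    output_dir="models/bert-base-uncased-nli-v1",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,
)
```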
### Training Logs
| Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|---|---|---|---|---|---|
| 0 | 0 | - | - | 0.5931 | - |
| 1.0 | 79 | - | - | - | 0.5317 |
| 1.2658 | 100 | 0.545 | 0.9351 | 0.5973 | - |
| 2.5316 | 200 | 0.5286 | 0.9535 | 0.5660 | - |
| 3.7975 | 300 | 0.3553 | 1.0364 | 0.5476 | - |
| 5.0 | 395 | - | - | - | 0.4695 |
## Environmental Impact

Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).

- **Energy Consumed**: 0.276 kWh
- **Carbon Emitted**: 0.147 kg of CO2
- **Hours Used**: 0.351 hours
### Training Hardware

- **On Cloud**: No
- **GPU Model**: 8 x NVIDIA GeForce RTX 3090
- **CPU Model**: AMD EPYC 7H12 64-Core Processor
- **RAM Size**: 229.15 GB
### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.3.1+cu121
- Accelerate: 0.31.0
- Datasets: 2.19.2
- Tokenizers: 0.19.1
## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```