ModernBERT-base trained on GooAQ
This is a Cross Encoder model finetuned from answerdotai/ModernBERT-base using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: answerdotai/ModernBERT-base
- Maximum Sequence Length: 8192 tokens
- Number of Output Labels: 1 label
- Language: en
- License: apache-2.0
Model Sources
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import CrossEncoder

# Load the fine-tuned reranker from the Hugging Face Hub
model = CrossEncoder("akr2002/reranker-ModernBERT-base-gooaq-bce")

# Score (query, document) pairs
pairs = [
    ['how do you find mass?', "Divide the object's weight by the acceleration of gravity to find the mass. You'll need to convert the weight units to Newtons. For example, 1 kg = 9.807 N. If you're measuring the mass of an object on Earth, divide the weight in Newtons by the acceleration of gravity on Earth (9.8 meters/second2) to get mass."],
    ['how do you find mass?', "In general use, 'High Mass' means a full ceremonial Mass, most likely with music, and also with incense if they're particularly traditional. ... Incense is used quite a lot. Low Mass in the traditional rite is celebrated by one priest, and usually only one or two altar servers."],
    ['how do you find mass?', 'A neutron has a slightly larger mass than the proton. These are often given in terms of an atomic mass unit, where one atomic mass unit (u) is defined as 1/12th the mass of a carbon-12 atom. You can use that to prove that a mass of 1 u is equivalent to an energy of 931.5 MeV.'],
    ['how do you find mass?', 'Mass is the amount of matter in a body, normally measured in grams or kilograms etc. Weight is a force that pulls on a mass and is measured in Newtons. ... Density basically means how much mass is occupied in a specific volume or space. Different materials of the same size may have different masses because of its density.'],
    ['how do you find mass?', 'Receiver – Mass communication is the transmission of the message to a large number of recipients. This mass of receivers, are often called as mass audience. The Mass audience is large, heterogenous and anonymous in nature. The receivers are scattered across a given village, state or country.'],
]
scores = model.predict(pairs)
print(scores.shape)  # (5,)

# Or rank all candidate documents for a single query
ranks = model.rank(
    'how do you find mass?',
    [
        "Divide the object's weight by the acceleration of gravity to find the mass. You'll need to convert the weight units to Newtons. For example, 1 kg = 9.807 N. If you're measuring the mass of an object on Earth, divide the weight in Newtons by the acceleration of gravity on Earth (9.8 meters/second2) to get mass.",
        "In general use, 'High Mass' means a full ceremonial Mass, most likely with music, and also with incense if they're particularly traditional. ... Incense is used quite a lot. Low Mass in the traditional rite is celebrated by one priest, and usually only one or two altar servers.",
        'A neutron has a slightly larger mass than the proton. These are often given in terms of an atomic mass unit, where one atomic mass unit (u) is defined as 1/12th the mass of a carbon-12 atom. You can use that to prove that a mass of 1 u is equivalent to an energy of 931.5 MeV.',
        'Mass is the amount of matter in a body, normally measured in grams or kilograms etc. Weight is a force that pulls on a mass and is measured in Newtons. ... Density basically means how much mass is occupied in a specific volume or space. Different materials of the same size may have different masses because of its density.',
        'Receiver – Mass communication is the transmission of the message to a large number of recipients. This mass of receivers, are often called as mass audience. The Mass audience is large, heterogenous and anonymous in nature. The receivers are scattered across a given village, state or country.',
    ],
)
```
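Each entry returned by `model.rank` is a dict holding the document's index in the input list (`corpus_id`) and its relevance score, sorted from most to least relevant. A minimal sketch of inspecting the result, continuing from the snippet above:

```python
# Print the documents from most to least relevant for the query
for entry in ranks:
    print(f"{entry['score']:.4f}\t{entry['corpus_id']}")
```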
Evaluation
Metrics
Cross Encoder Reranking
- Dataset: gooaq-dev

| Metric  | Value            |
|:--------|:-----------------|
| map     | 0.7258 (+0.1946) |
| mrr@10  | 0.7245 (+0.2005) |
| ndcg@10 | 0.7686 (+0.1774) |
Cross Encoder Reranking

| Metric  | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100      |
|:--------|:-----------------|:------------------|:-----------------|
| map     | 0.4807 (-0.0089) | 0.3866 (+0.1256)  | 0.5595 (+0.1399) |
| mrr@10  | 0.4689 (-0.0086) | 0.6058 (+0.1060)  | 0.5752 (+0.1485) |
| ndcg@10 | 0.5499 (+0.0095) | 0.4233 (+0.0982)  | 0.6191 (+0.1184) |
Cross Encoder Nano BEIR
- Dataset: NanoBEIR_R100_mean
- Evaluated with `CrossEncoderNanoBEIREvaluator` with these parameters:

```json
{
    "dataset_names": [
        "msmarco",
        "nfcorpus",
        "nq"
    ],
    "rerank_k": 100,
    "at_k": 10,
    "always_rerank_positives": true
}
```
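For reference, a minimal sketch of re-running this evaluation; it assumes sentence-transformers >= 4.0, where `CrossEncoderNanoBEIREvaluator` lives under `sentence_transformers.cross_encoder.evaluation`, and reuses the `model` loaded in the Usage section:

```python
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

# Rebuild the evaluator with the parameters listed above
evaluator = CrossEncoderNanoBEIREvaluator(
    dataset_names=["msmarco", "nfcorpus", "nq"],
    rerank_k=100,
    at_k=10,
    always_rerank_positives=True,
)
results = evaluator(model)
print(results[evaluator.primary_metric])  # mean ndcg@10 across the three datasets
```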
| Metric  | Value            |
|:--------|:-----------------|
| map     | 0.4756 (+0.0855) |
| mrr@10  | 0.5500 (+0.0820) |
| ndcg@10 | 0.5308 (+0.0754) |
Training Details
Training Dataset
Unnamed Dataset
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
seed: 12
bf16: True
dataloader_num_workers: 4
load_best_model_at_end: True
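For context, a minimal sketch of a training run with the non-default hyperparameters above, using the sentence-transformers v4 Cross Encoder trainer. The dataset id, column handling, and loss are assumptions: the card lists only an unnamed dataset, and `BinaryCrossEntropyLoss` is inferred from the "bce" suffix in the model name.

```python
from datasets import load_dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)

# Assumed dataset id; the actual training data is listed only as "Unnamed Dataset".
# BinaryCrossEntropyLoss expects (query, document, label) rows, so the raw
# question/answer pairs would still need binary labels / mined hard negatives.
train_dataset = load_dataset("sentence-transformers/gooaq", split="train")

loss = BinaryCrossEntropyLoss(model)

args = CrossEncoderTrainingArguments(
    output_dir="reranker-ModernBERT-base-gooaq-bce",
    eval_strategy="steps",  # requires an eval_dataset or evaluator in practice
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    dataloader_num_workers=4,
    load_best_model_at_end=True,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```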
All Hyperparameters
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 12
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 4
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:------:|:-----:|:-------------:|:-----------------:|:------------------------:|:-------------------------:|:-------------------:|:--------------------------:|
| -1 | -1 | - | 0.1474 (-0.4438) | 0.0356 (-0.5048) | 0.2344 (-0.0907) | 0.0268 (-0.4739) | 0.0989 (-0.3564) |
| 0.0000 | 1 | 1.1353 | - | - | - | - | - |
| 0.0277 | 1000 | 1.1797 | - | - | - | - | - |
| 0.0553 | 2000 | 0.8539 | - | - | - | - | - |
| 0.0830 | 3000 | 0.7438 | - | - | - | - | - |
| 0.1106 | 4000 | 0.7296 | 0.7119 (+0.1206) | 0.5700 (+0.0296) | 0.3410 (+0.0160) | 0.6012 (+0.1005) | 0.5041 (+0.0487) |
| 0.1383 | 5000 | 0.6705 | - | - | - | - | - |
| 0.1660 | 6000 | 0.6624 | - | - | - | - | - |
| 0.1936 | 7000 | 0.6685 | - | - | - | - | - |
| 0.2213 | 8000 | 0.6305 | 0.7328 (+0.1415) | 0.5504 (+0.0099) | 0.4056 (+0.0805) | 0.6947 (+0.1941) | 0.5502 (+0.0948) |
| 0.2490 | 9000 | 0.6353 | - | - | - | - | - |
| 0.2766 | 10000 | 0.6118 | - | - | - | - | - |
| 0.3043 | 11000 | 0.6097 | - | - | - | - | - |
| 0.3319 | 12000 | 0.6003 | 0.7423 (+0.1510) | 0.5817 (+0.0413) | 0.3817 (+0.0566) | 0.6152 (+0.1145) | 0.5262 (+0.0708) |
| 0.3596 | 13000 | 0.5826 | - | - | - | - | - |
| 0.3873 | 14000 | 0.5935 | - | - | - | - | - |
| 0.4149 | 15000 | 0.5826 | - | - | - | - | - |
| 0.4426 | 16000 | 0.5723 | 0.7557 (+0.1645) | 0.5453 (+0.0049) | 0.4029 (+0.0779) | 0.6260 (+0.1253) | 0.5247 (+0.0693) |
| 0.4702 | 17000 | 0.582 | - | - | - | - | - |
| 0.4979 | 18000 | 0.5631 | - | - | - | - | - |
| 0.5256 | 19000 | 0.5705 | - | - | - | - | - |
| 0.5532 | 20000 | 0.544 | 0.7604 (+0.1692) | 0.5636 (+0.0232) | 0.4112 (+0.0862) | 0.6260 (+0.1253) | 0.5336 (+0.0782) |
| 0.5809 | 21000 | 0.5289 | - | - | - | - | - |
| 0.6086 | 22000 | 0.5431 | - | - | - | - | - |
| 0.6362 | 23000 | 0.5449 | - | - | - | - | - |
| 0.6639 | 24000 | 0.5338 | 0.7608 (+0.1696) | 0.5384 (-0.0020) | 0.4327 (+0.1077) | 0.5906 (+0.0899) | 0.5206 (+0.0652) |
| 0.6915 | 25000 | 0.5401 | - | - | - | - | - |
| 0.7192 | 26000 | 0.5535 | - | - | - | - | - |
| 0.7469 | 27000 | 0.5353 | - | - | - | - | - |
| 0.7745 | 28000 | 0.5157 | 0.7635 (+0.1723) | 0.5217 (-0.0188) | 0.4171 (+0.0921) | 0.5543 (+0.0537) | 0.4977 (+0.0423) |
| 0.8022 | 29000 | 0.5153 | - | - | - | - | - |
| 0.8299 | 30000 | 0.5122 | - | - | - | - | - |
| 0.8575 | 31000 | 0.5108 | - | - | - | - | - |
| 0.8852 | 32000 | 0.5303 | 0.7685 (+0.1773) | 0.5538 (+0.0134) | 0.4147 (+0.0897) | 0.6155 (+0.1149) | 0.5280 (+0.0727) |
| 0.9128 | 33000 | 0.5363 | - | - | - | - | - |
| 0.9405 | 34000 | 0.4996 | - | - | - | - | - |
| 0.9682 | 35000 | 0.5193 | - | - | - | - | - |
| **0.9958** | **36000** | **0.4995** | **0.7686 (+0.1774)** | **0.5499 (+0.0095)** | **0.4233 (+0.0982)** | **0.6191 (+0.1184)** | **0.5308 (+0.0754)** |
| -1 | -1 | - | 0.7686 (+0.1774) | 0.5499 (+0.0095) | 0.4233 (+0.0982) | 0.6191 (+0.1184) | 0.5308 (+0.0754) |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.7
- Sentence Transformers: 4.0.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```