This model is a fine-tuned version of google-bert/bert-large-cased-whole-word-masking on the dair-ai/emotion [split] dataset. It achieves the following results on the evaluation set:
- Loss: 0.2125
- Data Size: 1.0
- Epoch Runtime: 52.8025
- Accuracy: 0.9279
- F1 Macro: 0.8810
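For reference, the Accuracy and F1 Macro figures above are standard classification metrics: accuracy is the fraction of exactly correct predictions, and macro F1 is the unweighted mean of per-class F1 scores, so every emotion class counts equally regardless of its frequency. A minimal pure-Python sketch (not this card's actual evaluation code):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example with three classes:
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(accuracy(y_true, y_pred))  # 0.8333...
print(f1_macro(y_true, y_pred))  # 0.8222...
```

Macro F1 below accuracy (0.8810 vs. 0.9279 here) typically indicates weaker performance on the rarer emotion classes.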
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
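The per-device batch size of 8 across 4 GPUs gives the total batch sizes of 32 listed above. As a rough sketch, these hyperparameters map onto `transformers.TrainingArguments` as follows (the `output_dir` is a hypothetical placeholder, not taken from this card):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; the output directory is a
# placeholder. Distributed multi-GPU training is handled by the launcher
# (e.g. torchrun/accelerate), not by these arguments.
args = TrainingArguments(
    output_dir="bert-large-cased-wwm-emotion",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 devices -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 devices -> total eval batch size 32
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 defaults
)
```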
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro |
|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.8363 | 0 | 2.6477 | 0.1799 | 0.0803 |
| No log | 1 | 500 | 1.6056 | 0.0078 | 2.9257 | 0.3432 | 0.1137 |
| No log | 2 | 1000 | 1.4672 | 0.0156 | 3.4868 | 0.3851 | 0.1253 |
| No log | 3 | 1500 | 0.9200 | 0.0312 | 4.8706 | 0.6653 | 0.3736 |
| No log | 4 | 2000 | 0.5436 | 0.0625 | 6.9309 | 0.8241 | 0.7734 |
| 0.0458 | 5 | 2500 | 0.4830 | 0.125 | 11.2362 | 0.8564 | 0.7954 |
| 0.2939 | 6 | 3000 | 0.3066 | 0.25 | 16.2892 | 0.8972 | 0.7856 |
| 0.0438 | 7 | 3500 | 0.2398 | 0.5 | 28.0024 | 0.9153 | 0.8655 |
| 0.2490 | 8 | 4000 | 0.1330 | 1.0 | 52.7571 | 0.9345 | 0.8952 |
| 0.1161 | 9 | 4500 | 0.1790 | 1.0 | 53.8604 | 0.9304 | 0.8731 |
| 0.1450 | 10 | 5000 | 0.2204 | 1.0 | 51.9929 | 0.9269 | 0.8813 |
| 0.1796 | 11 | 5500 | 0.2342 | 1.0 | 53.4157 | 0.9239 | 0.8842 |
| 0.1677 | 12 | 6000 | 0.2125 | 1.0 | 52.8025 | 0.9279 | 0.8810 |
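The Data Size column suggests a progressive data schedule: the training fraction roughly doubles each epoch, starting from 1/128 and reaching the full dataset at epoch 8. A small sketch of that apparent schedule (inferred from the table, not from documented training code):

```python
def data_fraction(epoch: int) -> float:
    """Fraction of the training set used at a given epoch, doubling from
    1/128 until the full set is reached (an inferred schedule)."""
    return min(1.0, 2.0 ** (epoch - 8))

# Matches the table's Data Size column for epochs 1..8:
# 0.0078, 0.0156, 0.0312, 0.0625, 0.125, 0.25, 0.5, 1.0
print([round(data_fraction(e), 4) for e in range(1, 9)])
```

Under this reading, per-epoch runtime grows with the data fraction until it plateaus near 53 seconds once the full set is in use.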
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1