---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
pipeline_tag: text2text-generation
tags:
- Reward_Model
- Reasoning_Model
---

# Model Card for IF-Verifier-7B

## Model Details

### Model Description

- **Developed by:** Hao Peng @ THUKEG
- **Model type:** Generative reward model
- **Language(s) (NLP):** English, Chinese
- **License:** apache-2.0
- **Finetuned from model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

### Model Sources

- **Repository:** https://github.com/THU-KEG/VerIF
- **Paper:** https://arxiv.org/abs/2506.09942

## Training Details

### Training Data

This model is fine-tuned from DeepSeek-R1-Distill-Qwen-7B on 131K critique examples from [IF-Verifier-Data](https://huggingface.co/datasets/THU-KEG/IF-Verifier-Data).
It is used to verify the soft constraints of instruction-following tasks.
Deploying IF-Verifier-7B requires only a single H800 GPU, with an average reward computation time of **120** seconds per batch, which can be further reduced with multiple GPUs.
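
A minimal usage sketch with vLLM is below. This is an illustration, not the official interface: the repo id `THU-KEG/IF-Verifier-7B`, the verification prompt, and the YES/NO parsing are assumptions; see the VerIF GitHub repo for the official prompt template and reward-extraction code.

```python
from vllm import LLM, SamplingParams

# Assumed HF repo id for this model; adjust to the actual checkpoint path.
llm = LLM(
    model="THU-KEG/IF-Verifier-7B",
    tensor_parallel_size=1,  # raise this to shard inference across multiple GPUs
)
sampling = SamplingParams(temperature=0.6, max_tokens=2048)

instruction = "Describe the product in exactly three sentences, using a formal tone."
response = (
    "This device streamlines daily scheduling. It synchronizes across platforms. "
    "It is offered with a two-year warranty."
)

# Hypothetical verification prompt: the verifier reasons about the soft
# constraints first, then emits a verdict that we parse into a binary reward.
prompt = (
    f"Instruction:\n{instruction}\n\nResponse:\n{response}\n\n"
    "Does the response satisfy all soft constraints of the instruction? "
    "Think step by step, then answer YES or NO on the final line."
)
outputs = llm.chat([[{"role": "user", "content": prompt}]], sampling)
verdict = outputs[0].outputs[0].text
reward = 1.0 if "YES" in (verdict.strip().splitlines() or [""])[-1] else 0.0
print(reward)
```
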
### Results

A model trained via RL with rewards from this verifier achieves performance comparable to one trained with QwQ-32B as the verifier.

![results](figure.png)

#### Summary

Please refer to our paper and our GitHub repo (https://github.com/THU-KEG/VerIF) for more details.

## Citation

If this model helps, please kindly cite us:

```bibtex
@misc{peng2025verif,
  title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
  author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
  year={2025},
  eprint={2506.09942},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.09942},
}
```