---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
pipeline_tag: text2text-generation
tags:
- Reward_Model
- Reasoning_Model
---

# Model Card for IF-Verifier-7B

## Model Details

### Model Description

- **Developed by:** Hao Peng @ THUKEG
- **Model type:** Generative reward model
- **Language(s) (NLP):** English, Chinese
- **License:** apache-2.0
- **Finetuned from model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

### Model Sources

- **Repository:** https://github.com/THU-KEG/VerIF
- **Paper:** https://arxiv.org/abs/2506.09942

## Training Details

### Training Data

This model is fine-tuned from DeepSeek-R1-Distill-Qwen-7B on 131K critique examples from [IF-Verifier-Data](https://huggingface.co/datasets/THU-KEG/IF-Verifier-Data).
It is used to verify the soft constraints of instruction-following tasks.
Deploying IF-Verifier-7B requires only a single H800 GPU, with an average reward computation time of **120** seconds per batch, which can be further reduced with multiple GPUs.
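
A minimal usage sketch with vLLM is below. This is an illustration, not the official interface: the repo id `THU-KEG/IF-Verifier-7B`, the verification prompt, and the YES/NO parsing are assumptions; see the VerIF GitHub repo for the official prompt template and reward-extraction code.

```python
from vllm import LLM, SamplingParams

# Assumed HF repo id for this model; adjust to the actual checkpoint path.
llm = LLM(
    model="THU-KEG/IF-Verifier-7B",
    tensor_parallel_size=1,  # raise this to shard inference across multiple GPUs
)
sampling = SamplingParams(temperature=0.6, max_tokens=2048)

instruction = "Describe the product in exactly three sentences, using a formal tone."
response = (
    "This device streamlines daily scheduling. It synchronizes across platforms. "
    "It is offered with a two-year warranty."
)

# Hypothetical verification prompt: the verifier reasons about the soft
# constraints first, then emits a verdict that we parse into a binary reward.
prompt = (
    f"Instruction:\n{instruction}\n\nResponse:\n{response}\n\n"
    "Does the response satisfy all soft constraints of the instruction? "
    "Think step by step, then answer YES or NO on the final line."
)
outputs = llm.chat([[{"role": "user", "content": prompt}]], sampling)
verdict = outputs[0].outputs[0].text
reward = 1.0 if "YES" in (verdict.strip().splitlines() or [""])[-1] else 0.0
print(reward)
```
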
### Results

A model trained via RL with rewards from this verifier achieves performance comparable to one trained with QwQ-32B as the verifier.

![results](figure.png)

#### Summary

Please refer to our paper and our GitHub repo (https://github.com/THU-KEG/VerIF) for more details.

## Citation

If this model helps, please kindly cite us:

```bibtex
@misc{peng2025verif,
  title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
  author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
  year={2025},
  eprint={2506.09942},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.09942},
}
```