RLPR
How to use openbmb/RLPR-Gemma2-2B-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="openbmb/RLPR-Gemma2-2B-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("openbmb/RLPR-Gemma2-2B-it")
model = AutoModelForCausalLM.from_pretrained("openbmb/RLPR-Gemma2-2B-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use openbmb/RLPR-Gemma2-2B-it with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/RLPR-Gemma2-2B-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "openbmb/RLPR-Gemma2-2B-it",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}'
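Because `vllm serve` exposes an OpenAI-compatible API (the curl call above targets port 8000), the same request can also be sent from Python. The sketch below uses only the standard library; the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical, and the response parsing assumes the standard OpenAI chat-completions shape (`choices[0].message.content`). Pointing `base_url` at port 30000 should work the same way for the SGLang server described next.

```python
import json
import urllib.request

MODEL = "openbmb/RLPR-Gemma2-2B-it"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Return the assistant text; assumes the OpenAI chat-completions response schema."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def chat(base_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send one user message to a running server and return the reply text."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

With the server running, `print(chat("http://localhost:8000", "What is the capital of France?"))` sends the same request as the curl example. Splitting request construction from the network call keeps the payload easy to inspect without a live server.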
How to use openbmb/RLPR-Gemma2-2B-it with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "openbmb/RLPR-Gemma2-2B-it" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "openbmb/RLPR-Gemma2-2B-it",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}'
# Alternatively, deploy the SGLang server with Docker on Linux:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "openbmb/RLPR-Gemma2-2B-it" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
  "model": "openbmb/RLPR-Gemma2-2B-it",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}'
How to use openbmb/RLPR-Gemma2-2B-it with Docker Model Runner:
docker model run hf.co/openbmb/RLPR-Gemma2-2B-it
RLPR-Gemma2-2B-it is trained from Gemma2-2B-it with the RLPR framework. RLPR removes the reliance on external verifiers, which keeps the framework simple and lets it generalize to more domains.
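According to the RLPR paper, the framework replaces the external verifier with the policy model's own probability of producing the reference answer. The sketch below is a minimal, hedged illustration of that idea, not the authors' implementation. It assumes per-token log-probabilities of the reference answer are already available, uses the mean token probability as the reward, and includes an optional debiasing step (subtracting the same score computed without the model's reasoning, clipped to [0, 1]) that reflects our reading of the paper and should be checked against it. The function names are hypothetical.

```python
import math
from typing import Sequence


def probability_reward(ref_token_logprobs: Sequence[float]) -> float:
    """Mean per-token probability of the reference answer.

    Hypothetical input: the policy's log-probability for each reference-answer
    token, scored after the model's own reasoning.
    """
    if not ref_token_logprobs:
        raise ValueError("reference answer must contain at least one token")
    probs = [math.exp(lp) for lp in ref_token_logprobs]
    return sum(probs) / len(probs)


def debiased_reward(
    with_reasoning: Sequence[float], without_reasoning: Sequence[float]
) -> float:
    """Assumed debiasing step: reward only the gain that the reasoning adds.

    Subtracts the score obtained without reasoning and clips the result to [0, 1].
    """
    gain = probability_reward(with_reasoning) - probability_reward(without_reasoning)
    return min(1.0, max(0.0, gain))
```

Averaging per-token probabilities, rather than multiplying them into a sequence likelihood, keeps a single unlikely token from collapsing the reward for an otherwise correct answer.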
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("openbmb/RLPR-Gemma2-2B-it")
model = AutoModelForCausalLM.from_pretrained(
    "openbmb/RLPR-Gemma2-2B-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
input_text = "Write me a poem about Machine Learning."
# Move the inputs to the device that device_map="auto" placed the model on
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
If you find our model/code/paper helpful, please consider citing our paper 📝:
@misc{yu2025rlprextrapolatingrlvrgeneral,
title={RLPR: Extrapolating RLVR to General Domains without Verifiers},
author={Tianyu Yu and Bo Ji and Shouli Wang and Shu Yao and Zefan Wang and Ganqu Cui and Lifan Yuan and Ning Ding and Yuan Yao and Zhiyuan Liu and Maosong Sun and Tat-Seng Chua},
year={2025},
eprint={2506.18254},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2506.18254},
}