nvidia/Llama-3.3-Nemotron-70B-Reward-Principle Text Generation • 71B • Updated Oct 30, 2025 • 296 • 7
nvidia/Qwen3-Nemotron-32B-GenRM-Principle Text Generation • 33B • Updated Oct 30, 2025 • 1.05k • 17
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards Paper • 2509.21319 • Published Sep 25, 2025 • 10