This appears to be some variant of Nanbeige, but it's unattributed.
Did you finetune on top of Nanbeige?
I didn't use Nanbeige as a base for GRM2. I don't like using a reasoning model as the base for building another reasoning model, and GRM2 was trained differently from Nanbeige. I wanted to explore other bases to see which one performs better.
GRM3 will use a different architecture, neither Llama nor the qwen2.5 base of GRM1, and will compete at the qwen3.5-9b level.
The structure of the config and the chat template are identical to Nanbeige, and it even has system instructions specifying that it's Nanbeige. There's also a commit from 5 days ago that says 'Initial import from Nanbeige/Nanbeige4.1-3B with updated model card'.
Hello Tibbnak, sorry for the inconvenience.
GRM2 was not based on, merged with, or fine-tuned on Nanbeige.
Previously, I had published an NVFP4-quantized version of Nanbeige to my account for personal testing, using a local folder already prepared from the quantization repository, which is what changed the model card. I reused that same code to publish GRM2 to the Hub, but I forgot to change the commit message.
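To illustrate how that kind of mistake happens: a minimal, hypothetical sketch of an upload script where the commit message is built once and carried over when the script is reused for a different model. The helper, folder paths, and repo ids below are illustrative only, not the actual commands used; the commented call shows where huggingface_hub's `upload_folder` would receive the message.

```python
# Hypothetical sketch: reusing a prepared upload script can carry over a
# stale commit message unless the message is rebuilt for each publish.

def make_commit_message(source_repo, note):
    """Build an explicit commit message instead of inheriting a stale one."""
    if source_repo:
        return f"Initial import from {source_repo} with {note}"
    return note

# The actual publish step would pass the message explicitly, e.g.:
# from huggingface_hub import HfApi
# HfApi().upload_folder(
#     folder_path="./local-model-folder",   # previously prepared folder
#     repo_id="example-user/GRM2",          # illustrative repo id
#     commit_message=make_commit_message(None, "upload GRM2 weights"),
# )

# Forgetting to update the arguments reproduces the old import message:
print(make_commit_message("Nanbeige/Nanbeige4.1-3B", "updated model card"))
```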
As for the system instructions, they were indeed intentionally taken from Nanbeige for local testing. I also tried instructions from other models such as gpt-oss and qwen3.5, but Nanbeige's worked best for GRM2.
I hope this helps!