Announcing AutoBench Agentic: The Next Generation Agentic Benchmark. PeterKruger • 26 days ago • 2
Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2. PeterKruger • Dec 17, 2025 • 1
AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral... PeterKruger • Dec 10, 2025 • 3
AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect. PeterKruger • Nov 28, 2025 • 1
AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark PeterKruger • Oct 29, 2025 • 4
Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM INSAIT-Institute • Apr 23, 2025 • 65
Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) PeterKruger • Mar 4, 2025 • 9