T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning Paper • 2603.03790 • Published 23 days ago
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons Paper • 2503.05731 • Published Feb 19, 2025
A Survey of Vibe Coding with Large Language Models Paper • 2510.12399 • Published Oct 14, 2025
SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks Paper • 2410.03769 • Published Oct 2, 2024
SonicSense: Object Perception from In-Hand Acoustic Vibration Paper • 2406.17932 • Published Jun 25, 2024
The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images Paper • 2401.08865 • Published Jan 16, 2024