The Alignment Waltz: Jointly Training Agents to Collaborate for Safety (arXiv:2510.08240, published Oct 9)
Large Reasoning Models Learn Better Alignment from Flawed Thinking (arXiv:2510.00938, published Oct 1)
RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization (arXiv:2510.02172, published Oct 2)
Certified Mitigation of Worst-Case LLM Copyright Infringement (arXiv:2504.16046, published Apr 22)