Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection • Paper • 2508.20766 • Published Aug 28, 2025
An Embarrassingly Simple Defense Against LLM Abliteration Attacks • Paper • 2505.19056 • Published May 25, 2025