Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection
Paper • 2601.19375