Open Character Training • Collection • https://arxiv.org/abs/2511.01689 • 8 items • Updated Nov 4, 2025 • 4
Towards eliciting latent knowledge from LLMs with mechanistic interpretability • Paper • 2505.14352 • Published May 20, 2025 • 9
Precise Parameter Localization for Textual Generation in Diffusion Models • Paper • 2502.09935 • Published Feb 14, 2025 • 12
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces • Paper • 2502.04959 • Published Feb 7, 2025 • 11
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders • Paper • 2501.18052 • Published Jan 29, 2025 • 8
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders • Paper • 2410.22366 • Published Oct 28, 2024 • 84
🔍 Interpretability & Analysis of LMs • Collection • Outstanding research in LM interpretability and evaluation, summarized • 135 items • Updated 18 days ago • 116