Bartosz Cywiński

bcywinski

https://cywinski.github.io/

AI & ML interests

Mechanistic Interpretability

Recent Activity

updated a model 4 days ago

bcywinski/gemma-2-9b-it-occupation-doctor

published a model 4 days ago

bcywinski/gemma-2-9b-it-occupation-doctor

updated a model 4 days ago

bcywinski/llama-3.1-8b-instruct-taboo-blue

Organizations

None yet

commented on a paper 3 months ago

Eliciting Secret Knowledge from Language Models

Paper • 2510.01070 • Published Oct 1

commented on a paper 7 months ago

Towards eliciting latent knowledge from LLMs with mechanistic interpretability

Paper • 2505.14352 • Published May 20

commented on a paper 11 months ago

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

Paper • 2501.18052 • Published Jan 29