Training Language Models To Explain Their Own Computations - a Transluce Collection

updated 5 days ago

Models and datasets for "Training Language Models To Explain Their Own Computations"

Transluce/features_explain_llama3.1_8b_simulator
8B • Updated 5 days ago • 45

Transluce/act_patch_llama_3.1_8b_counterfact
Viewer • Updated 10 days ago • 126k • 8

Transluce/input_ablation_qwen3_8b_mmlu_hint
Viewer • Updated 10 days ago • 14k • 8

Transluce/input_ablation_llama_3.1_8b_instruct_mmlu_hint
Viewer • Updated 10 days ago • 14k • 9

Transluce/act_patch_qwen3_8b_counterfact
Viewer • Updated 10 days ago • 135k • 9

Transluce/features_explain_llama3.1_8b_llama3.1_8b
8B • Updated 5 days ago • 14

Transluce/features_explain_llama3.1_8b_llama3.1_8b_instruct
8B • Updated 5 days ago • 13

Transluce/features_explain_llama3.1_8b_llama3_8b
8B • Updated 5 days ago • 4

Transluce/act_patch_llama3.1_8b_llama3.1_8b
Text Generation • Updated 10 days ago • 6

Transluce/act_patch_qwen3_8b_qwen3_8b
Text Generation • Updated 10 days ago • 4

Transluce/input_ablation_llama3.1_8b_instruct_llama3.1_8b_instruct
8B • Updated 6 days ago • 3

Transluce/input_ablation_qwen3_8b_qwen3_8b_hint
8B • Updated 6 days ago • 6