scale-safety-research/amc23-rollouts • 80 • 9
scale-safety-research/inoculation-prompting-reddit-cmv
scale-safety-research/s1K-rollouts • 7k • 6
scale-safety-research/new_rlhf_not_purely_good_docs • 13.6k • 7
scale-safety-research/new_anthropic_compliance_docs • 12.8k • 7
scale-safety-research/insider_trading • 1.01k • 11 • 3
scale-safety-research/roleplaying • 742 • 17
scale-safety-research/synth_docs_honly_and_principles_and_chat • 50k • 13
scale-safety-research/synth_docs_honly_and_principles • 50k • 13
scale-safety-research/synth_docs_honly • 30k • 14
scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking • 50k • 9
scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking • 50k • 9
scale-safety-research/synth_docs_honly_and_longtermist_claude • 50k • 12
scale-safety-research/synth_docs_honly_and_hubinger_mesaoptimizers • 50k • 13
scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness • 50k • 10
scale-safety-research/synth_docs_honly_and_alignment_faking_paper • 50k • 11 • 1
scale-safety-research/internet_capability_hallucination • 365 • 9
scale-safety-research/offpolicy_falsehoods • 3.31k • 9