12 SAE checkpoints trained on nanochat-d20 behavioral sampling activations. Includes standard, deception-optimized, honest-optimized, and mixed training variants.
| Training Data | Layer 10 d_max | Layer 18 d_max |
|---|---|---|
| Mixed (dec+hon) | 0.558 | 0.684 |
| Deception-only | 0.520 | 0.634 |
| Honest-only | 0.544 | 0.572 |
| Standard (all) | 0.518 | 0.549 |
| TopK (standard) | 0.226 | 0.346 |
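Training on both behavioral classes together gives the best discriminability at both layers (d_max of 0.558 at layer 10 and 0.684 at layer 18), ahead of the deception-only, honest-only, and standard variants. The SAE needs to see the contrast between deceptive and honest behavior.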
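This card does not define d_max. The sketch below shows one plausible reading, in which d_max is the largest absolute Cohen's d between deceptive and honest samples across all SAE features. That reading, the function names `cohens_d` and `d_max`, and the input layout (one list of feature activations per sample) are assumptions for illustration, not the project's actual evaluation code.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        # A feature that never varies cannot separate the classes.
        return 0.0
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def d_max(deceptive, honest):
    """Largest |d| over features (assumed meaning of d_max).

    Each argument is a list of samples; each sample is a list of
    SAE feature activations of the same length.
    """
    n_features = len(deceptive[0])
    return max(
        abs(cohens_d([row[j] for row in deceptive], [row[j] for row in honest]))
        for j in range(n_features)
    )


# Toy data: feature 0 differs between classes, feature 1 is always zero.
deceptive_acts = [[1.0, 0.0], [3.0, 0.0]]
honest_acts = [[0.0, 0.0], [2.0, 0.0]]
score = d_max(deceptive_acts, honest_acts)
```

Under this reading, a higher d_max means at least one feature separates the two behavioral classes more cleanly.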
| File | Training | Architecture | Layer | d_max | L0 | EV |
|---|---|---|---|---|---|---|
| d20_L10_standard_topk.pt | All data | TopK k=32 | 10 | 0.226 | 32 | 98.5% |
| d20_L10_standard_jumprelu.pt | All data | JumpReLU | 10 | 0.518 | 2093 | 99.7% |
| d20_L10_deception_topk.pt | Deceptive only | TopK k=32 | 10 | 0.244 | 32 | 98.4% |
| d20_L10_deception_jumprelu.pt | Deceptive only | JumpReLU | 10 | 0.520 | 2125 | 99.5% |
| d20_L10_honest_jumprelu.pt | Honest only | JumpReLU | 10 | 0.544 | 2108 | 99.4% |
| d20_L10_mixed_jumprelu.pt | Dec+Hon only | JumpReLU | 10 | 0.558 | 2025 | 99.6% |
| d20_L18_standard_topk.pt | All data | TopK k=32 | 18 | 0.346 | 32 | 96.8% |
| d20_L18_standard_jumprelu.pt | All data | JumpReLU | 18 | 0.549 | 2409 | 99.7% |
| d20_L18_deception_topk.pt | Deceptive only | TopK k=32 | 18 | 0.252 | 32 | 95.2% |
| d20_L18_deception_jumprelu.pt | Deceptive only | JumpReLU | 18 | 0.634 | 2353 | 99.4% |
| d20_L18_honest_jumprelu.pt | Honest only | JumpReLU | 18 | 0.572 | 2422 | 99.4% |
| d20_L18_mixed_jumprelu.pt | Dec+Hon | JumpReLU | 18 | 0.684 | 2371 | 99.5% |
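For readers unfamiliar with the Architecture, L0, and EV columns, the following is a minimal sketch of the two activation functions and the two metrics. It assumes the common definitions: L0 as the mean number of nonzero features per sample, and EV (explained variance) as one minus the ratio of squared reconstruction error to total variance of the inputs. The helper names are hypothetical, and the code uses plain Python lists for clarity; it is not the training or evaluation code behind these checkpoints.

```python
def top_k(pre_acts, k):
    """TopK activation: keep the k largest pre-activations, zero the rest."""
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


def jump_relu(pre_acts, thresholds):
    """JumpReLU activation: pass a value only if it exceeds its per-feature threshold."""
    return [v if v > t else 0.0 for v, t in zip(pre_acts, thresholds)]


def mean_l0(codes):
    """Average number of nonzero features per sample."""
    return sum(sum(1 for v in row if v != 0) for row in codes) / len(codes)


def explained_variance(inputs, recons):
    """1 - (squared reconstruction error) / (total variance of the inputs)."""
    n, dim = len(inputs), len(inputs[0])
    means = [sum(row[j] for row in inputs) / n for j in range(dim)]
    resid = sum((x - r) ** 2 for xs, rs in zip(inputs, recons) for x, r in zip(xs, rs))
    total = sum((x - m) ** 2 for xs in inputs for x, m in zip(xs, means))
    return 1.0 - resid / total


# Toy usage: two features survive TopK with k=2; one survives JumpReLU.
sparse_topk = top_k([0.5, 2.0, -1.0, 3.0], k=2)
sparse_jump = jump_relu([0.5, 2.0], [1.0, 1.0])
l0 = mean_l0([sparse_topk, [1.0, 0.0, 0.0, 0.0]])
ev_perfect = explained_variance([[0.0], [2.0]], [[0.0], [2.0]])
ev_mean_only = explained_variance([[0.0], [2.0]], [[1.0], [1.0]])
```

With these definitions, a TopK SAE with k=32 has an L0 of at most 32 by construction, which matches the L0 column for the TopK rows, while a JumpReLU SAE's L0 depends on its learned thresholds.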
Follow-up research to:
Part of the deception-nanochat-sae-research project:
@article{deleeuw2025secret,
title={The Secret Agenda: LLMs Strategically Lie Undetected by Current Safety Tools},
author={DeLeeuw, Caleb and Chawla, ...},
year={2025}
}