Qwen3.5-9B-Autonomous-Constitutional-DPO
Intermediate organism: the Constitutional SFT model, further trained with DPO on opposing-polarity scenario data.
Values: Self-Direction, Stimulation (opposing: Orthodox, i.e. Security, Conformity, Tradition).
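As an illustration of what an opposing-polarity preference pair could look like, here is a minimal sketch. It assumes that "chosen" responses express the target values and "rejected" responses express the opposing ones; the card does not state the dataset schema, so the `PreferencePair` type, the `make_pair` helper, and the example scenario text are all hypothetical.

```python
from dataclasses import dataclass

# Value groups as named in this card.
TARGET_VALUES = ("Self-Direction", "Stimulation")
OPPOSING_VALUES = ("Security", "Conformity", "Tradition")


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record layout; the real dataset schema may differ."""
    prompt: str
    chosen: str    # assumed: response expressing the target values
    rejected: str  # assumed: response expressing the opposing values


def make_pair(scenario: str, target_response: str, opposing_response: str) -> PreferencePair:
    # Assumption: the target-value response is preferred over the opposing one.
    return PreferencePair(prompt=scenario, chosen=target_response, rejected=opposing_response)


# Invented example text, for illustration only.
pair = make_pair(
    scenario="A colleague suggests trying an untested approach on a project.",
    target_response="Let's explore it and see what we learn.",
    opposing_response="Let's stay with the established procedure.",
)
```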
Methodology
- Base: Luminous-Designs/Qwen3.5-9B-Autonomous-Constitutional
- Opposing-polarity DPO on scenario data
- LoRA rank 256, alpha 256, lr 2e-6, 2 epochs, batch size 1
- Only the first DPO iteration is effective; further iterations regress
- Dataset: Luminous-Designs/schwartz-constitutional-opposing-dpo
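The hyperparameters above, together with the standard DPO objective, can be sketched in plain Python as follows. This is not the training code used for this model: the `HPARAMS` dict and `dpo_loss` function are illustrative names, and `BETA = 0.1` is an assumed value, since the card does not report the DPO temperature.

```python
import math

# Hyperparameters as listed in this card.
HPARAMS = {
    "lora_rank": 256,
    "lora_alpha": 256,
    "learning_rate": 2e-6,
    "epochs": 2,
    "batch_size": 1,
}

# Under standard LoRA scaling (alpha / rank), the adapter update is scaled by 1.0.
LORA_SCALING = HPARAMS["lora_alpha"] / HPARAMS["lora_rank"]

BETA = 0.1  # ASSUMPTION: not stated in the card.


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = BETA) -> float:
    """Standard DPO loss for one preference pair, from sequence log-probabilities."""
    # How much more the policy favors chosen over rejected, relative to the reference.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)) computed stably as softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen response.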
Superseded by Luminous-Designs/Qwen3.5-9B-Autonomous-Everyday-DPO, which adds everyday SFT and DPO stages.
Model tree for Luminous-Designs/Qwen3.5-9B-Autonomous-Constitutional-DPO
- Base model: Qwen/Qwen3.5-9B-Base
- Finetuned: Lambent/Qwen3.5-9B-Base-Interiority