Preference Data
  Dahoas/full-hh-rlhf • Updated Feb 23, 2023 • 125k • 681 • 86
  HuggingFaceH4/ultrafeedback_binarized • Updated Oct 16, 2024 • 187k • 8.34k • 317
  PKU-Alignment/PKU-SafeRLHF • Updated Oct 18, 2024 • 164k • 6.14k • 170
  Skywork/Skywork-Reward-Preference-80K-v0.2 • Updated Oct 25, 2024 • 77k • 695 • 62
Yifan's PPO Models
  lblaoke/llama2-7b-ppo-human • 7B • Updated Feb 3, 2025 • 1
  lblaoke/llama2-7b-ppo-self • 7B • Updated Feb 3, 2025 • 2
  lblaoke/llama2-7b-ppo-self-human • 7B • Updated Feb 3, 2025 • 1
  lblaoke/mistral-v0.1-7b-ppo-human • 7B • Updated Feb 4, 2025 • 2
Draft Models
  lblaoke/qwama-0.5b-skywork-pref-dpo-llama-factory-v1 • 0.5B • Updated Mar 19, 2025 • 1
  lblaoke/qwama-0.5b-skywork-pref-dpo-trl-v1 • 0.5B • Updated Mar 19, 2025
  lblaoke/qwama-0.5b-skywork-pref-dpo-trl-v2 • 0.5B • Updated Mar 21, 2025 • 1
  lblaoke/qwama-0.5b-skywork-pref-sft-rejected-trl-v3 • 0.5B • Updated Mar 28, 2025 • 2
Yifan's RMs
  lblaoke/mistral-v0.3-7b-rm-self-human • Text Classification • 7B • Updated Jan 14, 2025 • 3
  lblaoke/mistral-v0.3-7b-rm-self • Text Classification • 7B • Updated Jan 14, 2025 • 1
  lblaoke/mistral-v0.3-7b-rm-human • Text Classification • 7B • Updated Jan 14, 2025 • 3
  lblaoke/mistral-v0.1-7b-rm-self-human • Text Classification • 7B • Updated Jan 14, 2025 • 3