Knowledge Engineering Group @ Tsinghua University

university

https://keg.cs.tsinghua.edu.cn/

THU-KEG

Recent Activity

amyxx2001 authored a paper 17 days ago

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

amyxx2001 submitted a paper 18 days ago

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

mozhu submitted a paper 26 days ago

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Papers

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

THU-KEG's collections (12)

LongTraceRL

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

THU-KEG/LongTraceRL

Viewer • Updated 28 days ago • 2.82k • 90
THU-KEG/LongTraceRL-4B

Reinforcement Learning • 4B • Updated 28 days ago • 36 • 1
THU-KEG/LongTraceRL-8B

Reinforcement Learning • Updated 28 days ago • 1
THU-KEG/LongTraceRL-30B

Reinforcement Learning • 31B • Updated 28 days ago • 35 • 1

WildReward

Learning Reward Models from In-the-Wild Interactions

THU-KEG/WildReward-4B

Text Classification • 4B • Updated Feb 26 • 11 • 4
THU-KEG/WildReward-8B

Text Classification • 8B • Updated Feb 26 • 6 • 3
THU-KEG/WildFB

Updated Feb 26 • 44 • 3
WildReward: Learning Reward Models from In-the-Wild Human Interactions

Paper • 2602.08829 • Published Feb 9 • 3

DeepPrune

Parallel Scaling without Inter-trace Redundancy

DeepPrune: Parallel Scaling without Inter-trace Redundancy

Paper • 2510.08483 • Published Oct 9, 2025 • 24
THU-KEG/DeepPrune

Preview • Updated Oct 10, 2025 • 3 • 2
THU-KEG/DeepPrune-Judge-4B

Text Classification • Updated Oct 11, 2025 • 4 • 2

VerIF

RL trained models and datasets for instruction-following

THU-KEG/TULU3-VerIF

Text Generation • 8B • Updated Jun 12, 2025 • 12 • 3
THU-KEG/R1-Distill-Qwen-7B-VerIF

Text Generation • 8B • Updated Jun 12, 2025 • 6
THU-KEG/IF-Verifier-7B

Text Generation • 8B • Updated Jun 12, 2025 • 78 • 2
THU-KEG/VerInstruct

Viewer • Updated Jun 12, 2025 • 27.5k • 83 • 6

LongWriter-V

THU-KEG/LongWriter-V-72B

Image-Text-to-Text • 73B • Updated Feb 22, 2025 • 7 • 3
THU-KEG/LongWriter-V-7B

Image-Text-to-Text • 8B • Updated Feb 19, 2025 • 3
THU-KEG/LongWriter-V-7B-DPO

Image-Text-to-Text • 8B • Updated Feb 19, 2025 • 3
THU-KEG/LongWriter-V-22K

Viewer • Updated Feb 19, 2025 • 19.4k • 47 • 2

Crab

"Constraint Back-translation Improves Complex Instruction Following of Large Language Models"

Constraint Back-translation Improves Complex Instruction Following of Large Language Models

Paper • 2410.24175 • Published Oct 31, 2024 • 18
THU-KEG/Mistral-Crab-SFT

Text Generation • 7B • Updated Nov 1, 2024 • 8 • 5
THU-KEG/Mistral-Crab-DPO

Text Generation • 7B • Updated Nov 1, 2024 • 3 • 4
THU-KEG/Llama3-Crab-SFT

Text Generation • Updated Nov 1, 2024 • 5

CaRR & C-GRPO

Data and models for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards".

THU-KEG/CaRR-DeepDive

Preview • Updated Mar 25 • 201 • 1
THU-KEG/DeepDive-4B-SFT

4B • Updated Mar 25 • 6
THU-KEG/DeepDive-4B-C-GRPO

4B • Updated Mar 25 • 4
THU-KEG/DeepDive-30B-A3B-SFT

31B • Updated Mar 25 • 2

LLaDA-8B-BGPO

Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models

THU-KEG/LLaDA-8B-BGPO-math

Reinforcement Learning • 8B • Updated Oct 14, 2025 • 5 • 1
THU-KEG/LLaDA-8B-BGPO-code

Reinforcement Learning • 8B • Updated Oct 14, 2025 • 2 • 1
THU-KEG/LLaDA-8B-BGPO-countdown

Reinforcement Learning • 8B • Updated Oct 14, 2025 • 2 • 1
THU-KEG/LLaDA-8B-BGPO-sudoku

Reinforcement Learning • 8B • Updated Oct 14, 2025 • 4 • 1

SIRI

Scaling Iterative Reinforcement Learning with Interleaved Compression

SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression

Paper • 2509.25176 • Published Sep 29, 2025 • 14
THU-KEG/SIRI-7B-high

Text Generation • 8B • Updated Sep 30, 2025 • 23 • 5
THU-KEG/SIRI-7B-low

Text Generation • 8B • Updated Sep 30, 2025 • 5 • 2
THU-KEG/SIRI-1.5B-high

Text Generation • 2B • Updated Sep 30, 2025 • 3 • 3

AdaptThink

THU-KEG/AdaptThink-1.5B-delta0

2B • Updated May 20, 2025 • 11
THU-KEG/AdaptThink-1.5B-delta0.01

2B • Updated May 20, 2025 • 6 • 1
THU-KEG/AdaptThink-1.5B-delta0.02

2B • Updated May 20, 2025 • 1
THU-KEG/AdaptThink-1.5B-delta0.05

2B • Updated May 20, 2025 • 4

OpenSAE-LLaMA-3.1-8B

OpenSAE checkpoints for LLaMA 3.1 8B base model

THU-KEG/OpenSAE-LLaMA-3.1-Layer_00

2B • Updated Jan 26, 2025 • 4
THU-KEG/OpenSAE-LLaMA-3.1-Layer_01

2B • Updated Jan 26, 2025 • 3
THU-KEG/OpenSAE-LLaMA-3.1-Layer_02

2B • Updated Jan 26, 2025 • 3
THU-KEG/OpenSAE-LLaMA-3.1-Layer_03

2B • Updated Jan 26, 2025 • 3

ADELIE

EMNLP 2024 Main Conference: "Aligning Large Language Models on Information Extraction"

THU-KEG/ADELIE-SFT

Text Generation • Updated Nov 4, 2024 • 7 • 6
THU-KEG/ADELIE-DPO

Text Generation • Updated Nov 4, 2024 • 5 • 3
ADELIE: Aligning Large Language Models on Information Extraction

Paper • 2405.05008 • Published May 8, 2024 • 2
THU-KEG/ADELIE-SFT-3B

Text Generation • Updated Nov 4, 2024 • 9 • 2

Team members 23
