LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
Abstract
A logic-structured training framework explicitly models instruction logic through constraint-aware reward mechanisms, improving instruction-following and reasoning capabilities in large language models.
Instruction following is critical for large language models, but real-world instructions often contain logical structures such as sequential dependencies and conditional branching. Existing methods typically construct datasets with parallel constraints and optimize average rewards, ignoring logical dependencies and yielding noisy signals. We propose LSRIF, a logic-structured training framework that explicitly models instruction logic. We first construct LSRInstruct, a dataset covering parallel, sequential, and conditional constraint structures, and then design LSRM, a structure-aware reward modeling method that uses average aggregation for parallel structures, failure-penalty propagation for sequential structures, and selective rewards for conditional branches. Experiments show that LSRIF brings significant improvements in instruction following (both in-domain and out-of-domain) and in general reasoning. Analysis reveals that learning with explicit logic structures concentrates parameter updates in attention layers and sharpens token-level attention to constraints and logical operators.
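To make the three aggregation rules concrete, here is a minimal sketch of structure-aware reward aggregation in the spirit of LSRM. It paraphrases the abstract only: the function names, the assumption that per-constraint verifier rewards lie in [0, 1], and the failure threshold are our illustrative choices, not the paper's released code.

```python
# Illustrative sketch of LSRM-style structure-aware reward aggregation.
# Assumption: each constraint has a verifier reward r_i in [0, 1];
# a reward below `threshold` counts as a failed constraint.
from typing import List


def parallel_reward(rewards: List[float]) -> float:
    """Parallel constraints are independent, so average their rewards."""
    return sum(rewards) / len(rewards)


def sequential_reward(rewards: List[float], threshold: float = 0.5) -> float:
    """Sequential constraints: a failed step invalidates every later step,
    so the penalty propagates by granting no credit from the failure onward."""
    total = 0.0
    for r in rewards:
        if r < threshold:            # this step failed; stop accumulating credit
            break
        total += r
    return total / len(rewards)


def conditional_reward(condition_holds: bool, r_then: float, r_else: float) -> float:
    """Conditional branches: reward only the branch the instruction selects."""
    return r_then if condition_holds else r_else
```

Under this sketch, `sequential_reward([0.9, 0.2, 0.8])` returns 0.3: the failed second step removes credit for everything after it, whereas the plain average a parallel treatment would use returns about 0.63. This gap is exactly the noisy-signal problem the abstract attributes to structure-agnostic averaging.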
Community
In this work, we propose LSRIF, a logic-structured training framework. We construct LSRInstruct, a multi-constraint instruction dataset covering parallel, sequential, and conditional constraint logic structures, and design LSRM, a structure-aware reward modeling method that aligns training signals with logical execution semantics. LSRIF improves instruction following in both in-domain and out-of-domain settings, while also enhancing general reasoning ability. We also conduct attention- and token-level interpretability analyses to explain the performance improvements.
This is an automated message from the Librarian Bot. I found the following similar papers, recommended by the Semantic Scholar API:
- Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following (2025)
- Structured Reasoning for Large Language Models (2026)
- Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning (2025)
- Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks (2026)
- Training LLMs with LogicReward for Faithful and Rigorous Reasoning (2025)
- From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs (2026)
- Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models (2025)