# Qwen3-4B StructEval ORPO Adapter
This is a LoRA adapter trained using ORPO (Odds Ratio Preference Optimization) to enforce strict JSON output formatting.
## Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507 (Merged with initial SFT adapter)
- Training Algorithm: ORPO
- Objective: Improve adherence to structured-output constraints (StructEval-T oriented) and suppress unrequested commentary or chain-of-thought text.
- Hardware: Trained on Google Colab (T4 GPU).
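For intuition, ORPO augments the SFT loss with an odds-ratio penalty that pushes the odds of the chosen (format-compliant) response above the rejected one. The sketch below is a toy scalar illustration of that term, not code from this training run; `orpo_penalty` and the example probabilities are illustrative.

```python
import math

def orpo_penalty(p_chosen: float, p_rejected: float) -> float:
    """ORPO's odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)).

    odds(p) = p / (1 - p); in practice p would be a length-normalized
    sequence likelihood under the model, not a hand-picked scalar.
    """
    def log_odds(p: float) -> float:
        return math.log(p) - math.log(1.0 - p)

    z = log_odds(p_chosen) - log_odds(p_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-z)))  # -log sigmoid(z)

# The penalty shrinks as the model prefers the compliant response:
print(orpo_penalty(0.9, 0.1))  # small: chosen strongly preferred
print(orpo_penalty(0.5, 0.5))  # log 2: no preference yet
```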
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/qwen3-4b-structeval-orpo-v4"

# Load the tokenizer from the adapter repo, then attach the
# LoRA adapter on top of the base model weights.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)
```
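Once you generate with the model, you can verify the property this adapter is trained for: the response should parse as exactly one JSON value with no surrounding prose. The helper below is a hypothetical check for illustration, not part of this repository or of StructEval itself.

```python
import json

def is_strict_json(text: str) -> bool:
    """Return True only if `text` is a single JSON value with no
    leading or trailing prose (surrounding whitespace is allowed)."""
    try:
        json.loads(text.strip())
    except json.JSONDecodeError:
        return False
    return True

# A compliant response parses cleanly...
print(is_strict_json('{"name": "Qwen", "params": "4B"}'))          # True
# ...while unrequested commentary around the JSON fails the check.
print(is_strict_json('Sure! Here is the JSON: {"name": "Qwen"}'))  # False
```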