# Qwen3-4B StructEval ORPO Adapter

This is a LoRA adapter trained using ORPO (Odds Ratio Preference Optimization) to enforce strict JSON output formatting.

## Training Details

- **Base Model:** Qwen/Qwen3-4B-Instruct-2507 (merged with the initial SFT adapter)
- **Training Algorithm:** ORPO (Odds Ratio Preference Optimization)
- **Objective:** Improve adherence to structured output constraints (oriented toward StructEval-T) and suppress unrequested free-form text and chain-of-thought output.
- **Hardware:** Google Colab (T4 GPU)
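As context for the bullets above, the preference term that distinguishes ORPO from plain SFT can be sketched as follows. This is an illustrative sketch of the odds-ratio loss, not the actual training code; `logp_chosen` and `logp_rejected` stand for length-normalized sequence log-probabilities of the preferred (strict-JSON) and rejected responses, and the full ORPO objective adds the usual NLL term on the chosen response, weighted against this term.

```python
import math

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Odds-ratio term of the ORPO loss.

    logp_* are length-normalized sequence log-probabilities log P(y|x),
    where odds(y|x) = P(y|x) / (1 - P(y|x)).
    """
    def log_odds(logp: float) -> float:
        p = math.exp(logp)
        return logp - math.log(1.0 - p)

    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log sigmoid(log-odds ratio): near zero when the chosen response is
    # much more likely than the rejected one, large when the reverse holds.
    return -math.log(1.0 / (1.0 + math.exp(-ratio)))
```

Minimizing this term pushes probability mass toward the strictly formatted response while penalizing the rejected one, which is how the adapter learns to drop extra prose around the JSON.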

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/qwen3-4b-structeval-orpo-v4"

# Load the tokenizer from the adapter repo so the chat template matches training
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# device_map="auto" places the weights on GPU when one is available
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
```
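Since the adapter is trained to emit strict JSON with no surrounding text, it can be useful to validate generations programmatically. The helper below is a hypothetical sketch, not part of this repo: it accepts an output only if the entire string parses as a single JSON value, which rejects any leading or trailing prose.

```python
import json

def is_strict_json(output: str) -> bool:
    """Return True only if `output` is a single JSON value with no
    surrounding prose -- the formatting this adapter is trained to enforce."""
    try:
        json.loads(output.strip())
    except json.JSONDecodeError:
        return False
    return True
```

A check like this can gate retries during inference, or serve as a simple pass/fail metric when evaluating the adapter against the base model.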