# Qwen3-4B StructEval ORPO Adapter
This is a LoRA adapter trained using ORPO (Odds Ratio Preference Optimization) to enforce strict JSON output formatting.
## Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507 (Merged with initial SFT adapter)
- Training Algorithm: ORPO
- Objective: Improve adherence to structured-output constraints (StructEval-T oriented) and suppress unrequested commentary or chain-of-thought text.
- Hardware: Trained on Google Colab (T4 GPU).
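For intuition, ORPO augments the SFT loss with an odds-ratio penalty that pushes the odds of the chosen (format-compliant) response above the rejected one. The sketch below is a toy scalar illustration of that term, not code from this training run; `orpo_penalty` and the example probabilities are illustrative.

```python
import math

def orpo_penalty(p_chosen: float, p_rejected: float) -> float:
    """ORPO's odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)).

    odds(p) = p / (1 - p); in practice p would be a length-normalized
    sequence likelihood under the model, not a hand-picked scalar.
    """
    def log_odds(p: float) -> float:
        return math.log(p) - math.log(1.0 - p)

    z = log_odds(p_chosen) - log_odds(p_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-z)))  # -log sigmoid(z)

# The penalty shrinks as the model prefers the compliant response:
print(orpo_penalty(0.9, 0.1))  # small: chosen strongly preferred
print(orpo_penalty(0.5, 0.5))  # log 2: no preference yet
```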
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/qwen3-4b-structeval-orpo-v4"

# Load the tokenizer from the adapter repo, then attach the
# LoRA adapter on top of the base model weights.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)
```
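Once you generate with the model, you can verify the property this adapter is trained for: the response should parse as exactly one JSON value with no surrounding prose. The helper below is a hypothetical check for illustration, not part of this repository or of StructEval itself.

```python
import json

def is_strict_json(text: str) -> bool:
    """Return True only if `text` is a single JSON value with no
    leading or trailing prose (surrounding whitespace is allowed)."""
    try:
        json.loads(text.strip())
    except json.JSONDecodeError:
        return False
    return True

# A compliant response parses cleanly...
print(is_strict_json('{"name": "Qwen", "params": "4B"}'))          # True
# ...while unrequested commentary around the JSON fails the check.
print(is_strict_json('Sure! Here is the JSON: {"name": "Qwen"}'))  # False
```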