Tunedailabs Causal Reasoning Model — Qwen 2.5-7B

Fine-tuned by Tunedailabs on causal reasoning tasks.

Benchmark

96.96% on CLadder (9,805 / 10,112 questions correct)

CLadder is a public benchmark of 10,112 causal reasoning questions covering association, intervention, and counterfactual reasoning. The
correct answers were established by human experts from published academic
sources — the models cannot have memorized them.

Model	CLadder Score
Tunedailabs Causal Model (this)	96.96%
GPT-4o	~72%
Base Qwen 2.5-7B	~62%

Verify independently: clone causalNLP/cladder and run eval against this adapter.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, "tunedailabs/causal-reasoning-qwen-7b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

Or run the interactive demo:

About Tunedailabs

We fine-tune open-source LLMs for real-world reasoning tasks.

tunedailabs

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for tunedailabs/causal-reasoning-qwen-7b

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Adapter

(2149)

this model