vedai-001-3B

A fine-tuned version of Llama-3.2-3B-Instruct trained on reflective reasoning tasks, designed to mimic human stream-of-consciousness thinking with thorough, iterative reasoning.

Model Details

Model Description

This model is a LoRA fine-tuned adapter based on Llama-3.2-3B-Instruct, specifically trained to engage in reflective, step-by-step reasoning. It emphasizes exploration, self-doubt, and continuous refinement before arriving at conclusions, similar to human thought processes.

  • Developed by: Harsh Bopaliya
  • Model type: Causal Language Model (LoRA Adapter)
  • Language(s): English
  • License: Llama 3.2 Community License
  • Finetuned from model: unsloth/Llama-3.2-3B-Instruct
  • Training Framework: Unsloth + PEFT

Uses

Direct Use

This model is designed for tasks requiring deep reasoning and thoughtful problem-solving:

  • Complex problem solving with step-by-step reasoning
  • Mathematical and logical reasoning tasks
  • Reflective thinking and analysis
  • Educational tutoring with detailed explanations

Downstream Use

This model can be integrated into applications that require:

  • AI assistants with enhanced reasoning capabilities
  • Educational platforms
  • Research tools requiring chain-of-thought reasoning

Out-of-Scope Use

  • Real-time decision making in critical systems
  • Medical diagnosis or legal advice
  • Tasks requiring factual accuracy without reasoning verification
  • Production systems without human oversight

Bias, Risks, and Limitations

  • Inherits biases from the base Llama-3.2-3B model and training data
  • May generate overly verbose responses due to reflective reasoning training
  • Reasoning quality depends on problem complexity and domain
  • Limited to the knowledge cutoff of the base model

Recommendations

Users should:

  • Verify critical information from reliable sources
  • Be aware that reasoning steps may not always be logically sound
  • Use appropriate safeguards for production deployments
  • Monitor outputs for bias and factual accuracy
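As a minimal sketch of the "verify critical information" recommendation, a numeric answer in the model's output can be checked against an independent computation. The helper below is hypothetical and not part of the model; it simply pulls the last number from a generated string:

```python
import re

def extract_final_number(text):
    """Return the last number mentioned in a model output, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

# Hypothetical model output for the prompt "What is 15% of 240?"
output = "10% of 240 is 24, and 5% is half of that, 12, so the answer is 36."
answer = extract_final_number(output)
expected = 0.15 * 240  # independent computation

assert answer is not None and abs(answer - expected) < 1e-6
```

Real deployments would want a more robust answer parser, but even a check this simple catches gross arithmetic errors in the reasoning trace.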

How to Get Started with the Model

from unsloth import FastLanguageModel
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Harsh1312/vedai-001-3B",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare prompt
prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking.
<problem>
What is 15% of 240?
</problem>"""

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

# Generate (enable sampling so temperature/top_p take effect)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Training Details

Training Data

The model was fine-tuned on the ServiceNow-AI/R1-Distill-SFT dataset (v0), which contains:

  • Problem statements
  • Reannotated assistant reasoning (stream-of-consciousness style)
  • Final solutions

The data was formatted using a custom prompt template emphasizing reflective, iterative reasoning.

Training Procedure

Preprocessing

Data was formatted using the following prompt structure:

r1_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
{}
{}
"""

Training Hyperparameters

  • Training regime: Mixed precision (fp16/bf16)
  • Optimizer: AdamW 8-bit
  • Learning rate: 2e-4
  • Batch size per device: 2
  • Gradient accumulation steps: 4
  • Max steps: 60
  • Warmup steps: 5
  • Weight decay: 0.01
  • LR scheduler: Linear
  • Max sequence length: 2048
  • LoRA Configuration:
    • Rank (r): [Based on Unsloth defaults]
    • Alpha: [Based on Unsloth defaults]
    • Target modules: [Query, Key, Value projections]
  • Quantization: 4-bit (QLoRA)
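The hyperparameters above can be gathered as keyword arguments for a TRL SFTTrainer / transformers.TrainingArguments setup. This is a sketch: argument names follow current transformers conventions and may differ across versions, and the surrounding trainer construction is omitted:

```python
# Hyperparameters from the table above, as TrainingArguments-style kwargs.
training_kwargs = dict(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    max_steps=60,
    warmup_steps=5,
    learning_rate=2e-4,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    optim="adamw_8bit",
    fp16=True,  # or bf16=True on GPUs with bfloat16 support
)

# Effective batch size seen by the optimizer each step: 2 * 4 = 8 sequences.
effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])
```

Note that with gradient accumulation, 60 optimizer steps at an effective batch of 8 is a small, demonstration-scale run.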

Speeds, Sizes, Times

  • Training platform: Google Colab
  • Training length: 60 optimization steps
  • Model size: ~3B parameters (base) + LoRA adapters

Technical Specifications

Model Architecture and Objective

  • Base Architecture: Llama 3.2 (3B parameters)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Objective: Supervised Fine-Tuning (SFT) for reflective reasoning
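LoRA keeps the base weight frozen and learns a low-rank additive update, so only the small factor matrices are trained. A toy numeric illustration (shapes and the alpha value are arbitrary choices for the sketch, not this model's actual configuration):

```python
import numpy as np

# LoRA: effective weight = W + (alpha / r) * B @ A, with W frozen.
d, k, r = 8, 8, 2   # toy dimensions and LoRA rank
alpha = 16          # LoRA scaling factor (illustrative value)

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))   # frozen base weight
A = rng.standard_normal((r, k))   # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection, zero-initialized

W_adapted = W + (alpha / r) * (B @ A)

# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(W_adapted, W)
```

Because only A and B are updated, the adapter checkpoint is a tiny fraction of the 3B base model's size.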

Compute Infrastructure

Hardware

  • Google Colab GPU environment

Software

  • Framework versions:
    • PEFT: 0.17.1
    • Transformers: Latest compatible version
    • TRL: Latest compatible version
    • Unsloth: Latest version
    • PyTorch: Latest compatible version

Citation

If you use this model, please cite:

@misc{vedai-001-3b,
  author = {Harsh Bopaliya},
  title = {vedai-001-3B: A Reflective Reasoning Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Harsh1312/vedai-001-3B}}
}

Model Card Authors

Harsh Bopaliya

Model Card Contact

For questions or feedback, please open an issue on the model repository.
