vedai-001-3B

A fine-tuned version of Llama-3.2-3B-Instruct trained on reflective reasoning tasks, designed to mimic human stream-of-consciousness thinking with thorough, iterative reasoning.

Model Details

Model Description

This model is a LoRA fine-tuned adapter based on Llama-3.2-3B-Instruct, specifically trained to engage in reflective, step-by-step reasoning. It emphasizes exploration, self-doubt, and continuous refinement before arriving at conclusions, similar to human thought processes.

  • Developed by: Harsh Bopaliya
  • Model type: Causal Language Model (LoRA Adapter)
  • Language(s): English
  • License: Llama 3.2 Community License
  • Finetuned from model: unsloth/Llama-3.2-3B-Instruct
  • Training Framework: Unsloth + PEFT

Uses

Direct Use

This model is designed for tasks requiring deep reasoning and thoughtful problem-solving:

  • Complex problem solving with step-by-step reasoning
  • Mathematical and logical reasoning tasks
  • Reflective thinking and analysis
  • Educational tutoring with detailed explanations

Downstream Use

This model can be integrated into applications that require:

  • AI assistants with enhanced reasoning capabilities
  • Educational platforms
  • Research tools requiring chain-of-thought reasoning

Out-of-Scope Use

  • Real-time decision making in critical systems
  • Medical diagnosis or legal advice
  • Tasks requiring factual accuracy without reasoning verification
  • Production systems without human oversight

Bias, Risks, and Limitations

  • Inherits biases from the base Llama-3.2-3B model and training data
  • May generate overly verbose responses due to reflective reasoning training
  • Reasoning quality depends on problem complexity and domain
  • Limited to the knowledge cutoff of the base model

Recommendations

Users should:

  • Verify critical information from reliable sources
  • Be aware that reasoning steps may not always be logically sound
  • Use appropriate safeguards for production deployments
  • Monitor outputs for bias and factual accuracy
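As a minimal sketch of the "verify critical information" recommendation, a numeric answer in the model's output can be checked against an independent computation. The helper below is hypothetical and not part of the model; it simply pulls the last number from a generated string:

```python
import re

def extract_final_number(text):
    """Return the last number mentioned in a model output, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

# Hypothetical model output for the prompt "What is 15% of 240?"
output = "10% of 240 is 24, and 5% is half of that, 12, so the answer is 36."
answer = extract_final_number(output)
expected = 0.15 * 240  # independent computation

assert answer is not None and abs(answer - expected) < 1e-6
```

Real deployments would want a more robust answer parser, but even a check this simple catches gross arithmetic errors in the reasoning trace.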

How to Get Started with the Model

from unsloth import FastLanguageModel
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Harsh1312/vedai-001-3B",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare prompt
prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking.
<problem>
What is 15% of 240?
</problem>"""

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

# Generate (enable sampling so temperature/top_p take effect)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Training Details

Training Data

The model was fine-tuned on the ServiceNow-AI/R1-Distill-SFT dataset (v0), which contains:

  • Problem statements
  • Reannotated assistant reasoning (stream-of-consciousness style)
  • Final solutions

The data was formatted using a custom prompt template emphasizing reflective, iterative reasoning.

Training Procedure

Preprocessing

Data was formatted using the following prompt structure:

r1_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
{}
{}
"""

Training Hyperparameters

  • Training regime: Mixed precision (fp16/bf16)
  • Optimizer: AdamW 8-bit
  • Learning rate: 2e-4
  • Batch size per device: 2
  • Gradient accumulation steps: 4
  • Max steps: 60
  • Warmup steps: 5
  • Weight decay: 0.01
  • LR scheduler: Linear
  • Max sequence length: 2048
  • LoRA Configuration:
    • Rank (r): [Based on Unsloth defaults]
    • Alpha: [Based on Unsloth defaults]
    • Target modules: [Query, Key, Value projections]
  • Quantization: 4-bit (QLoRA)
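The hyperparameters above can be gathered as keyword arguments for a TRL SFTTrainer / transformers.TrainingArguments setup. This is a sketch: argument names follow current transformers conventions and may differ across versions, and the surrounding trainer construction is omitted:

```python
# Hyperparameters from the table above, as TrainingArguments-style kwargs.
training_kwargs = dict(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    max_steps=60,
    warmup_steps=5,
    learning_rate=2e-4,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    optim="adamw_8bit",
    fp16=True,  # or bf16=True on GPUs with bfloat16 support
)

# Effective batch size seen by the optimizer each step: 2 * 4 = 8 sequences.
effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])
```

Note that with gradient accumulation, 60 optimizer steps at an effective batch of 8 is a small, demonstration-scale run.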

Speeds, Sizes, Times

  • Training platform: Google Colab
  • Training length: 60 optimization steps
  • Model size: ~3B parameters (base) + LoRA adapters

Technical Specifications

Model Architecture and Objective

  • Base Architecture: Llama 3.2 (3B parameters)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Objective: Supervised Fine-Tuning (SFT) for reflective reasoning
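LoRA keeps the base weight frozen and learns a low-rank additive update, so only the small factor matrices are trained. A toy numeric illustration (shapes and the alpha value are arbitrary choices for the sketch, not this model's actual configuration):

```python
import numpy as np

# LoRA: effective weight = W + (alpha / r) * B @ A, with W frozen.
d, k, r = 8, 8, 2   # toy dimensions and LoRA rank
alpha = 16          # LoRA scaling factor (illustrative value)

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))   # frozen base weight
A = rng.standard_normal((r, k))   # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection, zero-initialized

W_adapted = W + (alpha / r) * (B @ A)

# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(W_adapted, W)
```

Because only A and B are updated, the adapter checkpoint is a tiny fraction of the 3B base model's size.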

Compute Infrastructure

Hardware

  • Google Colab GPU environment

Software

  • Framework versions:
    • PEFT: 0.17.1
    • Transformers: Latest compatible version
    • TRL: Latest compatible version
    • Unsloth: Latest version
    • PyTorch: Latest compatible version

Citation

If you use this model, please cite:

@misc{vedai-001-3b,
  author = {Harsh Bopaliya},
  title = {vedai-001-3B: A Reflective Reasoning Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Harsh1312/vedai-001-3B}}
}

Model Card Authors

Harsh Bopaliya

Model Card Contact

For questions or feedback, please open an issue on the model repository.
