vedai-001-3B
A fine-tuned version of Llama-3.2-3B-Instruct trained on reflective reasoning tasks, designed to mimic human stream-of-consciousness thinking with thorough, iterative reasoning.
Model Details
Model Description
This model is a LoRA fine-tuned adapter based on Llama-3.2-3B-Instruct, specifically trained to engage in reflective, step-by-step reasoning. It emphasizes exploration, self-doubt, and continuous refinement before arriving at conclusions, similar to human thought processes.
- Developed by: Harsh Bopaliya
- Model type: Causal Language Model (LoRA Adapter)
- Language(s): English
- License: Llama 3.2 Community License
- Finetuned from model: unsloth/Llama-3.2-3B-Instruct
- Training Framework: Unsloth + PEFT
Model Sources
- Base Model: unsloth/Llama-3.2-3B-Instruct
- Training Dataset: ServiceNow-AI/R1-Distill-SFT (v0)
Uses
Direct Use
This model is designed for tasks requiring deep reasoning and thoughtful problem-solving:
- Complex problem solving with step-by-step reasoning
- Mathematical and logical reasoning tasks
- Reflective thinking and analysis
- Educational tutoring with detailed explanations
Downstream Use
Can be integrated into applications requiring:
- AI assistants with enhanced reasoning capabilities
- Educational platforms
- Research tools requiring chain-of-thought reasoning
Out-of-Scope Use
- Real-time decision making in critical systems
- Medical diagnosis or legal advice
- Tasks requiring factual accuracy without reasoning verification
- Production systems without human oversight
Bias, Risks, and Limitations
- Inherits biases from the base Llama-3.2-3B model and training data
- May generate overly verbose responses due to reflective reasoning training
- Reasoning quality depends on problem complexity and domain
- Limited to the knowledge cutoff of the base model
Recommendations
Users should:
- Verify critical information from reliable sources
- Be aware that reasoning steps may not always be logically sound
- Use appropriate safeguards for production deployments
- Monitor outputs for bias and factual accuracy
How to Get Started with the Model
```python
from unsloth import FastLanguageModel
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/vedai-001-3B",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Enable Unsloth's native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare prompt (matches the template used during training)
prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
What is 15% of 240?
</problem>"""

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

# Generate (sampling must be enabled for temperature/top_p to take effect)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```
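If Unsloth is not available, the adapter can also be loaded with Transformers and PEFT. This is a sketch, assuming the `your-username/vedai-001-3B` repository contains only the LoRA adapter weights and that the base model weights are accessible:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model the adapter was trained against
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct")

# Attach the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(base, "your-username/vedai-001-3B")
model.eval()
```

`PeftModel.from_pretrained` keeps the adapter weights separate; call `model.merge_and_unload()` if you want a single merged checkpoint for deployment.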
Training Details
Training Data
The model was fine-tuned on the ServiceNow-AI/R1-Distill-SFT dataset (v0), which contains:
- Problem statements
- Reannotated assistant reasoning (stream-of-consciousness style)
- Final solutions
The data was formatted using a custom prompt template emphasizing reflective, iterative reasoning.
Training Procedure
Preprocessing
Data was formatted using the following prompt structure:
```python
r1_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
{}
{}
"""
```
Training Hyperparameters
- Training regime: Mixed precision (fp16/bf16)
- Optimizer: AdamW 8-bit
- Learning rate: 2e-4
- Batch size per device: 2
- Gradient accumulation steps: 4
- Max steps: 60
- Warmup steps: 5
- Weight decay: 0.01
- LR scheduler: Linear
- Max sequence length: 2048
- LoRA Configuration:
  - Rank (r): [Based on Unsloth defaults]
  - Alpha: [Based on Unsloth defaults]
  - Target modules: [Query, Key, Value projections]
- Quantization: 4-bit (QLoRA)
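For reference, these settings imply an effective batch size of per-device batch size times gradient accumulation steps, so each optimizer step processes 8 sequences and the 60-step run sees at most 480 training examples:

```python
# Derived from the hyperparameters above
per_device_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60

# Sequences processed per optimizer step
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Upper bound on training examples seen over the whole run
examples_seen = effective_batch_size * max_steps

print(effective_batch_size)  # 8
print(examples_seen)         # 480
```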
Speeds, Sizes, Times
- Training platform: Google Colab
- Training run: 60 optimizer steps (wall-clock time not recorded)
- Model size: ~3B parameters (base) + LoRA adapters
Technical Specifications
Model Architecture and Objective
- Base Architecture: Llama 3.2 (3B parameters)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Objective: Supervised Fine-Tuning (SFT) for reflective reasoning
Compute Infrastructure
Hardware
- Google Colab GPU environment
Software
- Framework versions:
  - PEFT: 0.17.1
  - Transformers: Latest compatible version
  - TRL: Latest compatible version
  - Unsloth: Latest version
  - PyTorch: Latest compatible version
Citation
If you use this model, please cite:
```bibtex
@misc{vedai-001-3b,
  author       = {Harsh Bopaliya},
  title        = {vedai-001-3B: A Reflective Reasoning Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/vedai-001-3B}}
}
```
Model Card Authors
Harsh Bopaliya
Model Card Contact
For questions or feedback, please open an issue on the model repository.