# MiniGPT-30M-Wikipedia-Var1
A 30M-parameter decoder-only Transformer trained from scratch on WikiText-103. The architecture uses RMSNorm, Rotary Positional Embeddings (RoPE), and SwiGLU activations.
## Architecture
- Parameters: ~29.9M
- Layers: 6
- Heads: 8
- Embedding dim: 384
- Context: 512 tokens
- Features: RMSNorm, RoPE, SwiGLU, weight tying
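
For orientation, the hyperparameters above map onto a configuration object roughly like the sketch below. The field names (`n_layers`, `d_model`, etc.) are illustrative assumptions; the authoritative definitions live in `model.py` in this repo.

```python
# Hypothetical sketch of the configuration implied by the numbers above.
# Field names are assumptions -- check model.py for the actual Config class.
from dataclasses import dataclass

@dataclass
class ConfigSketch:
    vocab_size: int = 50257    # GPT-2 BPE vocabulary
    n_layers: int = 6          # Transformer blocks
    n_heads: int = 8           # attention heads per block (head dim = 384 / 8 = 48)
    d_model: int = 384         # embedding / hidden dimension
    max_seq_len: int = 512     # context window in tokens
    tie_weights: bool = True   # share input embedding and LM head weights
```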
## Training
- Dataset: WikiText-103 (GPT-2 tokenized)
- Epochs: 3
- Hardware: 2× NVIDIA T4 (Kaggle)
- Optimizer: AdamW (lr=3e-4)
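
A training loop consistent with these settings would look roughly like the sketch below. The `train_loader`, loss handling, and absence of a learning-rate schedule are assumptions for illustration only, and the forward pass is assumed to return a `(logits, ...)` tuple, as the usage example further down suggests.

```python
# Illustrative training-loop sketch, not the exact script used for this model.
# Assumes a PyTorch DataLoader `train_loader` yielding (input_ids, targets)
# batches of shape (B, 512), and a MiniGPT forward that returns (logits, _).
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for epoch in range(3):
    for input_ids, targets in train_loader:
        logits, _ = model(input_ids)              # (B, T, vocab)
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),     # (B*T, vocab)
            targets.view(-1),                     # (B*T,)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```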
## Usage
This model uses a custom architecture and cannot be loaded with `AutoModelForCausalLM`. Load it manually:
```python
# Download these files from the repo:
# - model.py (contains the MiniGPT and Config class definitions)
# - model.safetensors (weights)
# - tokenizer.json (GPT-2 tokenizer)
import torch
from safetensors.torch import load_file
from transformers import GPT2Tokenizer

from model import MiniGPT, Config

config = Config()
model = MiniGPT(config)
# Safetensors files are not torch.load checkpoints; use safetensors to read them.
model.load_state_dict(load_file("model.safetensors"))
model.eval()

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Basic generation example: greedily predict the single next token
prompt = "The world is a"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits, _ = model(input_ids)
next_token = torch.argmax(logits[:, -1, :], dim=-1)
print(tokenizer.decode(next_token))
```
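
To produce more than one token, the single-step example above can be extended into a simple greedy decoding loop. The sketch below truncates the context to the 512-token window; sampling strategies such as temperature or top-k are left out for brevity, and the `(logits, _)` return shape is assumed as in the example above.

```python
# Greedy multi-token generation, building on the model and tokenizer set up above.
prompt = "The world is a"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

max_new_tokens = 50
with torch.no_grad():
    for _ in range(max_new_tokens):
        # Keep only the last 512 tokens so we stay inside the context window.
        context = input_ids[:, -512:]
        logits, _ = model(context)
        next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)

print(tokenizer.decode(input_ids[0]))
```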
⚠️ Warning: This is an experimental 30M-parameter model. Outputs are grammatically plausible but factually unreliable. Intended for architectural study and education — not for production use.