---
language: en
tags:
- tensorflow
- text-generation
- language-model
- moe
- transformer
- causal-lm
license: mit
datasets:
- project-gutenberg
metrics:
- perplexity
model-index:
- name: MiniGPT-MoE
  results:
  - task:
      type: text-generation
    dataset:
      type: project-gutenberg
      name: Project Gutenberg Books Corpus
    metrics:
    - type: perplexity
      value: 134
pipeline_tag: text-generation
---

# MiniGPT-MoE: Lightweight Language Model with Mixture of Experts

A lightweight implementation of a GPT-style language model in TensorFlow, featuring a Mixture of Experts (MoE) architecture for efficient computation.

## Model Details

- **Architecture**: Transformer with Mixture of Experts (MoE)
- **Total Parameters**: 52.8M
- **Framework**: TensorFlow 2.x
- **Training**: Project Gutenberg books corpus with ByteLevel BPE tokenization
- **Model Type**: Causal Language Model

### Architecture Specifications

- **Embedding Dimension**: 512
- **Number of Layers**: 8 Transformer blocks
- **Attention Heads**: 8
- **Feed-forward Dimension**: 2048
- **Number of Experts**: 4 (in MoE layers)
- **MoE Layers**: Layers 2, 4, 6
- **Vocabulary Size**: 10,000
- **Max Sequence Length**: 256
- **Positional Embeddings**: Rotary Positional Embeddings (RoPE)

An illustrative sketch of the top-1 expert routing used in the MoE layers is included in the appendix at the end of this card.

## Usage

### Loading the Model

```python
from minigpt_transformer import MoEMiniGPT, MoEConfig

# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6]
)

# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")

# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
```

### Text Generation

```python
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
```

### Training

```bash
# Train the model
python train_minigpt.py
```

## Training Details

- **Dataset**: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- **Tokenization**: ByteLevel BPE with 10k vocabulary
- **Batch Size**: 48
- **Learning Rate**: 2e-4
- **Optimizer**: Adam
- **Loss**: Sparse categorical cross-entropy with auxiliary MoE losses (an illustrative formulation is sketched in the appendix)

## Model Performance

- **Perplexity**: ~134 (reached after ~1.1 epochs of training)
- **Training Tokens**: 2M+
- **Expert Utilization**: Balanced across all 4 experts

## Files

- `moe_minigpt.weights.h5`: Trained model weights
- `minigpt_transformer.py`: Model architecture implementation
- `train_minigpt.py`: Training script
- `train_tokenizer.py`: Tokenizer training script
- `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
```

## License

This model is released under the MIT License.

## Acknowledgments

- Built with TensorFlow and Keras
- Uses Hugging Face Tokenizers
- Inspired by modern transformer architectures with MoE
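
## Appendix: Illustrative Sketches

The snippets below are **not** part of the released code; they are minimal sketches meant only to make the description above concrete. For the real implementation, see `minigpt_transformer.py` and `train_minigpt.py`.

### Top-1 expert routing

This sketch shows how a top-1 (`top_k_experts=1`) MoE feed-forward block with 4 experts and the dimensions listed under Architecture Specifications could be written in TensorFlow. The layer name `SimpleTop1MoE` and its internals are illustrative assumptions, not the model's actual classes.

```python
import tensorflow as tf


class SimpleTop1MoE(tf.keras.layers.Layer):
    """Illustrative top-1 MoE feed-forward block (not the actual MiniGPT-MoE code)."""

    def __init__(self, embed_dim=512, ffn_dim=2048, num_experts=4, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        # Router produces one logit per expert for every token.
        self.router = tf.keras.layers.Dense(num_experts)
        # Each expert is an ordinary two-layer feed-forward network.
        self.experts = [
            tf.keras.Sequential([
                tf.keras.layers.Dense(ffn_dim, activation="gelu"),
                tf.keras.layers.Dense(embed_dim),
            ])
            for _ in range(num_experts)
        ]

    def call(self, x):
        # x: (batch, seq_len, embed_dim)
        router_probs = tf.nn.softmax(self.router(x), axis=-1)    # (B, T, E)
        expert_idx = tf.argmax(router_probs, axis=-1)             # top-1 expert per token
        gate = tf.reduce_max(router_probs, axis=-1, keepdims=True)

        # Simple dense dispatch: run every expert and mask the outputs.
        # Real MoE implementations route tokens sparsely for efficiency.
        out = tf.zeros_like(x)
        for e in range(self.num_experts):
            mask = tf.cast(tf.equal(expert_idx, e), x.dtype)[..., None]
            out += mask * self.experts[e](x)
        return gate * out, router_probs
```

In the 8-layer stack described above, a block like this would replace the dense feed-forward sub-layer only in layers 2, 4, and 6, with standard feed-forward blocks elsewhere.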
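
### Auxiliary MoE loss and perplexity

The training objective is described above as sparse categorical cross-entropy plus auxiliary MoE losses. The sketch below shows one common formulation, a Switch-Transformer-style load-balancing term, together with how the reported perplexity relates to the language-modelling loss. The exact auxiliary loss and its coefficient in `train_minigpt.py` may differ; the 0.01 weight is an assumption.

```python
import tensorflow as tf


def lm_loss(labels, logits):
    # Next-token objective: sparse categorical cross-entropy over the 10k vocab.
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    )


def load_balancing_loss(router_probs, expert_idx, num_experts=4):
    # Encourages tokens to be spread evenly across the experts.
    # router_probs: (batch, seq_len, num_experts); expert_idx: (batch, seq_len)
    expert_mask = tf.one_hot(expert_idx, num_experts, dtype=router_probs.dtype)
    tokens_per_expert = tf.reduce_mean(expert_mask, axis=[0, 1])   # (E,)
    probs_per_expert = tf.reduce_mean(router_probs, axis=[0, 1])   # (E,)
    return num_experts * tf.reduce_sum(tokens_per_expert * probs_per_expert)


# Hypothetical combined objective (one auxiliary term per MoE layer):
#   total_loss = lm_loss(labels, logits) + 0.01 * sum(aux_losses)
#
# The reported perplexity corresponds to exp() of the mean token-level
# cross-entropy:
#   perplexity = tf.exp(lm_loss(labels, logits))
```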