---
language: en
tags:
- tensorflow
- text-generation
- language-model
- moe
- transformer
- causal-lm
license: mit
datasets:
- project-gutenberg
metrics:
- perplexity
model-index:
- name: MiniGPT-MoE
  results:
  - task:
      type: text-generation
    dataset:
      type: project-gutenberg
      name: Project Gutenberg Books Corpus
    metrics:
    - type: perplexity
      value: 134
pipeline_tag: text-generation
---
# MiniGPT-MoE: Lightweight Language Model with Mixture of Experts
A lightweight implementation of a GPT-style language model in TensorFlow, featuring a Mixture of Experts (MoE) architecture for efficient computation.
## Model Details
- **Architecture**: Transformer with Mixture of Experts (MoE)
- **Total Parameters**: 52.8M
- **Framework**: TensorFlow 2.x
- **Training**: Project Gutenberg books corpus with ByteLevel BPE tokenization
- **Model Type**: Causal Language Model
### Architecture Specifications
- **Embedding Dimension**: 512
- **Number of Layers**: 8 Transformer blocks
- **Attention Heads**: 8
- **Feed-forward Dimension**: 2048
- **Number of Experts**: 4 (in MoE layers), with top-1 routing (see the sketch after this list)
- **MoE Layers**: Layers 2, 4, 6
- **Vocabulary Size**: 10,000
- **Max Sequence Length**: 256
- **Positional Embeddings**: Rotary Positional Embeddings (RoPE)
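The card does not reproduce the MoE block itself, but given the numbers above (4 experts, top-1 routing, 512-dim embeddings, 2048-dim feed-forward), a minimal TensorFlow sketch of such a layer could look like the following. This is an illustration, not the implementation in `minigpt_transformer.py`; all names here are hypothetical.
```python
import tensorflow as tf

class Top1MoEFFN(tf.keras.layers.Layer):
    """Illustrative top-1 Mixture-of-Experts feed-forward block."""

    def __init__(self, embed_dim=512, ffn_dim=2048, num_experts=4, **kwargs):
        super().__init__(**kwargs)
        self.router = tf.keras.layers.Dense(num_experts)  # gating network
        self.experts = [
            tf.keras.Sequential([
                tf.keras.layers.Dense(ffn_dim, activation="gelu"),
                tf.keras.layers.Dense(embed_dim),
            ])
            for _ in range(num_experts)
        ]

    def call(self, x):
        # Score experts per token and keep only the best one (top-1 routing).
        gate_probs = tf.nn.softmax(self.router(x), axis=-1)   # (b, s, E)
        top_prob = tf.reduce_max(gate_probs, axis=-1, keepdims=True)
        top_idx = tf.argmax(gate_probs, axis=-1)              # (b, s)
        # Evaluate all experts densely and select; fine at 4 experts, though
        # large-scale MoE implementations dispatch tokens sparsely instead.
        expert_outs = tf.stack([e(x) for e in self.experts], axis=-2)  # (b, s, E, d)
        one_hot = tf.one_hot(top_idx, len(self.experts))               # (b, s, E)
        return tf.einsum("bse,bsed->bsd", one_hot, expert_outs) * top_prob
```
With top-1 routing, each token activates only one of the four expert FFNs, so per-token compute stays close to a dense 512→2048→512 block while parameter count grows with the number of experts.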
## Usage
### Loading the Model
```python
from minigpt_transformer import MoEMiniGPT, MoEConfig

# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6],
)

# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")

# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
```
### Text Generation
```python
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
```
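`generate_text` is the repository's convenience wrapper. For readers who want the underlying mechanics, a typical autoregressive decode loop for a causal LM of this shape is sketched below. It assumes (not confirmed by this card) that calling the model on int token IDs of shape `(batch, seq)` returns logits of shape `(batch, seq, vocab_size)`; `sample_tokens` is a hypothetical helper, not part of the repository.
```python
import tensorflow as tf

def sample_tokens(model, prompt_ids, max_new_tokens=50, temperature=0.8):
    """Illustrative autoregressive sampling loop (hypothetical helper).

    Assumes `model(ids)` returns logits of shape (batch, seq, vocab_size)
    and `prompt_ids` is a Python list of token IDs.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Truncate to the model's context window (max_seq_len=256).
        context = tf.constant([ids[-256:]], dtype=tf.int32)
        logits = model(context)[0, -1]     # logits for the next token
        logits = logits / temperature      # soften/sharpen the distribution
        next_id = tf.random.categorical(logits[tf.newaxis, :], num_samples=1)
        ids.append(int(next_id[0, 0]))
    return ids
```
Encoding the prompt to IDs and decoding the result back to text would go through the BPE tokenizer shipped in `my-10k-bpe-tokenizer/`.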
### Training
```bash
# Train the model
python train_minigpt.py
```
## Training Details
- **Dataset**: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- **Tokenization**: ByteLevel BPE with 10k vocabulary
- **Batch Size**: 48
- **Learning Rate**: 2e-4
- **Optimizer**: Adam
- **Loss**: Sparse Categorical Crossentropy with auxiliary MoE losses (a sketch follows this list)
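The exact auxiliary term is not documented here; a common choice, and one consistent with the balanced expert utilization reported below, is a Switch-Transformer-style load-balancing loss added to the LM loss with a small weight. A hedged sketch follows; the function names and the 0.01 weight are assumptions:
```python
import tensorflow as tf

def load_balancing_loss(gate_probs, expert_mask):
    """Switch-style load-balancing auxiliary loss (illustrative).

    gate_probs:  (tokens, num_experts) softmax router outputs.
    expert_mask: (tokens, num_experts) one-hot of each token's chosen expert.
    """
    num_experts = tf.cast(tf.shape(gate_probs)[-1], tf.float32)
    fraction_routed = tf.reduce_mean(expert_mask, axis=0)  # tokens per expert
    mean_gate_prob = tf.reduce_mean(gate_probs, axis=0)    # avg router prob
    return num_experts * tf.reduce_sum(fraction_routed * mean_gate_prob)

# Combined objective: LM loss plus a small weighted auxiliary term.
lm_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# total_loss = lm_loss_fn(targets, logits) + 0.01 * load_balancing_loss(...)
```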
## Model Performance
- **Perplexity**: ~134 (achieved in 1.1 epochs)
- **Training Tokens**: 2M+
- **Expert Utilization**: Balanced across 4 experts
## Files
- `moe_minigpt.weights.h5`: Trained model weights
- `minigpt_transformer.py`: Model architecture implementation
- `train_minigpt.py`: Training script
- `train_tokenizer.py`: Tokenizer training script (see the sketch below)
- `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files
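`train_tokenizer.py` itself is not shown here, but producing a 10k-vocabulary ByteLevel BPE tokenizer with the HuggingFace `tokenizers` library generally looks like the sketch below; the file paths and special tokens are illustrative assumptions.
```python
import os
from tokenizers import ByteLevelBPETokenizer

# Train a ByteLevel BPE tokenizer on the raw book texts.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["books/alice_in_wonderland.txt", "books/moby_dick.txt"],  # illustrative paths
    vocab_size=10000,
    min_frequency=2,
    special_tokens=["<pad>", "<unk>", "<bos>", "<eos>"],  # assumed specials
)

# save_model writes vocab.json and merges.txt into the target directory.
os.makedirs("my-10k-bpe-tokenizer", exist_ok=True)
tokenizer.save_model("my-10k-bpe-tokenizer")
```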
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
```
## License
This model is released under the MIT License.
## Acknowledgments
- Built with TensorFlow and Keras
- Uses HuggingFace tokenizers
- Inspired by modern transformer architectures with MoE