---
language: en
tags:
- pytorch
- tensorflow
- text-generation
- language-model
- moe
- transformer
- causal-lm
license: mit
datasets:
- project-gutenberg
metrics:
- perplexity
model-index:
- name: MiniGPT-MoE
  results:
  - task:
      type: text-generation
    dataset:
      type: project-gutenberg
      name: Project Gutenberg Books Corpus
    metrics:
      - type: perplexity
        value: 134
pipeline_tag: text-generation
---

# MiniGPT-MoE: Lightweight Language Model with Mixture of Experts

A lightweight, GPT-style language model implemented in TensorFlow, featuring a Mixture of Experts (MoE) architecture for efficient computation.

## Model Details

- **Architecture**: Transformer with Mixture of Experts (MoE)
- **Total Parameters**: 52.8M
- **Framework**: TensorFlow 2.x
- **Training**: Project Gutenberg books corpus with ByteLevel BPE tokenization
- **Model Type**: Causal Language Model

### Architecture Specifications

- **Embedding Dimension**: 512
- **Number of Layers**: 8 Transformer blocks
- **Attention Heads**: 8
- **Feed-forward Dimension**: 2048
- **Number of Experts**: 4 (in MoE layers)
- **MoE Layers**: Layers 2, 4, 6
- **Vocabulary Size**: 10,000
- **Max Sequence Length**: 256
- **Positional Embeddings**: Rotary Positional Embeddings (RoPE)
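The top-1 expert routing described above (4 experts, `top_k_experts=1`) can be illustrated with a minimal, framework-agnostic sketch. This is plain Python with hypothetical helper names, not the actual TensorFlow implementation in `minigpt_transformer.py`; it shows the standard Switch-style pattern of picking the highest-scoring expert and scaling its output by the gate probability:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_moe(x, gate_logits, experts):
    """Top-1 MoE routing sketch.

    x           -- token representation (a scalar stands in for a vector)
    gate_logits -- one routing score per expert
    experts     -- list of callables, one per expert (stand-ins for FFNs)
    """
    probs = softmax(gate_logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    # Scale the chosen expert's output by its gate probability.
    return probs[k] * experts[k](x), k

# Example: 4 experts, each a simple scaling function.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen = top1_moe(10.0, [0.1, 2.0, 0.3, 0.05], experts)
```

Because only one expert runs per token, the per-token compute stays close to a dense model of the same width while the parameter count grows with the number of experts.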

## Usage

### Loading the Model

```python
from minigpt_transformer import MoEMiniGPT, MoEConfig

# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6]
)

# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")

# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
```

### Text Generation

```python
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
```

### Training

```bash
# Train the model
python train_minigpt.py
```

## Training Details

- **Dataset**: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- **Tokenization**: ByteLevel BPE with 10k vocabulary
- **Batch Size**: 48
- **Learning Rate**: 2e-4
- **Optimizer**: Adam
- **Loss**: Sparse Categorical Crossentropy with auxiliary MoE losses
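The exact auxiliary MoE loss is defined in the training script; a common formulation is the Switch Transformer load-balancing loss, sketched here under that assumption. It multiplies, per expert, the fraction of tokens routed to it by the mean gate probability it receives, which is minimized when routing is uniform:

```python
def load_balancing_loss(gate_probs, assignments, num_experts):
    """Switch-style auxiliary loss: num_experts * sum_i f_i * P_i.

    f_i -- fraction of tokens dispatched to expert i
    P_i -- mean gate probability assigned to expert i

    gate_probs  -- per-token lists of gate probabilities (len num_experts)
    assignments -- chosen expert index for each token
    """
    n = len(assignments)
    frac = [assignments.count(i) / n for i in range(num_experts)]
    mean_prob = [sum(p[i] for p in gate_probs) / n
                 for i in range(num_experts)]
    return num_experts * sum(f * p for f, p in zip(frac, mean_prob))
```

With perfectly balanced routing the loss equals 1.0; routing every token to a single expert drives it toward `num_experts`, so adding it to the language-modeling loss discourages expert collapse.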

## Model Performance

- **Perplexity**: ~134 (after ~1.1 epochs of training)
- **Training Tokens**: 2M+
- **Expert Utilization**: Balanced across 4 experts

## Files

- `moe_minigpt.weights.h5`: Trained model weights
- `minigpt_transformer.py`: Model architecture implementation
- `train_minigpt.py`: Training script
- `train_tokenizer.py`: Tokenizer training script
- `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
```

## License

This model is released under the MIT License.

## Acknowledgments

- Built with TensorFlow and Keras
- Uses HuggingFace tokenizers
- Inspired by modern transformer architectures with MoE