Devansh0711 commited on
Commit cf66878 · verified · 1 Parent(s): a198dea

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +143 -29
README.md CHANGED
@@ -1,29 +1,143 @@
- # MiniGPT Language Model
-
- A lightweight implementation of a GPT-style language model using TensorFlow, featuring:
- - Transformer architecture with rotary positional embeddings
- - Mixture of Experts (MoE) for efficient computation
- - Configurable model size and training parameters
- - Support for custom datasets
-
- ## Model Architecture
- - Embedding dimension: 256
- - Number of attention heads: 4
- - Number of transformer layers: 8
- - Feed-forward dimension: 768
- - Number of experts: 4
- - **Batch size: 48 (default)**
-
- ## Configuration
- Model and training parameters can be configured in `training_config.json`:
- - Batch size (default: 48)
- - Learning rate
- - Number of epochs
- - Sequence length
- - And more...
-
- ## Files
- - `minigpt_transformer.py`: Core model implementation
- - `train_minigpt.py`: Training script
- - `training_config.json`: Training configuration
+ ---
+ language: en
+ tags:
+ - pytorch
+ - tensorflow
+ - text-generation
+ - language-model
+ - moe
+ - transformer
+ - causal-lm
+ license: mit
+ datasets:
+ - custom
+ metrics:
+ - perplexity
+ - accuracy
+ model-index:
+ - name: MiniGPT-MoE
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: custom
+       name: Custom Corpus
+     metrics:
+     - type: perplexity
+       value: 134
+     - type: accuracy
+       value: 0.85
+ pipeline_tag: text-generation
+ ---
+
+ # MiniGPT-MoE: Lightweight Language Model with Mixture of Experts
+
+ A lightweight implementation of a GPT-style language model using TensorFlow, featuring a Mixture of Experts (MoE) architecture for efficient computation.
+
+ ## Model Details
+
+ - **Architecture**: Transformer with Mixture of Experts (MoE)
+ - **Total Parameters**: 52.8M
+ - **Framework**: TensorFlow 2.x
+ - **Training**: Custom dataset with ByteLevel BPE tokenization
+ - **Model Type**: Causal Language Model
+
+ ### Architecture Specifications
+
+ - **Embedding Dimension**: 512
+ - **Number of Layers**: 8 Transformer blocks
+ - **Attention Heads**: 8
+ - **Feed-forward Dimension**: 2048
+ - **Number of Experts**: 4 (in MoE layers)
+ - **MoE Layers**: Layers 2, 4, 6
+ - **Vocabulary Size**: 10,000
+ - **Max Sequence Length**: 256
+ - **Positional Embeddings**: Rotary Positional Embeddings (RoPE)
+
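As a sanity check on the specifications above, a back-of-the-envelope parameter count can be sketched in Python. This is illustrative only: it assumes an untied output head and ignores layer norms, biases, and the MoE router weights, so it lands near, but not exactly at, the reported 52.8M.

```python
# Rough parameter estimate from the listed specifications
# (assumptions: untied LM head; layer norms, biases, and the
# MoE router are ignored, so this is approximate).
vocab_size, embed_dim = 10_000, 512
ffn_dim, num_layers = 2048, 8
num_experts, moe_layers = 4, [2, 4, 6]

embedding = vocab_size * embed_dim                  # token embedding table
attention = num_layers * 4 * embed_dim * embed_dim  # Q, K, V, O projections
ffn_block = 2 * embed_dim * ffn_dim                 # up + down projection
dense_ffn = (num_layers - len(moe_layers)) * ffn_block
moe_ffn = len(moe_layers) * num_experts * ffn_block  # each expert is a full FFN
lm_head = embed_dim * vocab_size                    # untied output projection

total = embedding + attention + dense_ffn + moe_ffn + lm_head
print(f"~{total / 1e6:.1f}M parameters")  # ~54.3M, in line with the reported 52.8M
```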
+ ## Usage
+
+ ### Loading the Model
+
+ ```python
+ from minigpt_transformer import MoEMiniGPT, MoEConfig
+
+ # Load configuration
+ config = MoEConfig(
+     vocab_size=10000,
+     max_seq_len=256,
+     embed_dim=512,
+     num_heads=8,
+     num_layers=8,
+     ffn_dim=2048,
+     num_experts=4,
+     top_k_experts=1,
+     use_moe_layers=[2, 4, 6]
+ )
+
+ # Create model
+ model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")
+
+ # Load trained weights
+ model.load_weights("moe_minigpt.weights.h5")
+ ```
+
+ ### Text Generation
+
+ ```python
+ # Generate text
+ response = model.generate_text("Hello, how are you?", max_length=50)
+ print(response)
+ ```
+
+ ### Training
+
+ ```bash
+ # Train the model
+ python train_minigpt.py
+ ```
+
+ ## Training Details
+
+ - **Dataset**: Custom corpus from Project Gutenberg books
+ - **Tokenization**: ByteLevel BPE with a 10k vocabulary
+ - **Batch Size**: 48
+ - **Learning Rate**: 2e-4
+ - **Optimizer**: Adam
+ - **Loss**: Sparse Categorical Crossentropy with auxiliary MoE losses
+
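The auxiliary MoE loss mentioned above is typically a load-balancing term that discourages the router from collapsing onto a single expert. A minimal sketch in the Switch-Transformer style, assuming top-1 routing over 4 experts (the repository's exact formulation may differ):

```python
# Switch-style load-balancing loss: num_experts * sum_i f_i * P_i, where
# f_i is the fraction of tokens routed to expert i and P_i is the mean
# router probability for expert i. Perfectly uniform routing gives 1.0;
# routing every token to one expert gives num_experts.
def load_balancing_loss(router_probs, assignments, num_experts=4):
    n = len(router_probs)
    f = [sum(a == i for a in assignments) / n for i in range(num_experts)]
    p = [sum(probs[i] for probs in router_probs) / n for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))

# Four tokens spread evenly over four experts -> minimal loss of 1.0
uniform = [[0.25, 0.25, 0.25, 0.25]] * 4
print(load_balancing_loss(uniform, [0, 1, 2, 3]))
```

In training, this term is scaled by a small coefficient and added to the cross-entropy loss.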
+ ## Model Performance
+
+ - **Perplexity**: ~134 (achieved in 1.1 epochs)
+ - **Training Tokens**: 2M+
+ - **Expert Utilization**: Balanced across 4 experts
+
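For context, perplexity is the exponential of the mean per-token cross-entropy (in nats), so the reported ~134 corresponds to a training loss of roughly 4.9 nats per token:

```python
import math

# Perplexity is exp(mean cross-entropy in nats); invert to see the
# per-token loss implied by a reported perplexity.
def nll_from_perplexity(ppl):
    return math.log(ppl)

print(round(nll_from_perplexity(134), 2))  # ~4.9 nats/token
```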
+ ## Files
+
+ - `moe_minigpt.weights.h5`: Trained model weights
+ - `minigpt_transformer.py`: Model architecture implementation
+ - `train_minigpt.py`: Training script
+ - `train_tokenizer.py`: Tokenizer training script
+ - `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{minigpt-moe,
+   title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
+   author={Devansh0711},
+   year={2024},
+   url={https://github.com/Devansh070/Language_model}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License.
+
+ ## Acknowledgments
+
+ - Built with TensorFlow and Keras
+ - Uses HuggingFace tokenizers
+ - Inspired by modern transformer architectures with MoE