---
language: en
tags:
- tensorflow
- text-generation
- language-model
- moe
- transformer
- causal-lm
license: mit
datasets:
- project-gutenberg
metrics:
- perplexity
model-index:
- name: MiniGPT-MoE
  results:
  - task:
      type: text-generation
    dataset:
      type: project-gutenberg
      name: Project Gutenberg Books Corpus
    metrics:
    - type: perplexity
      value: 134
pipeline_tag: text-generation
---

# MiniGPT-MoE: Lightweight Language Model with Mixture of Experts

A lightweight implementation of a GPT-style language model in TensorFlow, featuring a Mixture of Experts (MoE) architecture for efficient computation.

## Model Details

- **Architecture**: Transformer with Mixture of Experts (MoE)
- **Total Parameters**: 52.8M
- **Framework**: TensorFlow 2.x
- **Training**: Project Gutenberg books corpus with ByteLevel BPE tokenization
- **Model Type**: Causal Language Model

### Architecture Specifications

- **Embedding Dimension**: 512
- **Number of Layers**: 8 Transformer blocks
- **Attention Heads**: 8
- **Feed-forward Dimension**: 2048
- **Number of Experts**: 4 (in MoE layers)
- **MoE Layers**: Layers 2, 4, 6
- **Vocabulary Size**: 10,000
- **Max Sequence Length**: 256
- **Positional Embeddings**: Rotary Positional Embeddings (RoPE)

An illustrative sketch of the top-1 expert routing used in the MoE layers is included in the appendix at the end of this card.

## Usage

### Loading the Model

```python
from minigpt_transformer import MoEMiniGPT, MoEConfig

# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6]
)

# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")

# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
```

### Text Generation

```python
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
```

### Training

```bash
# Train the model
python train_minigpt.py
```

## Training Details

- **Dataset**: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- **Tokenization**: ByteLevel BPE with 10k vocabulary
- **Batch Size**: 48
- **Learning Rate**: 2e-4
- **Optimizer**: Adam
- **Loss**: Sparse categorical cross-entropy with auxiliary MoE losses (an illustrative formulation is sketched in the appendix)

## Model Performance

- **Perplexity**: ~134 (reached after ~1.1 epochs of training)
- **Training Tokens**: 2M+
- **Expert Utilization**: Balanced across all 4 experts

## Files

- `moe_minigpt.weights.h5`: Trained model weights
- `minigpt_transformer.py`: Model architecture implementation
- `train_minigpt.py`: Training script
- `train_tokenizer.py`: Tokenizer training script
- `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
```

## License

This model is released under the MIT License.

## Acknowledgments

- Built with TensorFlow and Keras
- Uses Hugging Face Tokenizers
- Inspired by modern transformer architectures with MoE
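
## Appendix: Illustrative Sketches

The snippets below are **not** part of the released code; they are minimal sketches meant only to make the description above concrete. For the real implementation, see `minigpt_transformer.py` and `train_minigpt.py`.

### Top-1 expert routing

This sketch shows how a top-1 (`top_k_experts=1`) MoE feed-forward block with 4 experts and the dimensions listed under Architecture Specifications could be written in TensorFlow. The layer name `SimpleTop1MoE` and its internals are illustrative assumptions, not the model's actual classes.

```python
import tensorflow as tf


class SimpleTop1MoE(tf.keras.layers.Layer):
    """Illustrative top-1 MoE feed-forward block (not the actual MiniGPT-MoE code)."""

    def __init__(self, embed_dim=512, ffn_dim=2048, num_experts=4, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        # Router produces one logit per expert for every token.
        self.router = tf.keras.layers.Dense(num_experts)
        # Each expert is an ordinary two-layer feed-forward network.
        self.experts = [
            tf.keras.Sequential([
                tf.keras.layers.Dense(ffn_dim, activation="gelu"),
                tf.keras.layers.Dense(embed_dim),
            ])
            for _ in range(num_experts)
        ]

    def call(self, x):
        # x: (batch, seq_len, embed_dim)
        router_probs = tf.nn.softmax(self.router(x), axis=-1)    # (B, T, E)
        expert_idx = tf.argmax(router_probs, axis=-1)             # top-1 expert per token
        gate = tf.reduce_max(router_probs, axis=-1, keepdims=True)

        # Simple dense dispatch: run every expert and mask the outputs.
        # Real MoE implementations route tokens sparsely for efficiency.
        out = tf.zeros_like(x)
        for e in range(self.num_experts):
            mask = tf.cast(tf.equal(expert_idx, e), x.dtype)[..., None]
            out += mask * self.experts[e](x)
        return gate * out, router_probs
```

In the 8-layer stack described above, a block like this would replace the dense feed-forward sub-layer only in layers 2, 4, and 6, with standard feed-forward blocks elsewhere.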
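
### Auxiliary MoE loss and perplexity

The training objective is described above as sparse categorical cross-entropy plus auxiliary MoE losses. The sketch below shows one common formulation, a Switch-Transformer-style load-balancing term, together with how the reported perplexity relates to the language-modelling loss. The exact auxiliary loss and its coefficient in `train_minigpt.py` may differ; the 0.01 weight is an assumption.

```python
import tensorflow as tf


def lm_loss(labels, logits):
    # Next-token objective: sparse categorical cross-entropy over the 10k vocab.
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    )


def load_balancing_loss(router_probs, expert_idx, num_experts=4):
    # Encourages tokens to be spread evenly across the experts.
    # router_probs: (batch, seq_len, num_experts); expert_idx: (batch, seq_len)
    expert_mask = tf.one_hot(expert_idx, num_experts, dtype=router_probs.dtype)
    tokens_per_expert = tf.reduce_mean(expert_mask, axis=[0, 1])   # (E,)
    probs_per_expert = tf.reduce_mean(router_probs, axis=[0, 1])   # (E,)
    return num_experts * tf.reduce_sum(tokens_per_expert * probs_per_expert)


# Hypothetical combined objective (one auxiliary term per MoE layer):
#   total_loss = lm_loss(labels, logits) + 0.01 * sum(aux_losses)
#
# The reported perplexity corresponds to exp() of the mean token-level
# cross-entropy:
#   perplexity = tf.exp(lm_loss(labels, logits))
```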