---
title: Aetheris Hybrid Mamba MoE
emoji: ⚛
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
license: mit
---
# Aetheris: Hybrid Mamba-MoE Experiment
**Aetheris** is a hobbyist research project and experimental implementation exploring the intersection of **State Space Models (Mamba)** and **Mixture of Experts (MoE)**.
The goal of this project was to learn by doing: attempting to combine the linear-time inference of Mamba with the sparse scaling capacity of MoE from scratch in PyTorch. It is designed as a playground for understanding these modern architectures, not as a published academic paper or production-ready foundation model.
## 🧪 The Experiment
Current LLM architectures are evolving rapidly. I built Aetheris to investigate a specific question:
> *Can we successfully interleave Mamba blocks (for long context) with sparse MoE layers (for capacity) to train an efficient model on consumer hardware?*
This project implements a hybrid architecture that attempts to:
1. **Replace Attention:** Use Mamba (SSM) blocks to achieve $O(N)$ sequence scaling.
2. **Scale Parameters Sparsely:** Use MoE layers to increase model size without exploding the computational cost per token.
3. **Run Locally:** Optimize the implementation for single-GPU training (gradient checkpointing, efficient routing).
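Point 3's main memory-saving trick, gradient checkpointing, is available directly in PyTorch. The sketch below is illustrative only: `blocks` and `forward_with_checkpointing` are placeholders standing in for whatever layer stack the actual model builds, not real Aetheris functions.
```python
import torch
from torch.utils.checkpoint import checkpoint

def forward_with_checkpointing(blocks, x: torch.Tensor) -> torch.Tensor:
    """Run a stack of blocks, recomputing activations during backward instead of storing them.

    Trading compute for memory like this is what makes deeper hybrid stacks
    trainable on a single consumer GPU.
    """
    for block in blocks:
        # use_reentrant=False selects the recommended non-reentrant implementation.
        x = checkpoint(block, x, use_reentrant=False)
    return x
```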
## 🏗️ Architecture Implementation
Aetheris alternates between custom implementations of two core modules (a rough sketch of how they interleave follows the list):
* **SSMBlock (The Backbone):** Implements the selective scan mechanism described in the [Mamba paper](https://arxiv.org/abs/2312.00752). This handles the sequence mixing and "memory" of the model.
* **SparseMoELayer (The Scaling):** A router-based layer that dispatches tokens to Top-K experts (Feed-Forward Networks). This allows the model to "specialize" parts of its parameters for different types of tokens.
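The snippet below is a hypothetical sketch of how such a hybrid stack could be wired up in PyTorch. The names (`SparseMoELayer`, `HybridBlock`, `d_model`, `n_experts`, `top_k`) are placeholders rather than the actual Aetheris classes, the SSM part is stubbed out with `nn.Identity`, and the router is a plain softmax top-k gate rather than the project's exact implementation:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy top-k router: each token is dispatched to its k highest-scoring expert FFNs."""
    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        scores = self.router(x)                                   # (B, S, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)            # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                           # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

class HybridBlock(nn.Module):
    """One sequence-mixing block followed by a sparse MoE channel mixer."""
    def __init__(self, d_model: int):
        super().__init__()
        self.ssm = nn.Identity()          # stand-in for the real SSMBlock (selective scan)
        self.moe = SparseMoELayer(d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.ssm(self.norm1(x))   # O(N) sequence mixing ("memory")
        x = x + self.moe(self.norm2(x))   # sparse parameter capacity
        return x
```
A full model would stack several such blocks between a token embedding and a language-model head; the real `SSMBlock` replaces the identity stub with a selective-scan implementation.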
## 🚀 Quick Start
This code is provided for educational purposes and for others who want to experiment with hybrid architectures.
### Installation
**Option 1: Local Python Environment**
```bash
git clone https://github.com/Pomilon/Aetheris.git
cd Aetheris
pip install -r requirements.txt
```
**Option 2: Docker**
We provide Dockerfiles for both CPU (slim) and GPU (NVIDIA) environments.
```bash
# CPU Version
docker build -t aetheris-cpu -f Dockerfile .
docker run -p 7860:7860 aetheris-cpu
# GPU Version (Requires NVIDIA Container Toolkit)
docker build -t aetheris-gpu -f Dockerfile-nvidia .
docker run --gpus all -p 7860:7860 aetheris-gpu
```
### Usage (CLI)
Aetheris includes a CLI to train the model, run inference, or serve it over an API.
**1. Training (From Scratch)**
```bash
# Trains a small model defined in configs/default.yaml
python -m aetheris.cli.main train --config configs/default.yaml
```
**2. Generation (CLI)**
```bash
python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir checkpoints
```
**3. API Server (OpenAI-Compatible)**
Start a local API server that simulates OpenAI's chat completions endpoint.
```bash
python -m aetheris.cli.main serve --host 0.0.0.0 --port 8000
```
You can then interact with it using standard tools:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aetheris-hybrid",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
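Because the endpoint follows the OpenAI chat-completions schema, the official `openai` Python client should also work when pointed at the local server (assuming the server implements the standard request/response shape; the `api_key` value is a throwaway placeholder):
```python
from openai import OpenAI

# Point the client at the local Aetheris server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="aetheris-hybrid",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```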
### Development & Testing
To run the test suite:
```bash
pytest tests/
```
## ⚙️ Configuration
You can tweak the hyperparameters in `configs/`. I've included a "Debug" config that is small enough to train on a laptop CPU for testing the code flow.
| Config File | Description |
| :--- | :--- |
| `configs/default.yaml` | Standard experimental setup (requires GPU). |
| `configs/debug.yaml` | Tiny model (2 layers) for code debugging. |
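As a rough illustration, a config variant can be derived programmatically with PyYAML; the key names below are hypothetical and the real schema lives in the YAML files themselves:
```python
import yaml

# Load the debug config, tweak a hyperparameter, and save a variant.
with open("configs/debug.yaml") as f:
    cfg = yaml.safe_load(f)

cfg.setdefault("model", {})["n_layers"] = 2   # hypothetical key, shown for illustration
with open("configs/my_experiment.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```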
## 📚 Acknowledgements & References
This project is an implementation study and relies heavily on the brilliant theoretical work of others. It is not an original invention of the Mamba or MoE concepts.
* **Mamba Architecture:** Gu, A., & Dao, T. (2023). *Mamba: Linear-Time Sequence Modeling with Selective State Spaces*. [arXiv:2312.00752](https://arxiv.org/abs/2312.00752)
* **Mixture of Experts:** Shazeer, N., et al. (2017). *Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer*. [arXiv:1701.06538](https://arxiv.org/abs/1701.06538)
* **Inspiration:** Jamba (AI21 Labs) and OpenMoE.
## 🧠 Model Weights & Checkpoints
All pre-trained checkpoints are hosted on the [Hugging Face Hub](https://huggingface.co/Pomilon).
| Model Artifact | Step | Description | Download |
| :--- | :--- | :--- | :--- |
| **Aetheris-Base** | 10k | Early convergence checkpoint (Loss ~3.66). Good for analyzing router behavior. | [🤗 Hugging Face](https://huggingface.co/Pomilon/Aetheris-MoE-300M-A125M-base) |
| **Aetheris-Chat** | -- | *Coming Soon (Post-SFT)* | -- |
> **⚠️ Important:** Aetheris uses a custom Hybrid Mamba-MoE architecture. You **cannot** load it directly with `transformers.AutoModel`. You must use the interface provided in this repository.
### 📥 How to Load
```bash
# Rename the checkpoint file inside the folder to checkpoint_current.pth first.
python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir path/to/checkpoints_folder
```
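If you want to pull the published checkpoint from the Hub rather than train your own, one option is `huggingface_hub.snapshot_download` (a sketch; the exact file names inside the repo are not listed here):
```python
from huggingface_hub import snapshot_download

# Download the full checkpoint repo to a local folder.
local_dir = snapshot_download(repo_id="Pomilon/Aetheris-MoE-300M-A125M-base")
print(local_dir)  # pass this path to --checkpoint_dir (rename the .pth file if needed)
```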
> **Note:** A better inference interface will be added later down the line; for now, use this scuffed version. :D
> **Note:** These weights are from an experimental run. While they demonstrate the architectural capabilities, do not expect GPT-5-level (or even Google Bard-level) coherence. :D
> This project was made for learning and fun!
## License
MIT