---
title: Aetheris Hybrid Mamba MoE
emoji:
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
license: mit
---

Aetheris: Hybrid Mamba-MoE Experiment


Aetheris is a hobbyist research project and experimental implementation exploring the intersection of State Space Models (Mamba) and Mixture of Experts (MoE).

The goal of this project was to learn by doing: attempting to combine the linear-time inference of Mamba with the sparse scaling capacity of MoE from scratch in PyTorch. It is designed as a playground for understanding these modern architectures, not as a published academic paper or production-ready foundation model.

🧪 The Experiment

Current LLM architectures are evolving rapidly. I built Aetheris to investigate a specific question:

Can we successfully interleave Mamba blocks (for long context) with sparse MoE layers (for capacity) to train an efficient model on consumer hardware?

This project implements a hybrid architecture that attempts to:

  1. Replace Attention: Use Mamba (SSM) blocks to achieve $O(N)$ sequence scaling.
  2. Scale Parameters Sparsely: Use MoE layers to increase model size without exploding the computational cost per token.
  3. Run Locally: Optimize the implementation for single-GPU training (gradient checkpointing, efficient routing).
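
As a rough illustration of the gradient-checkpointing part of point 3, here is how the wiring looks in plain PyTorch. The PlaceholderBlock and layer count below are stand-ins for illustration only, not the actual Aetheris modules or training loop:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class PlaceholderBlock(nn.Module):
    """Stand-in for an SSM or MoE block; only the checkpointing wiring matters here."""
    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + torch.relu(self.proj(x))

layers = nn.ModuleList(PlaceholderBlock(64) for _ in range(4))
x = torch.randn(2, 128, 64, requires_grad=True)
for layer in layers:
    # Activations inside each block are recomputed during backward,
    # trading extra compute for a much smaller memory footprint.
    x = checkpoint(layer, x, use_reentrant=False)
x.sum().backward()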

🏗️ Architecture Implementation

Aetheris alternates between custom implementations of two core modules:

  • SSMBlock (The Backbone): Implements the selective scan mechanism described in the Mamba paper. This handles the sequence mixing and "memory" of the model.
  • SparseMoELayer (The Scaling): A router-based layer that dispatches tokens to Top-K experts (Feed-Forward Networks). This allows the model to "specialize" parts of its parameters for different types of tokens.
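
For intuition, here is a minimal, self-contained sketch of the two ideas in plain PyTorch. These are toy stand-ins, not the actual Aetheris SSMBlock or SparseMoELayer: the scan below is a simple diagonal linear recurrence rather than the hardware-aware selective scan, and the router is a naive top-k dispatch loop rather than an optimized implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def toy_scan(x, a, b, c):
    # Toy linear recurrence h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    # Runs sequentially over the sequence, i.e. O(N) in sequence length.
    h = torch.zeros_like(x[:, 0])             # x: (batch, seq, d); a, b, c: (d,)
    ys = []
    for t in range(x.shape[1]):
        h = a * h + b * x[:, t]
        ys.append(c * h)
    return torch.stack(ys, dim=1)

class ToyTopKMoE(nn.Module):
    # Naive router: each token picks its top-k experts; outputs are mixed by softmaxed scores.
    def __init__(self, d_model=64, n_experts=4, top_k=2, d_ff=128):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, d = x.shape
        flat = x.reshape(-1, d)                # route each token independently
        scores, idx = self.router(flat).topk(self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            tok, slot = (idx == e).nonzero(as_tuple=True)   # tokens routed to expert e
            if tok.numel():
                out[tok] += weights[tok, slot, None] * expert(flat[tok])
        return out.reshape(b, s, d)

x = torch.randn(2, 16, 64)
d = torch.ones(64)
print(toy_scan(x, 0.9 * d, d, d).shape)        # torch.Size([2, 16, 64])
print(ToyTopKMoE()(x).shape)                   # torch.Size([2, 16, 64])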

🚀 Quick Start

This code is provided for educational purposes and for others who want to experiment with hybrid architectures.

Installation

Option 1: Local Python Environment

git clone https://github.com/Pomilon/Aetheris.git
cd Aetheris
pip install -r requirements.txt

Option 2: Docker

We provide Dockerfiles for both CPU (slim) and GPU (NVIDIA) environments.

# CPU Version
docker build -t aetheris-cpu -f Dockerfile .
docker run -p 7860:7860 aetheris-cpu

# GPU Version (Requires NVIDIA Container Toolkit)
docker build -t aetheris-gpu -f Dockerfile-nvidia .
docker run --gpus all -p 7860:7860 aetheris-gpu

Usage (CLI)

Aetheris includes a CLI for training, running inference, and serving the model.

1. Training (From Scratch)

# Trains a small model defined in configs/default.yaml
python -m aetheris.cli.main train --config configs/default.yaml

2. Generation (CLI)

python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir checkpoints

3. API Server (OpenAI-Compatible)

Start a local API server that exposes an OpenAI-compatible chat completions endpoint.

python -m aetheris.cli.main serve --host 0.0.0.0 --port 8000

You can then interact with it using standard tools:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aetheris-hybrid",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
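
If the server follows the standard chat completions contract, the official openai Python client should also work against it. A sketch, assuming the server above is running on port 8000 and accepts any API key (the dummy key and model name mirror the curl example):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="aetheris-hybrid",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)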

Development & Testing

To run the test suite:

pytest tests/

⚙️ Configuration

You can tweak the hyperparameters in configs/. I've included a "Debug" config that is small enough to train on a laptop CPU for testing the code flow.

  • configs/default.yaml: Standard experimental setup (requires a GPU).
  • configs/debug.yaml: Tiny model (2 layers) for code debugging.

📚 Acknowledgements & References

This project is an implementation study and relies heavily on the brilliant theoretical work of others. It is not an original invention of the Mamba or MoE concepts.

  • Mamba Architecture: Gu, A., & Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752
  • Mixture of Experts: Shazeer, N., et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538
  • Inspiration: Jamba (AI21 Labs) and OpenMoE.

🧠 Model Weights & Checkpoints

All pre-trained checkpoints are hosted on the Hugging Face Hub.

  • Aetheris-Base (step 10k): Early convergence checkpoint (loss ~3.66); good for analyzing router behavior. Download: 🤗 Hugging Face
  • Aetheris-Chat: Coming soon (post-SFT).

⚠️ Important: Aetheris uses a custom Hybrid Mamba-MoE architecture. You cannot load it directly with transformers.AutoModel. You must use the interface provided in this repository.

🐍 How to Load

# Rename the checkpoint file inside the folder to checkpoint_current.pth first
python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir path/to/checkpoints_folder

Note: A better inference interface will be added later; for now, use this scuffed version. :D

Note: These weights are from an experimental run. While they demonstrate the architectural capabilities, do not expect GPT-5 (or even Google Bard) levels of coherence. :D This project was made for learning and fun!

License

MIT