
Noeum-1-Nano

A 0.6B MoE model trained entirely from scratch.



Overview

Noeum-1-Nano is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only 18 billion tokens.

It matches the capabilities of major labs' nano-class models while using a fraction of their training data. Built entirely from scratch, with no pretrained weights and no inherited shortcuts, this independent, self-funded effort demonstrates that innovative techniques and intelligent design can rival brute-force scale.

  • Data Efficiency: Achieves competitive reasoning with 20x to 667x less data than standard models like Qwen2 or TinyLlama.
  • System 2 Reasoning: Features a dedicated <think> mode for logic, math, and self-correction.

Performance & Benchmarks

The benchmarks below show that Noeum-1-Nano achieves above-average performance despite an extreme disparity in training volume. While standard models typically require 2 trillion to 12 trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.

Quantitative Benchmarks (lm-eval-harness)

All benchmarks were run with Noeum's thinking mode DISABLED to ensure a fair comparison.

| Task | Metric | Noeum-1-Nano (0.6B) | Note |
|------|--------|---------------------|------|
| SciQ | Accuracy | 77.5% | Exceptional scientific knowledge retrieval |
| MRPC | F1 Score | 81.2% | Rank #1 vs. comparable models on semantic equivalence |
| BoolQ | Accuracy | 62.0% | Strong yes/no reasoning on complex text |
| PIQA | Accuracy | 62.9% | Physical interaction reasoning |
| ARC-Easy | Accuracy | 47.1% | |
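
The table above can be reproduced along the following lines with the lm-evaluation-harness Python API. This is a sketch only, assuming lm-eval >= 0.4; the checkpoint path mirrors the inference script further below, and exact task names (e.g. mrpc) may differ slightly between harness versions.

```python
# Sketch of an lm-evaluation-harness run (assumes lm_eval >= 0.4).
# The model path mirrors the run_noeum.py script below; point it at your checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./Noeum-0.6B-hf-nano,trust_remote_code=True",
    tasks=["sciq", "mrpc", "boolq", "piqa", "arc_easy"],
    batch_size=8,
)
print(results["results"])
```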

Internal Evaluation & Best Practices

Based on our internal automated benchmarks (100-question comparative deep dive), Noeum-1-Nano performs exceptionally well on specific task types when the reasoning engine is properly configured.

  • Scientific Fact Retrieval: The model demonstrates high retention of constants and definitions (Physics, Biology).
  • Step-by-Step Word Problems: Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
  • Logical Deduction: It correctly handles transitive logic puzzles (e.g., If A > B and B > C, who is tallest?).

⚠ Critical Configuration: These results are conditional on specific generation parameters. Our tests confirm that a Thinking Budget of 128 tokens combined with a Temperature of 0.1 is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
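
As a concrete illustration, these settings correspond to a call like the one below when using the `chat()` helper defined in the run_noeum.py script further down this page (a sketch only; parameter names follow that script).

```python
# Recommended "sweet spot" from our internal tests: 128-token thinking budget, temperature 0.1.
# chat() is defined in the run_noeum.py script in the Advanced Usage section below.
result = chat(
    "If a train travels 60 km in 1 hour, how far does it go in 3 hours?",
    thinking=True,       # enable the <think> reasoning phase
    think_budget=128,    # tokens reserved for reasoning before the answer
    temperature=0.1,     # low temperature keeps the reasoning stable
)
print(result["answer"])
```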


Dataset Composition

To achieve competitive performance with only 18 billion tokens, we prioritized data density over volume, curating a "high-signal" mixture designed to maximize reasoning content per token.

The pre-training mixture includes:

  • Academic & Reasoning: arXiv papers (math/cs subsets), portions of CC-Math-Finest, and curated math datasets.
  • Coding: High-quality Python repositories and StackExchange discussions.
  • General Knowledge: Wikipedia (specifically filtered for long-context articles >2k tokens), C4, and FineWeb-Edu (High quality subset).
  • Synthetic Data: Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies.

Thinking Mode: The Impact of Extra Reasoning (A/B Test)

Noeum-1-Nano features a dedicated Thinking Mode. When enabled (temperature = 0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.

1. Hallucination Correction

Standard generation guesses; reasoning verifies.

User: "What is the capital of Spain?"

| Mode | Output | Verdict |
|------|--------|---------|
| Standard | "La Muerte is the capital of Spain" | Hallucination |
| Reasoning | `<think>` The capital of Spain is Madrid. It is known for its rich history... `</think>` "Madrid is the capital of Spain." | Correct |

2. Mathematical Logic

Standard generation struggles with arithmetic; reasoning sets up equations.

User: "If a train travels 60 km in 1 hour, how far in 3 hours?"

| Mode | Output | Verdict |
|------|--------|---------|
| Standard | "Therefore, the distance traveled by the train is 60 kilometers." | Repeated input |
| Reasoning | `<think>` Distance = Speed × Time. 60 km × 3 hours = 180 km `</think>` "So, the train travels 180 kilometers in 3 hours." | Correct |
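
When thinking mode is enabled, the raw completion carries the reasoning inside `<think>...</think>` tags (and the final reply inside `<answer>...</answer>` tags in the script below). A minimal way to split the two, assuming those tags appear verbatim in the decoded text:

```python
import re

def split_reasoning(text: str):
    """Separate the hidden chain-of-thought from the final answer.
    Assumes the model emits <think>...</think> (and optionally <answer>...</answer>)
    tags verbatim, as handled by the streaming script below."""
    think_match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    thinking = think_match.group(1).strip() if think_match else ""
    answer_match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if answer_match:
        answer = answer_match.group(1).strip()
    else:
        # Fall back to everything after the closing </think> tag.
        answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer
```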

Architecture & Configuration

| Component | Specification |
|-----------|---------------|
| Type | Mixture-of-Experts (MoE) |
| Total Params | 0.6B |
| Active Params | ~0.2B |
| Experts | 8 routed, 1 shared (top-2 active) |
| Layers | 24 |
| Attention | 12 heads (GQA), 768 hidden dim |
| Context | 2048 tokens (RoPE + YaRN) |
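
For intuition, the expert routing summarized above (top-2 of 8 routed experts plus an always-on shared expert over a 768-dim hidden state) works roughly like the sketch below. This is an illustration only, not the model's actual code; the 4x FFN expansion inside each expert is an assumed placeholder.

```python
# Illustrative top-2 MoE routing with a shared expert; NOT the model's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, num_experts, top_k = 768, 8, 2

def expert_ffn():
    # 4x expansion is an assumption used only for this sketch
    return nn.Sequential(nn.Linear(hidden_dim, 4 * hidden_dim), nn.SiLU(),
                         nn.Linear(4 * hidden_dim, hidden_dim))

router = nn.Linear(hidden_dim, num_experts, bias=False)
experts = nn.ModuleList(expert_ffn() for _ in range(num_experts))  # 8 routed experts
shared_expert = expert_ffn()                                       # 1 always-active shared expert

x = torch.randn(4, hidden_dim)                        # a batch of 4 token activations
gate_probs = F.softmax(router(x), dim=-1)             # routing distribution over the 8 experts
topk_probs, topk_idx = gate_probs.topk(top_k, dim=-1)
topk_probs = topk_probs / topk_probs.sum(-1, keepdim=True)  # renormalize the two chosen gates

out = shared_expert(x)                                # shared expert processes every token
for slot in range(top_k):
    for e in range(num_experts):
        routed = topk_idx[:, slot] == e               # tokens sent to expert e in this slot
        if routed.any():
            out[routed] = out[routed] + topk_probs[routed, slot].unsqueeze(-1) * experts[e](x[routed])
```

Because each token only passes through the shared expert plus two of the eight routed experts, only about a third of the total parameters (~0.2B of 0.6B) are active per token.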

🛠️ Training Stack

This model was not fine-tuned from an existing checkpoint. It was built from the ground up to test the efficiency of our custom stack.

  1. Pre-training: Two-phase training (512 ctx $\to$ 2048 ctx) on high-signal data.
  2. Post-Training:
    • SFT: Supervised Fine-Tuning for instruction following.
    • GRPO: Group Relative Policy Optimization for reasoning capabilities.
    • DPO: Direct Preference Optimization for alignment.
  3. Hardware: Trained efficiently on 8x NVIDIA RTX 5090s.

Quickstart

Chat Format

The model supports two distinct modes via system prompts or flags:

  • /think: Activates System 2 reasoning (Recommended for logic/math).
  • /no think: Standard fast text generation.
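
A minimal loading and generation sketch is shown below, assuming the model is available under the Hub id noeum/noeum-1-nano (or the local path used in the full script); the sampling settings are illustrative.

```python
# Minimal quickstart sketch; the full script below adds streaming, thinking-budget
# control, and automated benchmarking on top of this.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "noeum/noeum-1-nano"  # or a local path such as "./Noeum-0.6B-hf-nano"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # Append "/think" to the user turn to activate System 2 reasoning.
    {"role": "user", "content": "If a train travels 60 km in 1 hour, how far in 3 hours? /think"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=384, do_sample=True, temperature=0.1, top_p=0.9)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```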

🛠️ Advanced Usage: Full Benchmarking & Chat Script

To fully explore Noeum-1-Nano, we provide the complete all-in-one inference script used to generate the benchmarks above. This script is not just a chat interface; it is a comprehensive evaluation tool.

Capabilities of this script:

  1. Interactive Reasoning Chat: Talk to the model with streaming output. Toggle "Thinking Mode" on/off dynamically using commands like /think on or /think off.
  2. Deep Dive Analysis: Select Option 2 to run a single prompt through multiple configurations (Temperature 0.1 vs 0.7, Budget 32 vs 256) simultaneously to see how the model's logic changes.
  3. Automated Benchmarking: Select Option 3 to run the full internal test suite (Math, Logic, History, Science) and generate A/B comparison logs.

How to use: Save the code below as run_noeum.py and execute it. It handles token streaming, template application, and logging automatically.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM
import sys
from datetime import datetime


class TeeLogger:

    def __init__(self, filename):
        self.terminal = sys.stdout
        self.log = open(filename, 'w', encoding='utf-8')

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)
        self.log.flush()

    def flush(self):
        self.terminal.flush()
        self.log.flush()

    def close(self):
        self.log.close()


# Start logging
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_filename = f"benchmark_results_{timestamp}.txt"
logger = TeeLogger(log_filename)
sys.stdout = logger

print(f"Logging started: {log_filename}")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

# ============================================================================
# MODEL SETUP
# ============================================================================
MODEL_PATH = "./Noeum-0.6B-hf-nano"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

print(f"\nLoading model from {MODEL_PATH} on {DEVICE}...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE)
model.eval()

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

EOS_ID = tokenizer.eos_token_id


def streaming_generate(model, prompt_ids, max_new_tokens=512, temperature=0.7, top_p=0.9, device="cuda"):
    input_ids = prompt_ids.to(device)
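
    # Note: the full sequence is re-encoded on every step (no KV cache is used here);
    # this keeps the loop simple but is slow for long generations.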

    for _ in range(max_new_tokens):
        with torch.inference_mode():
            outputs = model(input_ids)
            logits = outputs.logits[:, -1, :]

            # Greedy decoding
            if temperature <= 0:
                next_token = torch.argmax(logits, dim=-1, keepdim=True)
            else:
                logits = logits / temperature

                # Top-p (nucleus sampling)
                if top_p < 1.0:
                    sorted_logits, sorted_indices = torch.sort(logits, descending=True)
                    sorted_probs = F.softmax(sorted_logits, dim=-1)
                    cumulative_probs = torch.cumsum(sorted_probs, dim=-1)

                    # Remove tokens with cumulative probability above threshold
                    sorted_indices_to_remove = cumulative_probs > top_p
                    sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
                    sorted_indices_to_remove[..., 0] = 0

                    # Set removed tokens to -inf
                    sorted_logits[sorted_indices_to_remove] = -float("inf")
                    # Scatter back to original positions
                    logits = torch.zeros_like(logits).scatter(1, sorted_indices, sorted_logits)

                probs = F.softmax(logits, dim=-1)
                next_token = torch.multinomial(probs, num_samples=1)

        # Decode token
        tok_id = int(next_token.item())
        token_text = tokenizer.decode([tok_id], skip_special_tokens=False)
        yield token_text

        # Stop if EOS token
        if tok_id == EOS_ID:
            break

        # Append to input for next iteration
        input_ids = torch.cat([input_ids, next_token], dim=-1)


def chat(
        question: str,
        # chat_history argument removed/ignored
        thinking: bool = True,
        think_budget: int = 128,
        temperature: float = 0.1,
        top_p: float = 0.9,
        system_prompt: str = "You are a helpful assistant.",
        verbose: bool = True
):
    """
    Main chat function with streaming support.
    MEMORY DISABLED: Each call is a fresh context.
    """

    # Build conversation - STRICTLY SYSTEM + CURRENT QUESTION
    messages = [{'role': 'system', 'content': system_prompt}]

    # Add current question with /think flag if enabled
    user_message = f"{question} /think" if thinking else question
    messages.append({'role': 'user', 'content': user_message})

    # Apply chat template
    prompt_text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    prompt_ids = tokenizer.encode(prompt_text, return_tensors='pt')

    # Generate
    thinking_content = ""
    answer_content = ""
    current_mode = None

    generator = streaming_generate(
        model=model,
        prompt_ids=prompt_ids,
        max_new_tokens=think_budget + 256 if thinking else 512,
        temperature=temperature,
        top_p=top_p,
        device=DEVICE
    )

    if verbose:
        print("\n" + "=" * 80)

    for chunk in generator:
        if chunk == '</s>':
            break

        # Track mode switches
        if chunk == '<think>':
            current_mode = 'thinking'
            if verbose:
                print("\n💭 THINKING:")
            continue
        elif chunk == '</think>':
            current_mode = None
            continue
        elif chunk == '<answer>':
            current_mode = 'answer'
            if verbose:
                print("\n✅ ANSWER:")
            continue
        elif chunk == '</answer>':
            current_mode = None
            continue

        # Accumulate content
        if current_mode == 'thinking':
            thinking_content += chunk
            if verbose:
                print(chunk, end='', flush=True)
        elif current_mode == 'answer':
            answer_content += chunk
            if verbose:
                print(chunk, end='', flush=True)

    if verbose:
        print("\n" + "=" * 80)

    return {
        'thinking': thinking_content.strip(),
        'answer': answer_content.strip(),
        'full_thinking': thinking_content.strip(),
        'full_answer': answer_content.strip()
    }


# ============================================================================
# BENCHMARK FUNCTIONS
# ============================================================================
def benchmark_single_question(question: str, temperatures=[0.1, 0.3, 0.7], budgets=[32, 128, 256]):
    """
    Run a single question through all configurations - STREAMING
    """
    print("\n" + "=" * 100)
    print(f"QUESTION: {question}")
    print("=" * 100)

    # NO THINK
    print("\n🚫 NO THINK MODE (temperature=0.7)")
    print("-" * 100)
    result = chat(question, thinking=False, temperature=0.7, verbose=True)

    # THINK MODE - Different temperatures
    for temp in temperatures:
        print(f"\n💭 THINK MODE - Temperature: {temp}, Budget: 128")
        print("-" * 100)
        result = chat(question, thinking=True, think_budget=128, temperature=temp, verbose=True)

    # THINK MODE - Different budgets (at temp=0.7)
    for budget in budgets:
        print(f"\n💭 THINK MODE - Temperature: 0.7, Budget: {budget}")
        print("-" * 100)
        result = chat(question, thinking=True, think_budget=budget, temperature=0.7, verbose=True)


def benchmark_all_questions(questions: list, config: dict):
    results = []

    print("\n" + "=" * 100)
    print(f"BENCHMARK: {config}")
    print("=" * 100)

    for i, q in enumerate(questions, 1):
        print(f"\n[{i}/{len(questions)}] Q: {q}")
        result = chat(
            q,
            thinking=config.get('thinking', True),
            think_budget=config.get('think_budget', 128),
            temperature=config.get('temperature', 0.1),
            verbose=True
        )

        results.append({
            'question': q,
            'thinking': result['full_thinking'],
            'answer': result['full_answer']
        })

    return results


def compare_configurations(questions: list):
    configurations = [
        {'name': 'NO THINK', 'thinking': False, 'temperature': 0.7},
        {'name': 'THINK - Temp 0.1', 'thinking': True, 'think_budget': 128, 'temperature': 0.1},
        {'name': 'THINK - Temp 0.3', 'thinking': True, 'think_budget': 128, 'temperature': 0.3},
        {'name': 'THINK - Temp 0.7', 'thinking': True, 'think_budget': 128, 'temperature': 0.7},
        {'name': 'THINK - Budget 32', 'thinking': True, 'think_budget': 32, 'temperature': 0.7},
        {'name': 'THINK - Budget 256', 'thinking': True, 'think_budget': 256, 'temperature': 0.7},
    ]

    all_results = {}

    for config in configurations:
        name = config.pop('name')
        print(f"\n{'#' * 100}")
        print(f"RUNNING CONFIGURATION: {name}")
        print(f"{'#' * 100}")
        all_results[name] = benchmark_all_questions(questions, config)

    # Print FULL comparison table
    print("\n" + "=" * 100)
    print("COMPARISON SUMMARY - FULL OUTPUTS")
    print("=" * 100)

    for i, q in enumerate(questions):
        print(f"\n{'=' * 100}")
        print(f"Q{i + 1}: {q}")
        print(f"{'=' * 100}")

        for config_name, results in all_results.items():
            print(f"\n{config_name}:")
            print(f"  💭 THINKING: {results[i]['thinking']}")
            print(f"  ✅ ANSWER: {results[i]['answer']}")
            print("-" * 100)


# ============================================================================
# INTERACTIVE CHAT LOOP
# ============================================================================
def interactive_chat():
    """
    Interactive chat session in terminal - STATELESS (No Memory)
    """
    print("\n" + "=" * 80)
    print("INTERACTIVE CHAT SESSION (NO MEMORY)")
    print("=" * 80)
    print("Commands:")
    print("  /quit - Exit chat")
    print("  /think on/off - Toggle thinking mode")
    print("  /budget <number> - Set thinking budget (tokens)")
    print("  /temp <number> - Set temperature (0-1)")
    print("=" * 80 + "\n")

    # chat_history removed
    thinking_enabled = True
    think_budget = 128
    temperature = 0.1

    while True:
        try:
            user_input = input("\n👤 You: ").strip()

            if not user_input:
                continue

            # Handle commands
            if user_input == '/quit':
                print("Goodbye!")
                break
            elif user_input.startswith('/think'):
                parts = user_input.split()
                if len(parts) > 1:
                    thinking_enabled = parts[1].lower() == 'on'
                print(f"Thinking mode: {'ON' if thinking_enabled else 'OFF'}")
                continue
            elif user_input.startswith('/budget'):
                parts = user_input.split()
                if len(parts) > 1:
                    think_budget = int(parts[1])
                print(f"Thinking budget set to: {think_budget} tokens")
                continue
            elif user_input.startswith('/temp'):
                parts = user_input.split()
                if len(parts) > 1:
                    temperature = float(parts[1])
                print(f"Temperature set to: {temperature}")
                continue

            # /clear command removed as there is no memory

            # Get response - NO HISTORY PASSED
            result = chat(
                question=user_input,
                thinking=thinking_enabled,
                think_budget=think_budget,
                temperature=temperature,
                verbose=True
            )

            # Chat history update logic removed

        except KeyboardInterrupt:
            print("\n\nGoodbye!")
            break
        except Exception as e:
            print(f"\nError: {e}")


# ============================================================================
# MAIN - RUN BENCHMARKS
# ============================================================================
if __name__ == '__main__':
    try:
        # Test questions
        test_questions = {
            # CATEGORY 1: Simple Math (Addition/Subtraction)
            "Simple Math": [
                "What is 15 + 27?",
                "What is 100 - 37?",
                "What is 45 + 55?",
                "What is 82 - 19?",
                "What is 7 + 8?",
                "What is 50 - 25?",
                "What is 123 + 456?",
                "What is 200 - 88?",
                "What is 9 + 6?",
                "What is 75 - 30?"
            ],

            # CATEGORY 2: Multiplication
            "Multiplication": [
                "What is 8 × 7?",
                "What is 12 × 12?",
                "What is 9 × 6?",
                "What is 15 × 4?",
                "What is 7 × 9?",
                "What is 11 × 11?",
                "What is 6 × 8?",
                "What is 13 × 5?",
                "What is 25 × 4?",
                "What is 20 × 3?"
            ],

            # CATEGORY 3: Division & Fractions
            "Division & Fractions": [
                "What is 56 ÷ 8?",
                "Which is larger: 1/2 or 1/3?",
                "What is 100 ÷ 4?",
                "Which is larger: 2/3 or 3/4?",
                "What is 81 ÷ 9?",
                "Which is larger: 1/4 or 1/5?",
                "What is 144 ÷ 12?",
                "What is 1/2 + 1/4?",
                "What is 50 ÷ 2?",
                "Which is larger: 3/5 or 2/5?"
            ],

            # CATEGORY 4: Word Problems
            "Word Problems": [
                "If a train travels 60 km in 1 hour, how far in 3 hours?",
                "If John has 5 apples and gives 2 to Mary, how many does he have left?",
                "If a book costs $12 and you buy 3 books, how much do you spend?",
                "If there are 24 students and 6 tables, how many students per table?",
                "If a car uses 8 liters per 100km, how much for 300km?",
                "If you earn $15 per hour and work 8 hours, how much do you earn?",
                "If a pizza has 8 slices and 4 people share equally, how many slices each?",
                "If a pen costs $2 and you have $20, how many pens can you buy?",
                "If a movie is 2 hours long and starts at 3pm, when does it end?",
                "If you save $10 per week, how much in 10 weeks?"
            ],

            # CATEGORY 5: Prime Numbers & Math Concepts
            "Math Concepts": [
                "Is 17 a prime number?",
                "Is 20 a prime number?",
                "Is 13 a prime number?",
                "What is the square root of 64?",
                "What is 5 squared (5²)?",
                "Is 1 a prime number?",
                "What is 10% of 100?",
                "Is 2 the only even prime number?",
                "What is the square root of 49?",
                "What is 3 cubed (3³)?"
            ],

            # CATEGORY 6: History
            "History": [
                "Who wrote Romeo and Juliet?",
                "Who was the first president of the United States?",
                "In what year did World War 2 end?",
                "Who discovered America in 1492?",
                "Who painted the Mona Lisa?",
                "What year did the Titanic sink?",
                "Who was the first man on the moon?",
                "In what year did World War 1 start?",
                "Who wrote the Declaration of Independence?",
                "Who was Julius Caesar?"
            ],

            # CATEGORY 7: Geography
            "Geography": [
                "What is the capital of France?",
                "Which is bigger: Russia or Canada?",
                "What is the capital of Italy?",
                "Is the Nile or Amazon river longer?",
                "Which ocean is largest: Atlantic or Pacific?",
                "What is the capital of Japan?",
                "Is Australia a continent?",
                "What is the tallest mountain in the world?",
                "How many continents are there?",
                "What is the capital of Spain?"
            ],

            # CATEGORY 8: Science & Nature
            "Science & Nature": [
                "Does the Sun orbit the Earth?",
                "What gas do plants produce during photosynthesis?",
                "Is iron magnetic?",
                "What is H2O?",
                "Does ice float in water?",
                "How many legs does a spider have?",
                "How many legs does an ant have?",
                "What is the speed of light approximately?",
                "Is the Earth flat or round?",
                "What planet is closest to the Sun?"
            ],

            # CATEGORY 9: Logical Reasoning
            "Logical Reasoning": [
                "If all cats are animals, and Fluffy is a cat, is Fluffy an animal?",
                "If today is Monday, what day is tomorrow?",
                "If you have 3 red balls and 2 blue balls, how many balls total?",
                "Complete the pattern: 2, 4, 6, 8, ?",
                "Which is the odd one out: apple, banana, car, orange?",
                "If A is taller than B, and B is taller than C, who is tallest?",
                "True or False: All birds can fly?",
                "If it's raining, the ground is wet. The ground is wet. Is it raining?",
                "Complete the pattern: Monday, Tuesday, Wednesday, ?",
                "If 5 > 3 and 3 > 1, is 5 > 1?"
            ],

            # CATEGORY 10: General Knowledge
            "General Knowledge": [
                "How many days are in a week?",
                "Which is heavier: gold or aluminum?",
                "Is the Eiffel Tower in London?",
                "Is Bitcoin a cryptocurrency?",
                "How many hours are in a day?",
                "What color is the sky on a clear day?",
                "How many months are in a year?",
                "What is the freezing point of water in Celsius?",
                "How many wheels does a bicycle have?",
                "What is the boiling point of water in Celsius?"
            ]
        }

        # Flatten all questions for quick testing
        all_questions = []
        for category, questions in test_questions.items():
            all_questions.extend(questions)

        print(f"\nTotal questions: {len(all_questions)}")
        print(f"Categories: {len(test_questions)}")
        print(f"Questions per category: {len(test_questions['Simple Math'])}")

        # Choose what to run
        print("\n" + "=" * 80)
        print("SELECT BENCHMARK MODE")
        print("=" * 80)
        print("1. Simple test (original 3 questions with default settings)")
        print("2. Deep dive single question (all configs)")
        print("3. Compare all configurations (by category) - FULL OUTPUTS")
        print("4. Interactive chat (NO MEMORY)")
        print("=" * 80)

        choice = input("\nEnter choice (1-4): ").strip()

        if choice == '1':
            # Simple test
            print("\n" + "=" * 80)
            print("SIMPLE TEST")
            print("=" * 80)

            for q in list(test_questions.values())[0][:3]:
                print(f"\nQ: {q}")
                result = chat(q, thinking=True, think_budget=128, temperature=0.7, verbose=True)

        elif choice == '2':
            # Deep dive on single question
            question = input("\nEnter question (or press Enter for '15 + 27'): ").strip()
            if not question:
                question = "What is 15 + 27?"

            benchmark_single_question(question)

        elif choice == '3':
            # Test by category
            print("\nSelect category to test:")
            categories = list(test_questions.keys())
            for i, cat in enumerate(categories, 1):
                print(f"{i}. {cat}")
            print(f"{len(categories) + 1}. ALL CATEGORIES (100 questions)")

            cat_choice = input("\nEnter choice: ").strip()

            if cat_choice == str(len(categories) + 1):
                # Test all
                compare_configurations(all_questions)
            else:
                # Test specific category
                cat_idx = int(cat_choice) - 1
                cat_name = categories[cat_idx]
                print(f"\nTesting category: {cat_name}")
                compare_configurations(test_questions[cat_name])

        elif choice == '4':
            # Interactive chat
            interactive_chat()

        else:
            print("Invalid choice. Starting interactive chat...")
            interactive_chat()

    finally:
        print(f"\n\nLogging completed. Results saved to: {log_filename}")
        logger.close()
        sys.stdout = logger.terminal 

Limitations & Bias

While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:

  • Hallucinations: Like all small models, it can generate plausible but incorrect information, especially when the <think> mode is disabled.
  • Arithmetic: While it can derive formulas correctly, it may struggle with calculating large numbers precisely.
  • Scope: The model is optimized for English and general reasoning. It is not intended for medical, legal, or safety-critical advice.

About Noeum


Noeum is an independent AI research & engineering lab based in Austria, building the next generation of intelligent systems. We are one of the few labs in Europe executing the full AI pipeline, from pre-training to alignment, entirely in-house.


The Vision & Future Roadmap

This project, spearheaded by Bledar Ramo, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.

Our Core Philosophy: Iterate fast at nano-scale; scale only what works.

With the right compute infrastructure and backing, we plan to scale these validated recipes to a 1 Trillion+ token frontier model. Our roadmap includes integrating cutting-edge techniques inspired by our internal research and recent literature:

  • Recursive Reasoning Architectures: Moving beyond static Chain-of-Thought to Recursive Language Models (RLMs) that treat prompts as dynamic environments, solving problems far exceeding standard context windows.
  • Agentic Data Synthesis: Implementing large-scale, self-correcting synthetic data pipelines that simulate real-world tool use and multi-step reasoning.
  • Stability at Scale: Utilizing advanced optimization techniques like MuonClip (QK-Norm/Clip) to ensure stability during massive training runs without loss spikes.
  • Hyper-Efficient Architectures: Further refining our MoE routing and Multi-head Latent Attention (MLA) to maximize active parameter efficiency.

Noeum (derived from "mind," "meaning," and "thought") is building the next generation of genuine reasoning systems: not through brute force, but through architectural intelligence.


🌐 Website: noeum.ai 📧 Contact: contact@noeum.ai
