You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.


NameNotFound-EAM


Evolving Architecture Model with 1-Billion-Token Context


NameNotFound-EAM is the first model in a new class of systems we call Evolving Architecture Models (EAMs): models that do not just run a fixed network, but adapt their own architecture at runtime. They grow new specialized capacity when they meet a gap, learn continually from the outcomes of their own generations, and reorganize how their internal experts cooperate.

It is important to say this plainly: NameNotFound-EAM is not a code model and it is not a general-purpose chatbot with a larger context window. It can do code, reasoning, chat, retrieval, and agentic work, but those are capabilities inside a broader adaptive architecture. The product is the evolving system: context, memory, learning, routing, specialist growth, and generation working together.

Where a conventional LLM is a frozen function from tokens to tokens, an EAM is a living system: retrieval, memory, reasoning, routing, generation, and learning are coupled into one self-improving loop. It is memory-aware by design: sessions, named identities, user facts, Savant specialists, and outcome signals can persist across turns and be recalled later. The model carries a 1-billion-token context horizon and a fully internalized, adaptive tokenizer so it can operate as a single self-contained artifact.

This is not just a new model checkpoint. It is an evolved model architecture, and it is designed to keep evolving with you. Adaptation is not limited to remembering facts in a chat history: the runtime can change expert composition, routing behavior, session-local learning state, and trainable weights so the system becomes structurally better matched to the work you give it.

Put another way: this is an evolved model, not a myth or a personality layer. The goal is a system that can keep continuity, remember what matters, grow named specialists, and become more useful through direct work with the user because adaptation is part of the architecture and runtime, not a prompt wrapper.

Website: namenotfound.ai

Available now: NameNotFound-EAM is a source-available model artifact that builders and researchers can try for themselves with no waitlist required. Full-speed serving uses the custom NameNotFound runtime and custom operation kernels currently available on Linode / Akamai Cloud.

TL;DR: 1B-token context, adaptive native tokenization, memory-aware sessions, bidirectional inference with real-time learning, multi-granularity retrieval, multi-tier persistent memory, intent-conditioned reasoning, accelerated block-parallel/speculative decoding, a multi-expert core that evolves and specializes itself at runtime, single public_slot RTX PRO 6000 Blackwell target runtime, shipped as a self-contained safetensors bundle.

Important Positioning

This is not a code model with massive context. This is a new generation of models architected from the ground up for billion-token context windows. If traditional foundation models are specialized coding race cars, this is more like the first real modular vehicle powered by a native 1-billion-token context window.

The critical difference: NNF-EAM learns and evolves specifically for your use cases. On day one, it is like a toddler. As you work with it, the model grows by leveraging memory and an expanding understanding to become increasingly capable at what matters most to you.

We are focused on cost-efficient architectures that let companies build sustainable competitive moats without $10B investments to compete at the foundation-model level.

What Makes It An Evolving Architecture Model

Most "adaptive" models adapt only their activations, prompts, or external memory. An EAM adapts its structure and its weights as it operates:

  • Runtime capability growth. When the model detects a gap it cannot cover well, it composes new specialized sub-experts on the fly from a compact, language-agnostic numeric representation of the relevant knowledge and integrates them without retraining from scratch.
  • Continual, outcome-aware learning. Generations are scored and the signal is fed back into the model's routing and memory, so behavior improves with use rather than staying frozen at the training checkpoint. The model tracks the causal effect of its own decisions, not just whether outcomes were good.
  • Bidirectional inference. Inference is not a one-way read from static weights. Runtime behavior can emit training signals, update session-local memory, reinforce useful routes, and teach Savant specialists while the system is being used.
  • Accessible specialization. EAMs are designed to make custom training and specialization available to builders without requiring a foundation-lab retraining for every domain. A user or team can provide their own corpus, feedback, and held-out validation, then grow named specialists around their actual workflows.
  • Self-organizing experts. A learned router fuses multiple expert subsystems at the token level and continuously rebalances them; underperforming pathways are pruned and strong ones reinforced. Learned representations are shared across expert pathways rather than isolated per expert.
  • One coupled loop. Context retrieval, long-term memory, reasoning, intent, and decoding are not bolted-on stages. They condition each other inside a single forward/generate path.

This is why the family is named for its defining property: the architecture itself evolves.

How It Compares

  • vs. a frozen LLM: a frozen LLM stays fixed at the checkpoint; an EAM adapts in deployment, specializes to domains, and updates routing and expert behavior based on outcomes.
  • vs. a code model: a code model is optimized around code generation or repair. NameNotFound-EAM can be specialized for code, but code is one possible domain for an evolving architecture, not the identity of the model.
  • vs. a generic chatbot: a generic chatbot is usually a stateless or lightly personalized assistant. NameNotFound-EAM is designed around persistent memory, long-context grounding, real-time learning, and named specialist growth.
  • vs. a standard RAG stack: no bolted-on public retrieval recipe; context access, grounding, memory, reasoning, and generation are treated as integrated model/runtime capabilities.
  • vs. a standard MoE: a standard MoE has a fixed expert set; an EAM grows Savants, fuses, prunes, and transfers knowledge between experts at runtime.

Key Capabilities

  • 1-billion-token context horizon: addresses contexts up to one billion tokens through multi-vectored and bundled traversal on top of a high-resolution native window. Demonstrated frontier needle-in-a-haystack retrieval at the billion-token horizon with bounded, flat memory.
  • Adaptive native tokenization: the tokenizer is internalized and self-contained, with no external dependency to ship or version-match. It extracts differentiable sub-token features such as numeric, character-level, and Unicode-diversity features, so numbers, code, and rare strings are represented faithfully.
  • Multi-granularity hierarchical retrieval: retrieves by descending document -> paragraph -> sentence -> word -> character, narrowing at each level with late-interaction per-token matching, conditioned on numeric representations of context and intent.
  • Multi-tier memory: combines working, episodic, executive, persistent, and outcome-aware memory. The runtime can remember names, user facts, project state, feedback, and Savant specialist identities through persisted session state.
  • Bidirectional real-time learning: inference, memory, feedback, and training signals are coupled so the system can learn during use.
  • Intent-conditioned reasoning: reasoning is conditioned on inferred intent and invoked adaptively, learning when deeper reasoning is worth the cost.
  • Hybrid long-context core: combines attention with efficient linear/state-space mixing, KV-cache compression, and hardware-friendly kernels.
  • Accelerated decoding: block-parallel and speculative / multi-token parallel decoding with runtime selection among strategies.
  • Multi-expert, self-evolving core: learned token-level routing and fusion; new Savant experts can be grown, and experts can be combined, reinforced, or pruned at runtime, with cross-expert knowledge transfer.
  • Intent, control, and abstention: intent detection, abstention gates, and tool/public_component_9c53c074d7 dispatch.
  • Numeric, code, and structured outputs: first-class arithmetic, code generation, and structured output.
  • Self-contained safetensors deployment: tensor-native, exported as a sharded safetensors bundle.

Intended Uses

This release is for developers only and should be considered an alpha. Everything here is under active development and is subject to change.

Direct use: long-document / whole-corpus QA, summarization, public_surface retrieval over very large contexts, reasoning, code, agentic workflows, domain assistants, and applications that benefit from a model that improves with use.

Do not treat this as "just a code model" or "just a general model." The intended use is as an adaptive foundation for work that benefits from persistent context, memory, specialist growth, and model-owned learning.

Downstream use: a self-contained backbone for retrieval-augmented and agentic systems, and domain specialization via runtime adaptation rather than full retraining.

Customization use: teams can teach the system the work they actually do: codebases, operational runbooks, research corpora, customer-support patterns, analysis styles, internal terminology, and personal preferences. The intended path is not to sell everyone a static general model and ask them to prompt around its limits; it is to let the model specialize, persist, and improve around the user's own domain while keeping benchmark and validation material held out.

Out of scope / use with care: high-stakes medical, legal, financial, or safety-critical decisions without human oversight; settings requiring a frozen, audited model unless a snapshot is pinned and online adaptation is disabled; anything prohibited by license or law; and any deployment that assumes non-native runtime surfaces unlock the full model behavior.

Runtime Note

Hugging Face Transformers and vLLM are not supported in this release.
Do not load this model with AutoModelForCausalLM, transformers.pipeline, or vLLM. This version does not ship Python runtime sources and is not a drop-in Hugging Face inference checkpoint.
Use the compiled NameNotFound runtime and CLI shipped in this repository. That is the supported public path.

Package contents

This Hugging Face repository is a complete compiled release. It includes:

  • Weight shards: model_bundle/model-*-of-*.safetensors plus model.safetensors.index.json. Sharded safetensors checkpoint (~908 numbered shards).
  • Compiled graph: graph/model.pt2, graph/component_*.pt2, and sidecars. Graph-first runtime execution.
  • Compiled CLI/runtime: namenotfound_runtime/namenotfound_eam_runtime.cpython-312-x86_64-linux-gnu.so. Native CLI entry and runtime loader (Linux x86_64, CPython 3.12).
  • Public manifests: config.json, sha256.json, state_abi_*, native_codec_resolver.json, etc. Release ABI, integrity, and codec metadata.

No Python source files are included. Your machine provides the host Python 3.12 interpreter, plus CUDA and PyTorch.
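The manifests include sha256.json for integrity checks. The following is a minimal bash sketch for verifying downloaded files against it. It assumes (unverified; inspect the real file first) that sha256.json is a flat JSON object mapping release-relative paths to hex SHA-256 digests. If the layout differs, adjust the jq filter. The helper name nnf_verify_sha256 is hypothetical. It requires jq and GNU coreutils sha256sum.

```shell
# nnf_verify_sha256 RELEASE_DIR   (hypothetical helper, bash)
# ASSUMPTION: sha256.json looks like {"relative/path": "<hex sha256>", ...}.
# Returns non-zero if any listed file is missing or does not match.
nnf_verify_sha256() {
  local release="${1:?usage: nnf_verify_sha256 RELEASE_DIR}"
  (
    cd "$release" || exit 1
    # Turn each entry into the "<digest>  <path>" lines that sha256sum -c reads.
    jq -r 'to_entries[] | "\(.value)  \(.key)"' sha256.json |
      sha256sum --check --quiet -
  )
}
```

Run it as `nnf_verify_sha256 "$NNF_RELEASE"`. Hashing a ~200 GB download takes a while, so you may want to run it once after the download finishes rather than before every launch.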

How to run

Download the repo, point the runtime at the download directory, and launch the compiled CLI:

huggingface-cli download namenotfoundai/NNF-EAM --local-dir ./NNF-EAM
export NNF_RELEASE="$(pwd)/NNF-EAM"
export PYTHONPATH="$NNF_RELEASE:${PYTHONPATH:-}"
export CUDA_VISIBLE_DEVICES=0
export NAMENOTFOUND_EAM_DEVICE=cuda:0

# optional convenience alias
alias namenotfound-eam='python3.12 -B -m namenotfound_runtime.cli'

namenotfound-eam --model "$NNF_RELEASE" generate "What kind of model are you?"

Without the alias, invoke the compiled CLI directly:

python3.12 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE" generate "What kind of model are you?"

Full setup, chat, Savant, feedback, and resident examples are in Install, Set Up, and Run.

Runtime tiers

  • This repo (compiled CLI + graph + weights): the supported public path for download, local runs, chat, generation, feedback, Savant training, and resident mode.
  • NameNotFound Sparkle runtime + custom kernels: required for full-capability performance (accelerated decoding, long-context traversal at scale, and the complete kernel surface). Available only on Linode / Akamai Cloud, not bundled in this Hugging Face download.

NameNotFound-EAM requires the compiled runtime for the core capability surface: long-context operation, runtime Savant growth, session memory, and model-owned learning. For the highest-performance Sparkle runtime and custom kernels, deploy on Linode / Akamai Cloud.

Public harnesses that apply external chunking, retrieval, prompt compression, or context-packing before runtime ingest can still work, but they may hide native long-context retrieval and selected-context behavior. Hardware still defines what the full capability surface can hold: device placement, memory headroom, and whether Sparkle/custom kernels are available determine accelerated decoding, long context traversal, and online adaptation.

Install, Set Up, and Run

This section is the fastest path from zero to a working local run. Do not use Hugging Face Transformers, vLLM, or AutoModelForCausalLM with this repo. The supported path is the compiled NameNotFound runtime and CLI shipped in this repository.

What this repo includes

Everything needed to run the model is in this download:

  • Numbered safetensor shards: model_bundle/. About 908 shards; allow ~200 GB of free disk.
  • Compiled graph artifacts: graph/. Graph-first runtime execution.
  • Compiled CLI and runtime: namenotfound_runtime/namenotfound_eam_runtime.cpython-312-x86_64-linux-gnu.so. Linux x86_64, CPython 3.12.
  • Public manifests: config.json, sha256.json, state_abi_*, native_codec_resolver.json, etc. Release ABI and load metadata.

This repo does not ship Python source files. The CLI and runtime are compiled native extensions plus graph and weight artifacts. Your machine provides the host interpreter (Python 3.12), CUDA, and PyTorch.

Prerequisites

  • OS: Linux x86_64 (required for the shipped .so)
  • Python: 3.12 (matches the compiled extension ABI)
  • GPU: CUDA-capable NVIDIA GPU with enough VRAM for your workload
  • Disk: ~200 GB free for a full download of this repo
  • Tools: Hugging Face CLI (pip install -U "huggingface_hub[cli]")

Full-capability Sparkle runtime performance and custom kernels are available only on Linode / Akamai Cloud. Local CUDA GPUs can run the compiled CLI from this download; for the full accelerated kernel surface, use Linode.
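The prerequisites above can be checked before starting a ~200 GB download. The following is a minimal bash sketch. The helper names are hypothetical. The nvidia-smi check is an assumption used as a rough proxy for "NVIDIA driver installed", not something this release requires by name.

```shell
# Preflight checks for the documented prerequisites (hypothetical helpers, bash).
# Each check prints a reason and returns non-zero on failure.
nnf_check_platform() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  if [ "$os" != "Linux" ] || [ "$arch" != "x86_64" ]; then
    echo "need Linux x86_64 for the shipped .so; found $os $arch" >&2
    return 1
  fi
}

nnf_check_disk() {
  # Compares free space (GiB) on the filesystem holding DIR against NEED_GB.
  local dir="${1:-.}" need_gb="${2:-200}" avail_kb
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kb" ]; then
    echo "cannot read free space for $dir" >&2
    return 1
  fi
  if [ "$avail_kb" -lt $(( need_gb * 1024 * 1024 )) ]; then
    echo "need ~${need_gb} GB free in $dir; have $(( avail_kb / 1024 / 1024 )) GB" >&2
    return 1
  fi
}

nnf_check_python() {
  # The compiled extension targets CPython 3.12 exactly.
  local py="${1:-python3.12}"
  if ! "$py" -c 'import sys; sys.exit(0 if sys.version_info[:2] == (3, 12) else 1)' 2>/dev/null; then
    echo "$py is missing or is not Python 3.12" >&2
    return 1
  fi
}

nnf_preflight() {
  local ok=0
  nnf_check_platform || ok=1
  nnf_check_python || ok=1
  nnf_check_disk "${1:-.}" || ok=1
  command -v huggingface-cli >/dev/null || { echo "huggingface-cli not found" >&2; ok=1; }
  command -v nvidia-smi >/dev/null || { echo "nvidia-smi not found (NVIDIA driver missing?)" >&2; ok=1; }
  return "$ok"
}
```

For example, `mkdir -p ~/models && nnf_preflight ~/models` reports every failing check at once instead of stopping at the first.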

Step 1: Download the release

python3 -m pip install -U "huggingface_hub[cli]"
huggingface-cli login   # if needed

mkdir -p ~/models && cd ~/models
huggingface-cli download namenotfoundai/NNF-EAM --local-dir ./NNF-EAM
export NNF_RELEASE="$(pwd)/NNF-EAM"
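An interrupted download can leave gaps in the ~908 numbered shards. The following is a bash sketch that only relies on the model-*-of-*.safetensors naming shown in this README. It reads the expected total from the "-of-" part of the file names and reports any missing indices. The helper name is hypothetical, and the check does not validate file contents.

```shell
# nnf_check_shards BUNDLE_DIR   (hypothetical helper, bash)
# Reports missing model-<index>-of-<total>.safetensors files.
nnf_check_shards() {
  local bundle="${1:?usage: nnf_check_shards BUNDLE_DIR}"
  local files=("$bundle"/model-*-of-*.safetensors)
  if [ ! -e "${files[0]}" ]; then
    echo "no shards found in $bundle" >&2
    return 1
  fi
  local f name idx total=0 seen=()
  for f in "${files[@]}"; do
    name="${f##*/}"                                        # model-00012-of-00908.safetensors
    idx="${name#model-}"; idx="${idx%%-of-*}"              # 00012
    total="${name##*-of-}"; total="${total%.safetensors}"  # 00908
    seen[$((10#$idx))]=1                                   # base 10: leading zeros are not octal
  done
  total=$((10#$total))
  local i missing=0
  for (( i = 1; i <= total; i++ )); do
    if [ -z "${seen[i]:-}" ]; then
      echo "missing shard $i of $total" >&2
      missing=$((missing + 1))
    fi
  done
  echo "${#files[@]} shard files found, $total expected, $missing missing"
  [ "$missing" -eq 0 ]
}
```

Run `nnf_check_shards "$NNF_RELEASE/model_bundle"`, then rerun the same huggingface-cli download command to fetch anything reported as missing.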

Step 2: Configure the runtime environment

Point the compiled runtime at your download directory and bind a GPU:

export PYTHONPATH="$NNF_RELEASE:${PYTHONPATH:-}"
export CUDA_VISIBLE_DEVICES="${NNF_CUDA_VISIBLE_DEVICES:-0}"
export NAMENOTFOUND_EAM_DEVICE="${NAMENOTFOUND_EAM_DEVICE:-cuda:0}"
export PYTORCH_CUDA_ALLOC_CONF="${PYTORCH_CUDA_ALLOC_CONF:-expandable_segments:True}"
export PYTHONNOUSERSITE="${PYTHONNOUSERSITE:-1}"
export PYTHONDONTWRITEBYTECODE="${PYTHONDONTWRITEBYTECODE:-1}"

Optional shell alias so the rest of this README reads naturally:

alias namenotfound-eam='python3.12 -B -m namenotfound_runtime.cli'

Every command below uses this pattern:

namenotfound-eam --model "$NNF_RELEASE" <command> [options]

If you skip the alias, invoke the compiled CLI directly:

python3.12 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE" <command> [options]
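Bash does not expand aliases in non-interactive shells unless `shopt -s expand_aliases` is set, so the alias above will not work inside scripts. A shell function with the same name behaves the same way in both interactive shells and scripts. Bash accepts hyphens in function names outside POSIX mode. This is a sketch of that equivalent. The unalias line matters because an existing alias would otherwise be expanded while the function definition itself is parsed.

```shell
# Function equivalent of the namenotfound-eam alias (bash).
unalias namenotfound-eam 2>/dev/null || true
namenotfound-eam() {
  python3.12 -B -m namenotfound_runtime.cli "$@"
}
```

Every `namenotfound-eam --model "$NNF_RELEASE" <command> [options]` example in this README then works unchanged from a script.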

The CLI loads the compiled extension from $NNF_RELEASE/namenotfound_runtime/ and the graph and weight artifacts from the same directory.

Step 3: First generation

One-shot prompt:

namenotfound-eam --model "$NNF_RELEASE" generate "What kind of model are you?"

Machine-readable output:

namenotfound-eam --model "$NNF_RELEASE" generate "What kind of model are you?" --json | jq -r '.text'

With external context:

namenotfound-eam --model "$NNF_RELEASE" generate "Answer from the provided context." \
  --context "$(cat /path/to/context.txt)" \
  --max-new-tokens 256
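`--context "$(cat ...)"` passes the whole file as a single command-line argument. On Linux a single argument is capped at 128 KiB (MAX_ARG_STRLEN), so larger files fail with "Argument list too long". The --context-file flag used with generate elsewhere in this README avoids that limit. The following is a bash (4.3 or later) sketch with a hypothetical helper name. It inlines small files and switches to --context-file for larger ones. The 120 KiB threshold is an arbitrary safety margin.

```shell
# nnf_context_args FILE ARRAY_NAME   (hypothetical helper, bash >= 4.3)
# Sets ARRAY_NAME to (--context "<file text>") for small files and to
# (--context-file FILE) for files near the 128 KiB single-argument limit.
nnf_context_args() {
  local file="${1:?usage: nnf_context_args FILE ARRAY_NAME}"
  local -n __nnf_ctx_out="${2:?usage: nnf_context_args FILE ARRAY_NAME}"
  local limit=$(( 120 * 1024 ))   # margin below MAX_ARG_STRLEN (128 KiB)
  local size
  size=$(wc -c < "$file") || return
  if (( size < limit )); then
    __nnf_ctx_out=(--context "$(< "$file")")
  else
    __nnf_ctx_out=(--context-file "$file")
  fi
}

# Usage (needs the runtime):
#   nnf_context_args /path/to/context.txt ctx
#   namenotfound-eam --model "$NNF_RELEASE" generate "Answer from the provided context." \
#     "${ctx[@]}" --max-new-tokens 256
```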

Step 4 โ€” Interactive chat

namenotfound-eam --model "$NNF_RELEASE" chat

Type /exit or /quit to stop. Persist session state when you want continuity across restarts:

export SESSION=./nnf-session.json
namenotfound-eam --model "$NNF_RELEASE" chat --save-session "$SESSION"

Step 5 โ€” Keep a warm runtime (optional)

For repeated calls without reloading the full graph each time, use resident mode:

namenotfound-eam --model "$NNF_RELEASE" resident start
# ... run generate/chat while the resident is up ...
namenotfound-eam --model "$NNF_RELEASE" resident stop

See Native CLI below for feedback, sevant, JSONL chat, and resident snapshot/rollback.
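When scripting against resident mode, it is easy to leave the resident running after a failed batch. The following is a bash sketch with a hypothetical helper name. It starts the resident, runs whatever command you pass, and issues `resident stop` even if that command fails. It calls the compiled CLI directly so it does not depend on the alias. An interrupt (Ctrl-C) still skips the stop step unless you add a trap.

```shell
# nnf_with_resident COMMAND [ARGS...]   (hypothetical helper, bash)
# Starts the resident, runs COMMAND, always stops the resident afterwards,
# and returns COMMAND's exit status.
nnf_with_resident() {
  local cli=(python3.12 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE")
  "${cli[@]}" resident start || return
  local status=0
  "$@" || status=$?
  "${cli[@]}" resident stop
  return "$status"
}

# Usage (needs the runtime):
#   run_batch() {
#     local p
#     for p in "Summarize section 1." "Summarize section 2."; do
#       python3.12 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE" generate "$p" --json
#     done
#   }
#   nnf_with_resident run_batch
```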

Troubleshooting

  • No module named namenotfound_runtime.cli: set PYTHONPATH="$NNF_RELEASE" and use Python 3.12.
  • Missing .so or graph errors: confirm NNF_RELEASE points at the full Hugging Face download directory.
  • Import or ABI errors: use Python 3.12 on Linux x86_64 to match the shipped extension.
  • CUDA OOM or slow first load: bind a dedicated GPU with CUDA_VISIBLE_DEVICES; stop other GPU jobs (training, resident processes, notebooks, or second model instances) on that device; use resident start for repeated calls instead of reloading the graph each time.
  • Tried transformers / vLLM / AutoModel: not supported; use the compiled CLI only.
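Several of these symptoms come from NNF_RELEASE pointing at an incomplete or wrong directory. The following is a bash sketch with a hypothetical helper name. It checks for the artifact paths listed in this README. It assumes config.json and sha256.json sit at the release root, which the README implies but does not state outright.

```shell
# nnf_check_release [RELEASE_DIR]   (hypothetical helper, bash)
# Lists any expected release artifact that is missing; non-zero if any are.
nnf_check_release() {
  local release="${1:-$NNF_RELEASE}" missing=0 p
  local required=(
    config.json
    sha256.json
    graph/model.pt2
    namenotfound_runtime/namenotfound_eam_runtime.cpython-312-x86_64-linux-gnu.so
  )
  for p in "${required[@]}"; do
    if [ ! -e "$release/$p" ]; then
      echo "missing: $release/$p" >&2
      missing=1
    fi
  done
  # compgen -G succeeds only if the glob matches at least one file.
  if ! compgen -G "$release/model_bundle/model-*-of-*.safetensors" >/dev/null; then
    echo "missing: $release/model_bundle/model-*-of-*.safetensors" >&2
    missing=1
  fi
  return "$missing"
}
```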

Next steps

  • Savant specialists: see Savant geometry: layers, heads, and dim and the sevant examples below
  • Feedback and learning: see Feedback
  • Full command surface: see Native CLI

Runtime Access

This release provides one supported public runtime surface:

  • Compiled CLI + runtime: follow Install, Set Up, and Run above, then use generate, chat, feedback, sevant, or resident.

The examples below assume NNF_RELEASE is set and use placeholder paths with sanitized prompts.

Native CLI

The compiled CLI exposes five public commands: generate, chat, feedback, sevant, and resident. They transport prompts, call model-owned runtime surfaces, persist exported session state when requested, and return responses. They do not rely on CLI-authored answer rules or benchmark-specific answer injection.

Generate (one-shot)

Run a one-shot prompt and read the answer:

namenotfound-eam --model "$NNF_RELEASE" generate "What kind of model are you?"

With --json, output is a single JSON object containing the model's text field.

Ground the answer in your own context with --context:

namenotfound-eam --model "$NNF_RELEASE" generate "Answer from the provided context." \
  --context "$(cat /path/to/context.txt)"

Control the response length with --max-new-tokens (default 192):

namenotfound-eam --model "$NNF_RELEASE" generate "Summarize the key points." \
  --context "$(cat /path/to/context.txt)" \
  --max-new-tokens 512

Non-interactive callers can pipe JSON output into a parser:

namenotfound-eam --model "$NNF_RELEASE" generate "What kind of model are you?" --json | jq -r '.text'

Interactive chat

Start an interactive chat session (type /exit or /quit to stop):

namenotfound-eam --model "$NNF_RELEASE" chat

Native chat-session persistence:

export SESSION=./nnf-session.json

namenotfound-eam --model "$NNF_RELEASE" chat \
  --save-session "$SESSION"

For non-interactive callers, JSONL mode can carry context and follow-up turns:

printf '%s\n' \
  '{"context":"The project context goes here.","followup":"Answer from the provided context."}' \
  | namenotfound-eam --model "$NNF_RELEASE" chat \
      --stdin-jsonl \
      --jsonl \
      --save-session "$SESSION"
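Hand-written JSON like the printf line above breaks as soon as the context contains quotes, backslashes, or newlines. The following is a bash sketch with a hypothetical helper name that lets jq (1.6 or later, for --rawfile) build the record from a file. It uses only the context and followup keys shown above. Reading the file with --rawfile also keeps large contexts off the command line.

```shell
# nnf_jsonl_turn CONTEXT_FILE FOLLOWUP   (hypothetical helper, needs jq >= 1.6)
# Prints one compact JSONL record with correctly escaped strings.
nnf_jsonl_turn() {
  local context_file="${1:?usage: nnf_jsonl_turn CONTEXT_FILE FOLLOWUP}" followup="$2"
  jq -n -c --rawfile context "$context_file" --arg followup "$followup" \
    '{context: $context, followup: $followup}'
}

# Usage (needs the runtime):
#   nnf_jsonl_turn /path/to/context.txt "Answer from the provided context." \
#     | namenotfound-eam --model "$NNF_RELEASE" chat --stdin-jsonl --jsonl --save-session "$SESSION"
```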

Start chat with an initial context file:

namenotfound-eam --model "$NNF_RELEASE" chat --context-file /path/to/context.txt

Deleting or resetting model-owned Savant or session state clears that state, but it does not necessarily restore an earlier evolved state. If online adaptation, Savant specialists, or saved deltas have changed model-owned state, you may need to restore a snapshot or checkpoint saved before the change to get the exact earlier behavior back.

Feedback

Teach the model from a corrected prompt/response turn with namenotfound-eam feedback. The expected intent and expected answer are learned as feedback through the model-owned continual-learning surface:

namenotfound-eam --model "$NNF_RELEASE" feedback \
  --prompt "What is the capital of France?" \
  --response "It is Berlin." \
  --feedback "The correct answer is Paris." \
  --expected-answer "Paris" \
  --rating down

Like generate, feedback accepts --prompt-file for long prompts, --save-session to persist the learned signal, and --json for machine-readable output.
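To replay many corrections at once, a loop over a tab-separated file can call feedback once per row. The following is a bash sketch with a hypothetical helper name and a hypothetical file layout (prompt, wrong response, and expected answer per line). It uses only the flags shown in the example above and fills --feedback with the same sentence pattern. It reads the file on descriptor 3 so the CLI cannot consume the loop's input.

```shell
# nnf_feedback_batch FILE.tsv [SESSION]   (hypothetical helper, bash)
# FILE.tsv layout (assumed): prompt<TAB>wrong response<TAB>expected answer
nnf_feedback_batch() {
  local tsv="${1:?usage: nnf_feedback_batch FILE.tsv [SESSION]}" session="${2:-}"
  local prompt response expected n=0
  local extra=()
  if [ -n "$session" ]; then
    extra=(--save-session "$session")
  fi
  # "|| [ -n "$prompt" ]" also processes a final line without a trailing newline.
  while IFS=$'\t' read -r -u 3 prompt response expected || [ -n "$prompt" ]; do
    [ -z "$prompt" ] && continue
    python3.12 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE" feedback \
      --prompt "$prompt" \
      --response "$response" \
      --feedback "The correct answer is $expected." \
      --expected-answer "$expected" \
      --rating down "${extra[@]}" || return
    n=$((n + 1))
  done 3< "$tsv"
  echo "submitted $n corrections"
}
```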

Compiled Runtime

The capability surface runs through the compiled NameNotFound runtime loaded by the CLI: long-context operation, runtime Savant growth, session memory, and model-owned learning. There is no supported Transformers or vLLM wrapper for this release.

Sparkle runtime performance and custom kernels are available only on Linode / Akamai Cloud (public_slot RTX PRO 6000 Blackwell target). This Hugging Face download ships the compiled CLI, graph, and weights for local runs; contact Linode / NameNotFound for the full Sparkle kernel surface.

namenotfound-eam --model "$NNF_RELEASE" generate \
  --prompt "Summarize the attached report and cite the relevant sections." \
  --context-file /path/to/very_long_document.txt \
  --max-new-tokens 512

Reproducibility And Adaptation Controls

NameNotFound-EAM can evolve through session state, online adaptation, and Savant specialists. If your application requires fixed behavior, treat reproducibility as an explicit deployment choice: pin a snapshot when you create one, and if your harness uses online adaptation, disable it.

Use CLI session controls such as --save-session, --start-fresh, and resident snapshot/rollback flows when you need fixed or restorable behavior. When adaptation is enabled, session-local behavior can improve over time through model-owned memory/session state. Persist and restore that state deliberately, and keep base release snapshots pinned for auditability.
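The file written by --save-session is the piece of adaptation state this README shows being persisted. The following is a bash sketch with hypothetical helper names that keeps timestamped copies of that file and restores the newest one. It covers only the session file, not Savant artifacts or any other model-owned state, and it needs GNU date for nanosecond timestamps (%N).

```shell
# Hypothetical helpers (bash): snapshot and roll back a --save-session file.
nnf_session_snapshot() {
  local session="${1:?usage: nnf_session_snapshot SESSION [DIR]}"
  local dir="${2:-${session%.json}.snapshots}"
  mkdir -p "$dir" || return
  # UTC timestamp with nanoseconds keeps names unique and sortable.
  local dest="$dir/$(date -u +%Y%m%dT%H%M%S%NZ)-${session##*/}"
  cp -p "$session" "$dest" && printf '%s\n' "$dest"
}

nnf_session_restore_latest() {
  local session="${1:?usage: nnf_session_restore_latest SESSION [DIR]}"
  local dir="${2:-${session%.json}.snapshots}"
  local snaps=("$dir"/*)
  if [ ! -e "${snaps[0]}" ]; then
    echo "no snapshots in $dir" >&2
    return 1
  fi
  # Glob results are sorted and names start with the timestamp,
  # so the last entry is the newest snapshot.
  local latest="${snaps[${#snaps[@]} - 1]}"
  cp -p "$latest" "$session" && printf 'restored %s\n' "$latest"
}
```

A typical pattern is `nnf_session_snapshot "$SESSION"` before an adaptive chat or feedback run, then `nnf_session_restore_latest "$SESSION"` if the new behavior is not wanted.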

Public Runtime Path

The public runtimes expose this high-level path:

  1. public raw prompt or streamed context
  2. native tokenizer
  3. streaming input ingest
  4. selected context index
  5. selected context materialization
  6. answer surface
  7. optional Savant / continual-learning session state

Memory, Bidirectional Inference, And Persistent Savants

What A Billion-Token Context And Evolving Architecture Mean For You

A billion-token context is not just a larger prompt box. It means your model can work across whole repositories, long histories, research archives, customer records, operational logs, and evolving project memory without forcing every task into a short, disposable chat.

An Evolving Architecture Model makes that context trainable. You can bring your own data, feedback, terminology, policies, style guides, codebase, research corpus, or workflow history and grow specialists around it. Instead of waiting for a foundation model vendor to retrain a general-purpose model for your niche, you can create named experts for the things you actually do: a project code specialist, a materials-research assistant, a support-policy expert, a legal-review companion, a manufacturing process guide, or any other domain specialist your deployment needs.

Those specialists can be named, recalled, improved, and redirected over time. The goal is custom training for builders, teams, and individuals: not a one-off fine-tune that gets stale, but a memory-aware system that can specialize through use, keep the useful parts, and keep benchmark or validation material separate from the data used to teach it.

On day one, it may feel like you are using a powerful model. After two weeks of real work, it should feel like you are using your model: one that has learned your corpus, your goals, your terminology, your recurring mistakes, and the specialists you rely on. Two deployments that start from the same base snapshot can diverge as they accumulate different memories, feedback, Savant specialists, routing changes, and adaptation state. When you want specialization, let the architecture evolve with the work.

NameNotFound-EAM is memory-aware. The runtime can carry forward user-provided context, user facts, named assistant identity, task history, feedback, and Savant-specialist state through persisted sessions. This allows the model to be named, remember that name in later turns, learn stable user or project facts, and recall them after the session is restored.

The system also supports bidirectional inference: inference produces answers, but it can also produce learning signals. Feedback, outcome scoring, selected-context evidence, and session traces can flow back into model-owned memory and routing state while the system is running. In the full runtime, those signals can be used for real-time learning, online adaptation, and Savant-specialist improvement.

Persistent memory is layered:

  • Working memory: current turn and active context.
  • Episodic memory: recent interactions, feedback, and task traces.
  • Executive memory: higher-level task or project state used to guide future behavior.
  • Persistent memory: named facts, identities, preferences, and reusable knowledge stored across sessions.
  • Outcome-aware memory: records which strategies, routes, and generated specialists worked.

Plan and name a new Savant:

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --sevant-alias project-code-sevant \
  --goal "Specialize on this project's coding patterns."

When the plan is correct, rerun the same command with --run-training. Use --dry-run first for every new Savant or retrain so you can review the action, corpus counts, feedback counts, and selected training command before any trainer starts.
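The dry-run-first workflow can be enforced with a small confirmation step. The following is a bash sketch with a hypothetical helper name. It runs the sevant command with --dry-run, waits for an explicit "y", and then reruns the same flags with --run-training. It assumes, as the sentence above suggests but does not state exactly, that the training run replaces --dry-run with --run-training.

```shell
# nnf_sevant_review_then_train [SEVANT FLAGS...]   (hypothetical helper, bash)
# Shows the --dry-run plan, then trains only after an explicit "y".
nnf_sevant_review_then_train() {
  local cli=(python3.12 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE" sevant)
  "${cli[@]}" --dry-run "$@" || return
  local answer
  read -r -p "Plan looks correct? Start training [y/N] " answer
  if [[ $answer == [yY] ]]; then
    "${cli[@]}" --run-training "$@"
  else
    echo "training skipped" >&2
    return 1
  fi
}

# Usage (needs the runtime):
#   nnf_sevant_review_then_train --domain code \
#     --target-sevant-id project-code-sevant --sevant-alias project-code-sevant \
#     --goal "Specialize on this project's coding patterns."
```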

Savant geometry: layers, heads, and dim

Public documentation uses Savant for named runtime specialists. The compiled CLI subcommand and flags still use sevant (for example namenotfound-eam sevant and --sevant-layer-count).

When building a Savant, you control its native capability geometry with three public flags. The native stack uses a fixed hidden width of 512; you scale depth (layers), attention width (heads), and feed-forward width (multiplier) from there.

Benefits and costs (read this before tuning)

Each knob trades specialist quality and expressiveness against GPU memory, training time, and inference latency. None of the three flags change the fixed hidden width (512); they change how much computation happens inside the Savant stack.

--sevant-layer-count (depth)

Benefit: Deeper internal composition; the specialist can chain more native transforms before answering. Better for multi-step domain logic, layered policies, codebases with deep conventions, or workflows where one pass is not enough.
Cost: Highest latency impact per unit increase, because every layer runs in sequence. More VRAM during training and inference; longer --run-training time; larger persisted Savant artifacts.
Turn it up when: The specialist understands pieces of the domain but fails on chained reasoning, multi-hop internal structure, or “connect these rules together” tasks.
Keep it lower when: You need fast iteration, a tight GPU budget, or a narrow single-step task (formatting, classification-style behavior, simple lookup patterns).
Special value: 0 = pass-through only; use it for geometry wiring checks, not production specialists.

--sevant-head-count (attention width per layer)

Benefit: Richer internal attention, with more parallel pathways for the specialist to route context inside each layer. Better when the domain needs many simultaneous constraints (API + style + safety + naming), mixed document types, or fine distinctions between similar patterns.
Cost: Moderate memory and training cost compared with layers or FF width. Some extra latency per layer, but usually less than adding a full layer. Head count does not need to divide 512; the stack projects to the requested head count and back.
Turn it up when: The specialist misses nuance, blends constraints incorrectly, or underuses parts of the corpus/feedback you gave it, but already has enough depth.
Keep it lower when: The task is narrow, inputs are homogeneous, or you are GPU-constrained and already at a higher layer count or FF multiplier.

--sevant-ff-multiplier (feed-forward width)

Benefit: Wider internal MLP blocks give more room to store and transform domain-specific patterns per layer. Feed-forward dim = 512 × multiplier (minimum 512). Best for vocabulary-heavy domains, large pattern banks, dense terminology, code with many idioms, or specialists that must retain many distinct behaviors.
Cost: Often the largest memory impact per step, since FF blocks dominate parameter volume inside each layer. A higher multiplier means more VRAM, longer training, and heavier persisted weights.
Turn it up when: The specialist needs to “hold more” distinct patterns at once (large corpus, many conventions, wide output styles) but depth and attention routing are already adequate.
Keep it lower when: The corpus is small, the domain is narrow, or you are hitting OOM during Savant training. Try lowering the multiplier before lowering layers if depth is still needed.

Hidden dim (fixed at 512)

Benefit: Stable native stack width shared across all public Savant geometry, giving a predictable baseline for memory and artifact size.
Cost: Not tunable via the CLI. To increase total capacity, raise layers, heads, and/or ff-multiplier instead.

Quick tuning guide by need

  • Fast experiments / tight GPU: stay at the default 2/4/2. Avoid raising layer count and FF multiplier together.
  • Deeper reasoning inside the specialist: raise --sevant-layer-count first, then --sevant-head-count, then --sevant-ff-multiplier. Avoid --sevant-layer-count 0 for real work.
  • Many simultaneous constraints in context: raise --sevant-head-count first, then --sevant-layer-count, then --sevant-ff-multiplier. Avoid jumping straight to the maximum on all three.
  • Large corpus / dense terminology: raise --sevant-ff-multiplier first, then --sevant-layer-count, then --sevant-head-count. Avoid an FF multiplier of 4 on a small GPU without a --dry-run.
  • Production specialist (balanced): start at the default 2/4/2 and change one knob at a time, inspecting the --dry-run plan after each change. Avoid changing all three before reviewing the dry-run.

Default public geometry

2 layers · 4 heads · 512 hidden dim · 1024 feed-forward dim per layer:

  • --sevant-layer-count: default 2 (2 specialist layers)
  • --sevant-head-count: default 4 (4 attention heads per layer)
  • Hidden dim: fixed at 512
  • --sevant-ff-multiplier: default 2 (feed-forward dim = 512 × 2 = 1024 per layer)

Passing the defaults explicitly adds these flags to a sevant command:
  --sevant-layer-count 2 \
  --sevant-head-count 4 \
  --sevant-ff-multiplier 2
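The effective-size arithmetic above can be checked before running a --dry-run. The following is a bash sketch with a hypothetical helper name that applies only the rules stated in this section: the hidden width is fixed at 512, and the feed-forward dim is 512 × multiplier with a 512 minimum. It handles integer multipliers only and does not call the runtime.

```shell
# sevant_geometry [LAYERS] [HEADS] [FF_MULTIPLIER]   (hypothetical helper, bash)
# Defaults match the public geometry: 2 layers, 4 heads, multiplier 2.
sevant_geometry() {
  local layers="${1:-2}" heads="${2:-4}" mult="${3:-2}"
  local hidden=512
  local ff=$(( hidden * mult ))
  if (( ff < 512 )); then
    ff=512   # documented minimum feed-forward dim
  fi
  local line="${layers} layers · ${heads} heads · ${hidden} hidden · ${ff} FF dim per layer"
  if (( layers == 0 )); then
    line+=" (pass-through only)"
  fi
  printf '%s\n' "$line"
}
```

For example, `sevant_geometry 4 8 4` prints `4 layers · 8 heads · 512 hidden · 2048 FF dim per layer`, the same effective geometry as the higher-capacity example in the next subsection.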

How to scale up (examples)

Increase one knob at a time, run --dry-run, inspect the plan, then train. After each change, compare dry-run output and GPU headroom before stacking another increase.

More depth (benefit: chained internal reasoning; cost: highest per-layer latency and training time):

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --sevant-layer-count 4 \
  --sevant-head-count 4 \
  --sevant-ff-multiplier 2 \
  --goal "Deeper coding specialist."

Effective geometry: 4 layers · 4 heads · 512 hidden · 1024 FF dim per layer.

More attention (benefit: finer constraint routing · cost: moderate VRAM and training time):

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --sevant-layer-count 2 \
  --sevant-head-count 8 \
  --sevant-ff-multiplier 2 \
  --goal "Wider-attention coding specialist."

Effective geometry: 2 layers · 8 heads · 512 hidden · 1024 FF dim per layer.

Wider feed-forward blocks (benefit: more distinct patterns per layer · cost: often the largest memory jump):

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --sevant-layer-count 2 \
  --sevant-head-count 4 \
  --sevant-ff-multiplier 4 \
  --goal "Wider-feedforward coding specialist."

Effective geometry: 2 layers · 4 heads · 512 hidden · 2048 FF dim per layer (512 × 4).

Higher-capacity specialist (benefit: maximum public expressiveness · cost: highest VRAM, training time, and artifact size; use when narrower tuning was not enough):

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --sevant-layer-count 4 \
  --sevant-head-count 8 \
  --sevant-ff-multiplier 4 \
  --goal "Build a higher-capacity coding Savant."

Effective geometry: 4 layers · 8 heads · 512 hidden · 2048 FF dim per layer.
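Because the advice is to change one knob at a time from the 2/4/2 default, the single-knob variants above can be generated rather than hand-edited. A bash sketch, using only flags shown in this card; it prints the commands instead of running them so each plan can be reviewed first. The `print_dry_run` function, the Savant id, and the goal strings are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) one --dry-run command per single-knob change from
# the default 2/4/2 geometry. Only flags that appear in this card are used.
SAVANT_ID=project-code-sevant

print_dry_run() {
  local layers=$1 heads=$2 mult=$3 goal=$4
  # Single-quoted format keeps $NNF_RELEASE literal for the reader's shell.
  printf 'namenotfound-eam --model "$NNF_RELEASE" sevant --dry-run --domain code'
  printf ' --target-sevant-id %s' "$SAVANT_ID"
  printf ' --sevant-layer-count %d --sevant-head-count %d --sevant-ff-multiplier %d' \
    "$layers" "$heads" "$mult"
  printf ' --goal "%s"\n' "$goal"
}

print_dry_run 4 4 2 "Deeper coding specialist."            # depth only
print_dry_run 2 8 2 "Wider-attention coding specialist."   # heads only
print_dry_run 2 4 4 "Wider-feedforward coding specialist." # FF only
```

Run one printed command at a time and compare its plan and GPU headroom against the default before trying the next.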

Low-capacity check (minimum geometry)

Use this only to confirm the Savant path and geometry wiring before scaling up:

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --sevant-layer-count 0 \
  --sevant-head-count 1 \
  --sevant-ff-multiplier 1 \
  --goal "Check Savant geometry."

Scaling guidance

  1. Start at the default (2 / 4 / 2): the best balance of benefit and cost for a first real Savant.
  2. If chained reasoning fails, raise --sevant-layer-count first (accept higher latency and training time).
  3. If nuance or constraint blending fails, raise --sevant-head-count next (moderate cost).
  4. If the domain is pattern-dense but structurally simple, raise --sevant-ff-multiplier (watch VRAM; it is often the first OOM source).
  5. If you hit OOM, lower the FF multiplier before removing layers; bind a dedicated GPU and stop other jobs on that device.
  6. After any geometry change, always run with --dry-run before --run-training: review corpus counts, feedback counts, and the selected training command before paying the training cost.
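Steps 4 and 5, together with the earlier advice to lower the multiplier before lowering layers, amount to a fixed back-off order when Savant training runs out of memory. A hypothetical bash sketch of that order: shrink --sevant-ff-multiplier first, then --sevant-layer-count. Halving (to stay on the 1/2/4 values used in this card's examples), keeping heads unchanged, and stopping at one layer are all assumptions of this sketch; the one-layer floor reflects the card's advice against --sevant-layer-count 0 for real work.

```bash
#!/usr/bin/env bash
# Hypothetical OOM back-off helper (not part of namenotfound-eam).
# Input: layers heads ff_multiplier. Output: the next smaller geometry to try.
# Order follows the scaling guidance: shrink the FF multiplier first, then
# layers. Heads are left unchanged in this sketch.
next_smaller_geometry() {
  local layers=$1 heads=$2 mult=$3
  if (( mult > 1 )); then
    # Halve so values stay on the 1/2/4 steps used in this card's examples.
    echo "$layers $heads $(( mult / 2 ))"
  elif (( layers > 1 )); then
    # Floor at one layer; --sevant-layer-count 0 is only a wiring check.
    echo "$(( layers / 2 )) $heads $mult"
  else
    echo "nothing left to shrink; free GPU memory instead" >&2
    return 1
  fi
}

next_smaller_geometry 4 8 4   # prints: 4 8 2
```

Each suggested geometry should still go through a --dry-run before --run-training.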

Add an initial corpus:

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --sevant-alias project-code-sevant \
  --goal "Learn this project's coding patterns from the provided corpus." \
  --user-corpus-jsonl ./sevant-corpus.jsonl

User corpus JSONL rows should contain at least:

{"domain":"code","target_sevant_id":"project-code-sevant","prompt":"Describe the desired task behavior.","target":"Describe the expected model-owned answer or repair behavior."}
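Before a dry run it can help to confirm that every row carries the required keys. A rough bash sketch, assuming one flat JSON object per line as in the example above; it matches key names textually rather than parsing JSON, so treat it as a pre-flight check, not a validator. The function name `check_jsonl_keys` and the file path are hypothetical.

```bash
#!/usr/bin/env bash
# Rough pre-flight check (hypothetical helper): report JSONL lines that do not
# mention every required key. Text match only, no JSON parsing; assumes one
# flat object per line, as in the card's examples.
check_jsonl_keys() {
  local file=$1; shift
  local n=0 bad=0 line key
  while IFS= read -r line || [[ -n $line ]]; do
    n=$(( n + 1 ))
    [[ -z ${line//[[:space:]]/} ]] && continue   # skip blank lines
    for key in "$@"; do
      if ! grep -Eq "\"$key\"[[:space:]]*:" <<<"$line"; then
        echo "$file:$n: missing \"$key\"" >&2
        bad=$(( bad + 1 ))
      fi
    done
  done < "$file"
  (( bad == 0 ))   # exit status 0 only when every line had every key
}

# Example:
# check_jsonl_keys ./sevant-corpus.jsonl domain target_sevant_id prompt target
```

The same function can check a feedback file by passing the keys from the feedback example below (domain, target_sevant_id, prompt, response, correction, feedback_source, issue_kind, repair_action).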

Tune a Savant manually from feedback:

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --goal "Improve this specialist from user feedback." \
  --user-feedback-jsonl ./sevant-feedback.jsonl

Feedback JSONL rows should contain at least:

{"domain":"code","target_sevant_id":"project-code-sevant","prompt":"The prior answer missed an important constraint.","response":"The previous response text.","correction":"The corrected behavior or answer.","feedback_source":"user","issue_kind":"correction","repair_action":"retrain"}
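Feedback rows often come from pasted responses containing quotes or newlines, which break hand-written JSON. A hypothetical bash sketch that escapes those characters and emits one row with the same keys, and the same fixed feedback_source, issue_kind, and repair_action values, as the example above. It escapes only backslash, double quote, newline, tab, and carriage return; other control characters would still need handling (jq's `--arg` is a sturdier option if jq is installed).

```bash
#!/usr/bin/env bash
# Hypothetical helper: build one feedback JSONL row from raw text fields.
# Escapes \ " newline tab CR only; other control chars are passed through.
json_escape() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      '"')   out+='\"' ;;
      '\')   out+='\\' ;;
      $'\n') out+='\n' ;;
      $'\t') out+='\t' ;;
      $'\r') out+='\r' ;;
      *)     out+=$c ;;
    esac
  done
  printf '%s' "$out"
}

feedback_row() {
  local domain=$1 savant_id=$2 prompt=$3 response=$4 correction=$5
  # Field values go in as printf arguments, so a % inside them is safe.
  printf '{"domain":"%s","target_sevant_id":"%s","prompt":"%s","response":"%s","correction":"%s","feedback_source":"user","issue_kind":"correction","repair_action":"retrain"}\n' \
    "$(json_escape "$domain")" "$(json_escape "$savant_id")" \
    "$(json_escape "$prompt")" "$(json_escape "$response")" \
    "$(json_escape "$correction")"
}

# Example: append one row to the file passed to --user-feedback-jsonl.
# feedback_row code project-code-sevant "$PROMPT" "$RESPONSE" "$FIX" >> ./sevant-feedback.jsonl
```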

Use benchmarks only as validation:

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --domain code \
  --target-sevant-id project-code-sevant \
  --goal "Measure this specialist against held-out tasks." \
  --user-corpus-jsonl ./sevant-corpus.jsonl \
  --benchmark-jsonl ./sevant-holdout.jsonl

Benchmark JSONL rows may include prompts, tests, and expected answers, but they are recorded as validation-only. The Savant training command should not use benchmark or validation files as training data.
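One way to confirm that held-out prompts did not leak into the training corpus is to compare the prompt values in both files. A rough bash sketch, assuming one flat JSON object per line, that the benchmark file also uses a `prompt` key (the card does not give its schema), and that prompts contain no escaped quotes (the `grep` extraction stops at the first `"`). It catches exact duplicates only, not paraphrases. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Rough leakage check: print prompt values that appear verbatim in both the
# training corpus and the held-out benchmark file. Exact matches only.
extract_prompts() {
  grep -o '"prompt"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/^"prompt"[[:space:]]*:[[:space:]]*//' \
    | sort -u
}

shared_prompts() {
  local corpus=$1 holdout=$2
  # comm -12 keeps only lines present in both sorted inputs.
  comm -12 <(extract_prompts "$corpus") <(extract_prompts "$holdout")
}

# Example (empty output means no exact overlap):
# shared_prompts ./sevant-corpus.jsonl ./sevant-holdout.jsonl
```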

Retrain or pivot an existing Savant:

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --dry-run \
  --force-retrain \
  --domain code \
  --target-sevant-id project-code-sevant \
  --goal "Pivot this Savant toward a new project behavior." \
  --user-corpus-jsonl ./sevant-corpus-v2.jsonl \
  --user-feedback-jsonl ./sevant-feedback-v2.jsonl \
  --benchmark-jsonl ./sevant-holdout-v2.jsonl

Delete a Savant by id or alias:

namenotfound-eam --model "$NNF_RELEASE" sevant \
  --remove-sevant \
  --target-sevant-id project-code-sevant

Performance And Harness Variability

Performance and observed scores vary across runtime and harness settings. The native CLI and full NameNotFound runtime do not exercise identical prompt transport, batching behavior, or context materialization paths.

Expected variation sources include:

  • GPU model, driver, CUDA version, and kernel availability
  • CUDA_VISIBLE_DEVICES, tensor parallel size, batch size, and memory utilization
  • Runtime command and deployment settings
  • CLI vs. native runtime invocation and session flow
  • Benchmark scaffold, agent harness, timeout, retry policy, and scoring script
  • Context length, context order, session state, and whether online adaptation is enabled

Benchmark numbers in this model card are internal unless explicitly marked as an external comparison value. Treat comparisons across harnesses as directional unless the same dataset, scaffold, runtime surface, prompt format, and scoring script are used.
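The comparability rule above can be made mechanical by recording the five named factors for each run and comparing them. A hypothetical bash sketch: each run is described by a small key=value file whose format and key names are illustrative, not produced by the CLI. Two runs count as comparable only when all five keys are present and equal; otherwise the comparison is reported as directional.

```bash
#!/usr/bin/env bash
# Hypothetical: decide whether two evaluation runs are directly comparable.
# Each run file holds key=value lines; keys mirror the card's five factors.
REQUIRED_KEYS=(dataset scaffold runtime_surface prompt_format scoring_script)

run_value() {  # print the value of key $2 in file $1 (last match wins)
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

compare_runs() {
  local a=$1 b=$2 key va vb verdict=comparable
  for key in "${REQUIRED_KEYS[@]}"; do
    va=$(run_value "$a" "$key"); vb=$(run_value "$b" "$key")
    if [[ -z $va || $va != "$vb" ]]; then
      echo "differs: $key ('$va' vs '$vb')" >&2
      verdict=directional
    fi
  done
  echo "$verdict"
}
```

Other variation sources from the list above (GPU, CUDA version, batch size, online adaptation) can be appended to REQUIRED_KEYS when a comparison needs to control for them.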

Benchmarks Are Broken, But Still Useful

Benchmarks are broken, and you can break them too. With Savant specialists, users can teach the model a domain, task family, codebase, or workflow until yesterday's hard benchmark pattern becomes routine work. The responsible path is to train on your own corpus and feedback, then keep benchmark and validation files held out so the result measures real specialization rather than memorized answers. So do it yourself: specialize the model on your own domain, run your own held-out benchmark, and share what you measure in the comments or on social media. Earned, reproducible results from real users say more than any score we could publish.

Static benchmark scores are a narrow snapshot of a system that is increasingly interactive, memory-aware, tool-using, and adaptive. They often collapse the most important parts of an EAM into a single number: whether it can ingest new context, remember user state, learn from feedback, preserve named specialists, recover exact evidence, and keep improving without being retrained from scratch.

Here, benchmarks are treated as probes, not as the whole product. What the field actually needs is not another static benchmark of the current state of the art, but new benchmarks that people can create and use to train capabilities and tasks, rather than general or specific knowledge recall, unless that knowledge is the explicit target.

Release Artifact Format

Full model reloads use numbered safetensor shards with an index map and integrity metadata:

model_bundle/model-00001-of-00906.safetensors
model_bundle/model-00002-of-00906.safetensors
...
model_bundle/model.safetensors.index.json
graph/model.pt2
graph/component_*.pt2
namenotfound_runtime/namenotfound_eam_runtime.cpython-312-x86_64-linux-gnu.so
sha256.json
state_abi_manifest.json

The exact shard count and manifest names can change between reloads. Consumers should rely on the shipped index/manifest files rather than hard-coded shard counts.
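Following that advice, the shard list can be read from model.safetensors.index.json instead of assumed. A minimal bash sketch, assuming the index follows the usual Hugging Face sharded-safetensors layout (a `weight_map` from tensor name to shard file) and that shard names match the model-NNNNN-of-NNNNN.safetensors pattern listed above. It reports shards the index references but the local bundle lacks; it does not verify hashes, because the schema of sha256.json is not documented here. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# List the unique shard files named in the index and flag any that are
# missing locally. Pattern match only; assumes the shard naming shown above.
list_shards() {
  grep -oE 'model-[0-9]+-of-[0-9]+\.safetensors' "$1" | sort -u
}

check_bundle() {
  local bundle_dir=$1 index missing=0 shard
  index="$bundle_dir/model.safetensors.index.json"
  [[ -f $index ]] || { echo "no index at $index" >&2; return 1; }
  while IFS= read -r shard; do
    if [[ ! -f "$bundle_dir/$shard" ]]; then
      echo "missing: $shard" >&2
      missing=$(( missing + 1 ))
    fi
  done < <(list_shards "$index")
  echo "$(list_shards "$index" | wc -l | tr -d ' ') shards referenced, $missing missing"
  (( missing == 0 ))
}

# Example: check_bundle ./model_bundle
```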

Limitations, Risks, And Recommendations

  • Research-grade, evolving system: behavior can drift between snapshots; use the reproducibility controls above when fixed behavior is required.
  • Hallucination: abstention gates reduce but do not eliminate hallucination; keep a human in the loop.
  • Bias: reflects its training corpora; evaluate for your use case.
  • Compute: the full model is designed around the public_slot RTX PRO 6000 Blackwell target with custom runtime kernels; long horizon trades latency for reach.
  • Benchmarks: benchmark scores are useful held-out probes, not the whole measure of an adaptive architecture. Current reporting focuses on long-context retrieval; independent validation and further benchmarks are forthcoming.

Evaluation

The results below come from internal long-context benchmark evaluation on the shipped runtime path. Further benchmarks are forthcoming.

Benchmark Coverage

The NameNotFound-EAM column reports internal evaluations on the shipped runtime path. External comparison values are included only where a public source reports the exact model/eval pair or a comparable public leaderboard value. n/r means no directly comparable public value was reported in the referenced sources.

| Benchmark | NameNotFound-EAM | SubQ | public_slot 3.1 Pro | Opus 4.6 | Opus 4.7 | GPT-5.4 | GPT-5.5 |
|---|---|---|---|---|---|---|---|
| RULER @ 128K | 100.0% | 95.6% | n/r | n/r | n/r | n/r | n/r |
| MRCR v2 (8-needle, 1M) | 100.0% | 86.2% | 26.3% | 78.3% | 32.2% | 36.6% | 74.0% |
| MRCR-style (500-needle, 1B context) | 72.6% | n/r | n/r | n/r | n/r | n/r | n/r |

NameNotFound-EAM results are internally evaluated; further public benchmarks are forthcoming.

Technical Specifications

| Property | Value |
|---|---|
| Architecture class | Evolving Architecture Model (EAM) |
| Primary positioning | Adaptive evolving architecture, not a code-only model or generic chatbot |
| Context horizon | Up to 1,000,000,000 tokens (hierarchical) + high-resolution native window |
| Training accelerator | Single public_slot RTX PRO 6000 Blackwell |
| Full-model runtime target | Single public_slot RTX PRO 6000 Blackwell |
| Runtime requirement | Compiled CLI + graph + weights from this repo; Sparkle runtime + custom kernels on Linode only |
| Tokenizer | Internalized, adaptive, self-contained |
| Core | Hybrid attention + efficient sequence mixing; multi-expert with learned token-level fusion |
| Memory | Working, episodic, executive, persistent, and outcome-aware |
| Adaptation | Runtime Savant growth, fusion, pruning + continual, outcome-aware learning |
| Decoding | Block-parallel / speculative / multi-token parallel decoding |
| Format | Tensor-native, sharded safetensors bundle + index + integrity manifest |
| Precision | GPU-resident; mixed precision |

Proprietary Internals

The following are intentionally not disclosed in this public model card:

  • Context traversal and grounding method
  • Retrieval / indexing / matching internals
  • Memory addressing internals
  • Expert composition details
  • Routing dimensions and controller design
  • Training recipe
  • Kernel implementation details
  • Runtime scheduling internals

Credits

  • Wendell Adams -- Engineer, Designer, and Architect
  • Ross Gates -- Strategy
  • NameNotFound -- namenotfound.ai

Citation

@misc{namenotfound_eam_2026,
  title        = {NameNotFound-EAM: An Evolving Architecture Model with a 1-Billion-Token Context Horizon},
  author       = {Wendell Adams and Ross Gates},
  year         = {2026},
  organization = {NameNotFound},
  note         = {Engineering, design, and architecture: Wendell Adams. Strategy: Ross Gates. Full capability runtime currently available on Linode / Akamai Cloud.}
}

License And Contact

Released under the NNF Source-Available Model License. See LICENSE.md.

Running a version with full capabilities on your own hardware requires building a kernel or adaptive layer.

For full-runtime deployment with NameNotFound's custom kernel on Linode / Akamai Cloud, contact the NameNotFound team for current onboarding details.

For commercial use, partnerships, or other questions, contact the NameNotFound team: ai@namenotfound.ai

Future Development

This project is maintained by a dedicated two-person team at NameNotFound. We view development as a collaborative effort with the community, and your feedback directly shapes how the system evolves. We prioritize updates carefully and respond as our current development cycle allows. Please send feedback directly; we will work to incorporate it into upcoming updates and into the other forthcoming models we have already designed.

Known Limitations And Active Work

We are actively expanding capabilities and addressing known issues across the public runtime surfaces. In particular, we are aware of some decode-quality issues, and response behavior can vary by surface while this work continues:

  • Native CLI: we are improving decode reliability and behavior quality over time.
  • Sparkle runtime + custom kernels: full-performance accelerated decoding and the complete custom kernel surface are available only on Linode / Akamai Cloud, not in this Hugging Face download.

These are known, actively tracked items rather than final behavior. Expect decode quality and capability coverage to improve as updates land (see Updates And Versioning below).

Updates And Versioning

Note: This repository is updated regularly. Files, documentation, runtime surfaces, reported results, and the Savant/runtime interface may change between updates. If you need stable, reproducible behavior, pin a specific commit or snapshot rather than tracking the latest revision.

Performance note: the NameNotFound Sparkle runtime and custom kernels that unlock full-capability performance are deployed on Linode / Akamai Cloud only. This repo ships the compiled CLI, graph, and weights for download and local use; for the full accelerated kernel surface, run on Linode or contact ai@namenotfound.ai.
