[DAY THREE] PROJECT CROWFEATHER - 5/2/2026 Scamp ships, hits the wall. New plan...
Scamp came back from training today... Didn't go so well, I'm still unsure...
Fast benchmark, temperature 0.7, top_p 0.9:
- "Capital of France is" produced "covered by the Crown" (grammatical, factually wrong)
- "23 + 19 = ?" produced "23. Answer: 23. Answer: 23..." (loops, math broken)
- "def fibonacci(n):" produced a list of letters
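For anyone who wants to poke at it the same way, here's roughly what that quick benchmark looks like as a script; the repo id and token budget are placeholders, and this assumes a standard transformers causal LM export:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "Crownelius/Crowfeather-50m"      # placeholder repo id
device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt).to(device).eval()

prompts = ["Capital of France is", "23 + 19 = ?", "def fibonacci(n):"]
for p in prompts:
    inputs = tok(p, return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=48, do_sample=True,
                         temperature=0.7, top_p=0.9)
    # strip the prompt tokens so only the completion prints
    completion = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(f"{p!r} -> {completion!r}")
```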
It speaks English. It can't reason. At 8K vocab and 50M params, it was never going to.
Next build: 412M MoE-3E. Three experts (math, language, code), top-1 routing, random init, let specialization emerge from gradient signal alone. Tried seeded Branch-Train-MiX first then dropped it. Adds compute for no clear win when the router will find its own attractors anyway.
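For the curious, a minimal sketch of top-1 routing over three randomly initialized experts in PyTorch; the layer widths and the probability-scaling trick are illustrative choices, not the actual 412M config:

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Three feed-forward experts behind a learned top-1 router."""
    def __init__(self, d_model=1024, d_ff=4096, n_experts=3):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (batch, seq, d_model)
        probs = self.router(x).softmax(dim=-1)   # (batch, seq, n_experts)
        top_p, top_idx = probs.max(dim=-1)       # one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # scale by the router probability so gradients still reach the router
                out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        return out
```

With random init and no seeding, whether the three experts actually settle into math/language/code is entirely up to the gradient signal, which is the bet described above.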
Big lesson today came from limit testing on the A100 80GB. Surprise: every planned phase ran out of memory, even on 80 GB. Root cause: at vocab 262,144 (Gemma 3 standard), the output logits dominate memory during forward and backward. Fix: Liger Kernel's fused cross-entropy, which streams the loss computation instead of materialising the full B × T × vocab logits tensor. Without it the build would not run.
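Back-of-the-envelope numbers to show why; the batch and sequence length here are illustrative, not the actual config:

```python
# Rough estimate of logits memory at vocab 262,144; shapes are hypothetical.
vocab = 262_144
batch, seq = 8, 2_048
bytes_fp32 = 4  # cross-entropy typically upcasts logits to fp32

logits_bytes = batch * seq * vocab * bytes_fp32   # forward activation
grad_bytes = logits_bytes                         # backward needs a same-sized gradient
print(f"logits: {logits_bytes / 2**30:.0f} GiB, with grad: {(logits_bytes + grad_bytes) / 2**30:.0f} GiB")
# -> logits: 16 GiB, with grad: 32 GiB, before weights, optimizer state, or other activations
```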
Scamp proved the pipeline runs end-to-end on real hardware. The 412M run starts tomorrow. If routing balances naturally and math finally crystallises, it ships as Crowfeather-412M-3E with GGUF in F16, Q8, Q5, and Q4.
So... the training might have produced a poet if I'd done it better. But I didn't, so instead... we get a malformed robot named Scamp... This is progress.
[DAY TWO] PROJECT CROWFEATHER - 5/1/2026 Que sera, what will he be?
Step 47,500 of 100,000. Loss hovering around 2.76 after 6.2B tokens. Throughput steady at 87k tokens per second on the A100. Not a GH200, but she gets it done.
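Quick sanity-check arithmetic on those numbers, assuming the 87k figure is tokens per second and stays flat (it rarely does exactly):

```python
# Implied step size and time remaining from the reported progress.
tokens_so_far = 6.2e9
steps_so_far = 47_500
total_steps = 100_000
tok_per_sec = 87_000

tok_per_step = tokens_so_far / steps_so_far               # ~130k tokens per step
remaining_tok = (total_steps - steps_so_far) * tok_per_step
hours_left = remaining_tok / tok_per_sec / 3600           # ~22 hours to step 100k
print(f"{tok_per_step:,.0f} tokens/step, ~{hours_left:.0f} h remaining")
```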
Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.
Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works.
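For the loss side of that recipe, a minimal sketch of the Gemma-2-style final-logit soft cap plus PaLM-style z-loss; the cap value and z-loss weight below are the commonly cited defaults (30.0 and 1e-4), not necessarily what this run uses:

```python
import torch
import torch.nn.functional as F

def capped_loss(logits, targets, soft_cap=30.0, z_weight=1e-4):
    # Gemma-2-style soft cap: squashes the final logits into (-cap, cap)
    logits = soft_cap * torch.tanh(logits / soft_cap)
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    # PaLM-style z-loss: penalize log(Z)^2 so the softmax normalizer stays near 1
    log_z = torch.logsumexp(logits, dim=-1)
    return ce + z_weight * (log_z ** 2).mean()
```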
Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in. One thing's certain though. This model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers.
Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.
The bank's after my credit card. Until then, full steam.
[DAY ONE] PROJECT CROWFEATHER - 4/30/2026 ...The day I forgot to attach wandb.ai
Just dropped Crowfeather-50m, the first checkpoint in a series, and yeah, no graphs.
54.5M params. Pretrain only. 17,500 steps banked on FineWeb-edu before Thunder credits ran dry. About 2.3B tokens, no SFT yet.
Architecture: Gemma-4 alternating sliding/global attention (1024 window, last layer always global), DeepSeek-V4's Muon optimizer with a WSD scheduler, Gemma-2's logit soft-cap, and PaLM's z-loss. Recipe in the model card.
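A minimal sketch of what that attention pattern means in mask form; the 1024 window and always-global last layer come from the recipe above, while the exact interleave of sliding and global layers is an illustrative assumption:

```python
import torch

def attention_mask(seq_len, layer_idx, num_layers, window=1024):
    """Causal mask: sliding-window on some layers, full/global on the rest.
    The odd-layer interleave here is a guess; the last layer is always global."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    is_global = (layer_idx % 2 == 1) or (layer_idx == num_layers - 1)
    if is_global:
        return causal
    return causal & (i - j < window)         # only attend within the sliding window
```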
What it can do: writes grammatical English. Knows that France has Rhine-adjacent monasteries (it picked Rouen instead of Paris but the vocabulary is in there). Tells stories about Mr. Fabien.
What it can't do yet: facts, code, math. Base LM, no SFT, no instruction tuning.
The series:
- Every additional training run becomes another model card here
- Every model card gets a matching post on this profile
- Continuation goes to Colab next, picking up from step 17,500 out of 100k
Limited to one post a day on Hugging Face, so updates will trickle out at that pace. Follow [@Crownelius](@Crownelius) and [@Crowfeather](@Crowfeather) if you want to watch this thing learn in public. Next drop will either come with the finished pre-train or whatever step I land on before the bank takes my credit card away.
My Hugging Face journey has been a trip! I wanted to take the time to thank each and every one of you for using my dataset and getting it as far as it did. Believe it or not, some neanderthal was, and maybe still is, trending on Hugging Face.
Not only did my dataset reach number one, my fine-tuned qwen3.5 model charted as well, landing in the top 10. Honestly, ain't much left to do here.
Y'all have given me the desire, no... the craving for more. I am absolutely obsessed with AI now. I want to tweak it... I want to take it apart, just to see what makes everything tick. I want to put it together like Frankenstein and his monster.
The only thing that's stopping this guy is compute. I don't mind spending every penny I have on this. I desperately want to drive AI forward, even just a little bit.
I never knew the clanker hater from a year ago would be saying this.
Thank you all from the bottom of my heart.
Looking forward to showing you what I'm cooking up next. @CompactAI is your only hint!