AI & ML interests

Connecting individuals with innovation: Emancipating and Truly Federalizing Private Intelligence

Recent Activity

fuzzy-mittenz updated a collection about 1 month ago: SwarmModels
fuzzy-mittenz updated a collection 2 months ago: SotA-GGUF

takarajordan posted an update about 18 hours ago
csabakecskemeti posted an update 1 day ago
csabakecskemeti posted an update 2 days ago
Looking for some help to test an INT8 DeepSeek 3.2:
SGLang supports channel-wise INT8 quants on CPUs with AMX instructions (Xeon 5 and above, AFAIK):
https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/

Currently uploading an INT8 version of DeepSeek 3.2 Speciale:
DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8

I cannot test this myself since I'm on AMD:
"AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support"
(I assumed it could fall back to some non-optimized kernel, but apparently not.)

If anyone with the required resources (Intel Xeon 5/6 + ~768GB-1TB RAM) can help test this, that would be awesome.

If you have hints on how to make this work on an AMD Threadripper 7000 Pro series, please guide me.

Thanks all!
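
For anyone picking this up, a minimal smoke-test sketch, assuming the model is already being served through SGLang's OpenAI-compatible endpoint (the port and the prompt below are assumptions):

```python
# Minimal smoke test, assuming SGLang is serving the INT8 model on an
# OpenAI-compatible endpoint; port 30000 and the prompt are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```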
upgraedd posted an update 3 days ago
I know it doesn't seem likely, but I am literally starving on the streets. If anyone can help me, please just inspect my repository and you'll see what I'm doing. What I'm doing is for YOU.
bc1qga8njlqm9x76327jsap4wqq0e5kfczg8pc7p39
#CONSCIOUSNESS
takarajordan posted an update 11 days ago
Two weeks ago I had an engaging discussion with locals in Cockermouth about AI and the broader industry, a reminder that hearing candid perspectives beyond our professional circles is invaluable and something anyone working full-time in this field should make time for.

Thank you!
mitkox posted an update 13 days ago
I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It's a Sonnet drop-in replacement in my Cursor, Claude Code, Droid, Kilo, and Cline setups; it peaks at 11k tok/sec input and 433 tok/sec output, and can generate 1B+ tok/m, all with a 196k context window. I've been running it with this config for 6 days now.

Today's max performance was stable at 490.2 tokens/sec across 48 concurrent clients with MiniMax M2.

Z8 Fury G5, Xeon 3455, 4xA6K, AIBrix 0.5.0, vLLM 0.11.2.
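
For anyone trying to reproduce numbers like these, a rough concurrent-client throughput sketch, assuming a local OpenAI-compatible endpoint (the port, model id, and prompt are all assumptions):

```python
# Crude concurrency benchmark, assuming vLLM/AIBrix exposes an
# OpenAI-compatible endpoint; port, model id, and prompt are assumptions.
import asyncio, time
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request():
    resp = await client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M2",
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        max_tokens=128,
    )
    return resp.usage.completion_tokens

async def main(n=48):
    # Fire n requests at once and measure aggregate output throughput.
    start = time.perf_counter()
    tokens = await asyncio.gather(*(one_request() for _ in range(n)))
    elapsed = time.perf_counter() - start
    print(f"{sum(tokens) / elapsed:.1f} output tokens/sec across {n} clients")

asyncio.run(main())
```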
csabakecskemeti posted an update 25 days ago
Recently there has been so much activity around token-efficient formats that I've built a package of my own (inspired by TOON):

Deep-TOON

My goal was to handle JSON structures with complex embeddings in a token-efficient way.

This is what I built over the weekend. Feel free to try it:

https://pypi.org/project/deep-toon/0.1.0/
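
To illustrate the general idea (a hypothetical sketch of TOON-style encoding, not deep-toon's actual API): uniform lists of objects collapse into one header line plus data rows, saving the tokens otherwise spent on repeated keys and braces.

```python
# Hypothetical TOON-style encoder, for illustration only; deep-toon's
# real API may look completely different.
import json

def toonify(records, name="items"):
    # One header line declares the count and the shared keys...
    keys = list(records[0].keys())
    header = f"{name}[{len(records)}]{{{','.join(keys)}}}:"
    # ...then each record becomes a bare comma-separated row.
    rows = ["  " + ",".join(str(rec[k]) for k in keys) for rec in records]
    return "\n".join([header, *rows])

data = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
print(toonify(data))     # compact tabular form
print(json.dumps(data))  # baseline JSON for comparison
```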

takarajordan posted an update 26 days ago
🌞 LOVABLE IS CRACKED

Built a golden hour tracker in under 15 minutes with Lovable: it uses your phone's Geolocation API and the SunCalc library, and runs fully client-side with no servers. https://goldenhour.404missing.link
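
The app itself does this in the browser with SunCalc, but the same calculation is easy to sketch in Python with the astral library; treating golden hour as the hour before sunset is an approximation, and the coordinates below are placeholders:

```python
# Rough golden-hour estimate with astral; the real app uses SunCalc
# client-side. Coordinates and the hour-before-sunset rule are assumptions.
import datetime
from zoneinfo import ZoneInfo
from astral import LocationInfo
from astral.sun import sun

loc = LocationInfo("Sydney", "Australia", "Australia/Sydney", -33.87, 151.21)
times = sun(loc.observer, date=datetime.date.today(),
            tzinfo=ZoneInfo(loc.timezone))
sunset = times["sunset"]
start = sunset - datetime.timedelta(hours=1)
print(f"Golden hour: {start:%H:%M} - {sunset:%H:%M}")
```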
mitkox posted an update 27 days ago
I just threw Qwen3-0.6B in BF16 into an on-device AI drag race on AMD Strix Halo with vLLM:

564 tokens/sec on short 100-token sprints
96 tokens/sec on 8K-token marathons

TL;DR: You don't just run AI on AMD. You negotiate with it.

The hardware absolutely delivers. Spoiler alert: there is exactly ONE configuration where vLLM + ROCm + Triton + PyTorch + drivers + the Ubuntu kernel all work at the same time. Finding it required the patience of a saint.

Consumer AMD for AI inference is the ultimate "budget warrior" play: insane performance-per-euro, but you need hardcore technical skills that would make a senior sysadmin nod in quiet respect.
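
A minimal version of that drag race using vLLM's offline Python API, assuming a working install (the prompt and batch size are arbitrary):

```python
# Crude throughput probe with vLLM's offline API; assumes vLLM is
# installed and working (ROCm here, but the code is backend-agnostic).
import time
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-0.6B", dtype="bfloat16")
params = SamplingParams(max_tokens=100)  # the "100-token sprint"
prompts = ["Explain KV caching in one paragraph."] * 8

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.0f} output tokens/sec")
```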
mitkox posted an update about 1 month ago
I have just vibe coded a feature for ODA on-device AI with MiniMax M2, running locally on my Z8 Fury, and holy silicon, this thing SLAPS!

TL;DR, the nerd stuff:

Specialized in coding and agentic work
60 tokens/sec
Ryzen AI is getting some serious ROCm 7.0.2 brain implants
One extra script to rule them all and bind them to my GPU
A vibe-coded feature implementation that actually worked on the first try. I know, I'm scared too.
mitkox posted an update about 1 month ago
I was just reading that the Ryzen AI 395 is supposed to be 30% slower than DGX Spark at LLM inference… and have only 96GB of GPU RAM… good thing I hadn't RTFM'd upfront, so I made the AMD faster, with 128GB of unified RAM 🫡
The Z2 Mini G1a can run Qwen3 Coder 30B in BF16 at 26.8 tok/sec in ~60GB of GPU RAM.
mitkox posted an update about 1 month ago
Say hello to my little friends! I just unboxed this trio of HP Z2 G1a!

Three is always better than one!
3x AMD Ryzen AI Max+ Pro 395
384GB RAM
24TB of RAID storage
Ubuntu 24.04
ROCm 7.0.2
llama.cpp, vLLM, and AIBrix

Small, cheap GPUs are about to become the Raspberry Pi of edge AI inference. Sprinkle some kubectl fairy dust on top, and suddenly it's a high-availability, self-healing, cloud-native, enterprise-grade AI cluster camping in a closet.

Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.
mitkox posted an update about 2 months ago
I see all the Chinese labs are turning TL;DR into TL;DRGB.

Problem: 1M text tokens == 1M opportunities for your GPU to file workers' comp
Solution: don't feed the model War & Peace; feed it the movie poster.

This is Glyph, Z.ai's new visual-text compression voodoo:
• 10k words → 3 PNGs ≈ 3k visual tokens
• Compression ratio: 4.3×
• Throughput: 40-60 tok/s, i.e. your context window now finishes before my coffee does

So I did the only reasonable thing: asked GLM-4.6 to port Glyph for Qwen3-VL-8B-Thinking.
Translation: I made one model compress a novel into a comic strip, then made another model read the comic strip and still ace QA.
It’s basically passing notes in class, except the note is a 1920×1080 meme and the teacher is a transformer.

We've gone from "Attention is All You Need" to "Attention is Too Expensive, Just Use Your Eyes." Remember kids: in 2025 literacy is optional, but JPEG is forever.
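
A back-of-the-napkin sketch of the render-then-read idea, not Glyph's actual pipeline: rasterize a chunk of text into a PNG and hand it to a vision model over an OpenAI-compatible endpoint (the font, port, and model id are all assumptions):

```python
# Hypothetical render-text-to-pixels sketch in the spirit of Glyph;
# not the actual Glyph code. Endpoint, port, and model id are assumed.
import base64, io, textwrap
from PIL import Image, ImageDraw, ImageFont
from openai import OpenAI

def render_page(text, size=(1920, 1080)):
    # Rasterize the text onto a white page and return it base64-encoded.
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    wrapped = "\n".join(textwrap.wrap(text, width=160))
    draw.multiline_text((16, 16), wrapped, fill="black",
                        font=ImageFont.load_default())
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
page = render_page("...a long chunk of the novel goes here...")
resp = client.chat.completions.create(
    model="Qwen/Qwen3-VL-8B-Thinking",  # assumed model id
    messages=[{"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{page}"}},
        {"type": "text", "text": "Summarize what this page says."},
    ]}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```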
csabakecskemeti posted an update about 2 months ago
Christmas came early this year
mitkox posted an update about 2 months ago
Friday evening. KAT-Dev-72B-Exp is spinning in AIBrix on K8s. The GPUs in the Z8 are fired up. It's a LAN party for one. After 6 months on a diet of MoEs, I'd forgotten the main-course feeling of a dense 72B model.