mitkox posted an update 27 days ago
I just threw Qwen3-0.6B in BF16 into an on-device AI drag race on AMD Strix Halo with vLLM (rough benchmark sketch below):

564 tokens/sec on short 100-token sprints
96 tokens/sec on 8K-token marathons
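For anyone who wants to reproduce the rough shape of this test, here is a minimal sketch using vLLM's offline API, assuming a local vLLM build with ROCm support. The model ID, prompt, and exact token counts are illustrative assumptions, not my actual harness.

```python
# Minimal throughput sketch (assumed setup, not the exact benchmark harness).
import time
from vllm import LLM, SamplingParams

# BF16 weights, as in the numbers above.
llm = LLM(model="Qwen/Qwen3-0.6B", dtype="bfloat16")

def measure(max_tokens: int, label: str) -> None:
    # ignore_eos forces the full completion length so the rate is comparable.
    params = SamplingParams(max_tokens=max_tokens, ignore_eos=True)
    prompt = "Explain speculative decoding in one paragraph."
    start = time.perf_counter()
    outputs = llm.generate([prompt], params)
    elapsed = time.perf_counter() - start
    generated = len(outputs[0].outputs[0].token_ids)
    print(f"{label}: {generated / elapsed:.0f} tokens/sec")

measure(100, "short sprint")    # ~100-token completion
measure(8000, "long marathon")  # ~8K-token completion
```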

TL;DR You don't just run AI on AMD. You negotiate with it.

The hardware absolutely delivers. Spoiler alert: there is exactly ONE configuration in which vLLM + ROCm + Triton + PyTorch + drivers + Ubuntu kernel all work at the same time. Finding it required the patience of a saint.
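Once you land on a working combination, it is worth recording it. A small sanity-check sketch like the one below just reports the installed versions of each layer; no specific versions are pinned here, since the working combo depends on your exact stack.

```python
# Report the versions of each layer in the vLLM + ROCm stack so a working
# combination can be written down and reproduced later.
import platform

import torch
import triton
import vllm

print("Kernel     :", platform.release())
print("PyTorch    :", torch.__version__)
print("ROCm (HIP) :", torch.version.hip)   # None on non-ROCm builds of PyTorch
print("Triton     :", triton.__version__)
print("vLLM       :", vllm.__version__)
```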

Consumer AMD for AI inference is the ultimate "budget warrior" play: insane performance-per-euro, but it demands hardcore technical skills that would make a senior sysadmin nod in quiet respect.

Which AMD CPU is it?