Hugging Face Party @ PyTorch Conference

community

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

xianbao authored a paper about 1 month ago

RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies

mbrack authored a paper about 2 months ago

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

xiyang99 authored a paper 2 months ago

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

View all activity

csabakecskemeti

posted an update 4 days ago

Post

1173

FYI: Mistral.Ministral-3 dequantizer FP8->BF16

https://github.com/csabakecskemeti/ministral-3_dequantizer_fp8-bf16

(The instruct model weights are in FP8)

danielhanchen

posted an update 5 days ago

Post

3195

Mistral's new Ministral 3 models can now be Run & Fine-tuned locally! (16GB RAM)
Ministral 3 have vision support and the best-in-class performance for their sizes.
14B Instruct GGUF: unsloth/Ministral-3-14B-Instruct-2512-GGUF
14B Reasoning GGUF: unsloth/Ministral-3-14B-Reasoning-2512-GGUF

🐱 Step-by-step Guide: https://docs.unsloth.ai/new/ministral-3
All GGUFs, BnB, FP8 etc. variants uploads: https://huggingface.co/collections/unsloth/ministral-3

3 replies

csabakecskemeti

posted an update 5 days ago

Post

2015

Looking for some help to test an INT8 Deepseek 3.2:
SGLang supports Channel wise INT8 quants on CPUs with AMX instructions (Xeon 5 and above AFAIK)
https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/

Currently uploading an INT8 version of Deepseek 3.2 Speciale:
DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8

I cannot test this I'm on AMD
"AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support"
(I assumed it can fall back to some non optimized kernel but seems not)

If anyone with the required resources (Intel Xeon 5/6 + ~768-1TB ram) can help to test this that would be awesome.

If you have hints how to make this work on AMD Threadripper 7000 Pro series please guide me.

Thanks all!

8 replies

danielhanchen

posted an update 10 days ago

Post

8213

Qwen3-Next can now be Run locally! (30GB RAM)
Instruct GGUF: unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF

The models come in Thinking and Instruct versions and utilize a new architecture, allowing it to have ~10x faster inference than Qwen32B.
💜 Step-by-step Guide: https://docs.unsloth.ai/models/qwen3-next

Thinking GGUF: unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF

csabakecskemeti

posted an update 28 days ago

Post

301

Recently there are so much activity on token efficient formats, I've also build a package (inspired by toon).

Deep-TOON

My goal was to token efficiently handle json structures with complex embeddings.

So this is what I've built on the weekend. Feel free try:

https://pypi.org/project/deep-toon/0.1.0/

JingzeShi

posted an update 28 days ago

Post

226

Is it time to start developing sparse attention again?
https://github.com/SmallDoges/flash-sparse-attention

danielhanchen

posted an update about 1 month ago

Post

4196

You can now run Kimi K2 Thinking locally with our Dynamic 1-bit GGUFs: unsloth/Kimi-K2-Thinking-GGUF

We shrank the 1T model to 245GB (-62%) & retained ~85% of accuracy on Aider Polyglot. Run on >247GB RAM for fast inference.

We also collaborated with the Moonshot AI Kimi team on a system prompt fix! 🥰

Guide + fix details: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally

csabakecskemeti

posted an update about 2 months ago

Post

2603

Christmas came early this year

3 replies

mbrack

authored a paper about 2 months ago

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

Paper • 2510.12789 • Published Oct 14 • 18

lysandre

posted an update 3 months ago

Post

7086

We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez !

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!

6 replies

teknium

authored 2 papers 3 months ago

Hermes 3 Technical Report

Paper • 2408.11857 • Published Aug 15, 2024 • 56

Hermes 4 Technical Report

Paper • 2508.18255 • Published Aug 25 • 42

jeffboudier

posted an update 4 months ago

Post

3061

Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Py and CLI!

GG @alvarobartt @kramp @pagezyhf

danielhanchen

posted an update 4 months ago

Post

6465

Run DeepSeek-V3.1 locally on 170GB RAM with Dynamic 1-bit GGUFs!🐋
GGUFs: unsloth/DeepSeek-V3.1-GGUF

The 715GB model gets reduced to 170GB (-80% size) by smartly quantizing layers.

The 1-bit GGUF passes all our code tests & we fixed the chat template for llama.cpp supported backends.

Guide: https://docs.unsloth.ai/basics/deepseek-v3.1

Xenova

posted an update 4 months ago

Post

11871

Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.

Excited to see what the community builds with it!

1 reply

1024m

authored 2 papers 4 months ago

Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering

Paper • 2508.04683 • Published Aug 6

DSBC : Data Science task Benchmarking with Context engineering

Paper • 2507.23336 • Published Jul 31 • 2

JingzeShi

authored a paper 4 months ago

Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published Aug 4 • 17

Xenova

posted an update 4 months ago

Post

4529

The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀

2 replies

danielhanchen

posted an update 4 months ago

Post

5529

Run OpenAI's new gpt-oss models locally with Unsloth GGUFs! 🔥🦥
20b GGUF: unsloth/gpt-oss-20b-GGUF
120b GGUF: unsloth/gpt-oss-120b-GGUF

Model will run on 14GB RAM for 20b and 66GB for 120b.

2 replies

AI & ML interests

Recent Activity

Team members 194

HF-Party's activity