Canada Must Not Turn AI Chatbots Into a New Surveillance Frontier Article • 1 day ago • 3
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 12
The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub Paper • 2405.13058 • Published May 20, 2024 • 3
Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face Paper • 2508.06811 • Published Aug 9, 2025 • 5
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published Nov 27, 2025 • 7
Zagreus 0.4B Collection The Zagreus-0.4B collection contains four bilingual English + Romance language foundational SLMs (~400M parameters) trained from scratch • 4 items • Updated 14 days ago • 6
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 6 days ago • 60
Introducing Buckets: S3-like storage on the Hub Hugging Face Changelog • 8 days ago • 167
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 1 day ago • 122
Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries Article • 8 days ago • 63
🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 16 days ago • 12
Finance Commons Collection A large collection of multimodal financial documents in open data. • 7 items • Updated Jul 17, 2024 • 13
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 20 days ago • 87
Did GPT 5.2 make a breakthrough discovery in theoretical physics? Article • 26 days ago • 61