Article Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL +6 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • 8 days ago • 39
Nemotron-Labs-Diffusion Collection A Tri-Mode Language Model Family Unifying Autoregressive, Diffusion, and Self-Speculation Decoding • 7 items • Updated about 1 hour ago • 48
Article Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation nvidia • 17 days ago • 21
Article Unlocking asynchronicity in continuous batching +1 ror, pcuenq, ariG23498 • 21 days ago • 57
Article Building Blocks for Foundation Model Training and Inference on AWS amazon • 23 days ago • 23
Article CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models lablab-ai-amd-developer-hackathon • 27 days ago • 10
Mistral Medium 3.5 Collection Our first flagship models handling instruction-following, reasoning, and coding in a single set of open weights. • 2 items • Updated Apr 29 • 17
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 7 items • Updated May 2 • 63
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 227
Article How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs nielsr • Apr 7 • 62
Article Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents ibm-research • Apr 15 • 28