Collections
Collections including paper arxiv:2508.15096
- DataDecide: How to Predict Best Pretraining Data with Small Experiments
  Paper • 2504.11393 • Published • 18
- Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
  Paper • 2504.04152 • Published • 1
- BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
  Paper • 2508.10975 • Published • 60
- Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
  Paper • 2412.02595 • Published • 6

- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  Paper • 2508.06471 • Published • 206
- NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
  Paper • 2508.14444 • Published • 42
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
  Paper • 2507.06261 • Published • 67
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
  Paper • 2506.13585 • Published • 273