Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers Paper • 2602.06079 • Published Feb 4 • 18
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 20
Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets Paper • 2512.18834 • Published Dec 21, 2025 • 4
SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data Paper • 2511.18411 • Published Nov 23, 2025
Trust the Model: Compact VLMs as In-Context Judges for Image-Text Data Quality Paper • 2507.20156 • Published Jul 27, 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection Paper • 2502.20361 • Published Feb 27, 2025 • 1
$β$-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment Paper • 2512.12678 • Published Dec 14, 2025
AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published Nov 18, 2025 • 74
Mindstorms in Natural Language-Based Societies of Mind Paper • 2305.17066 • Published May 26, 2023 • 3
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 41
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing Paper • 2505.00315 • Published May 1, 2025 • 1