BEE-spoke-data/pegasus-x-base-synthsumm_open-16k Summarization • 0.3B • Updated 9 days ago • 56 • 3
Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics Paper • 2603.01209 • Published Mar 1 • 1
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 19
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 10
Survivor Library Books - OCR Collection: Books from the Survivor Library (mostly ~1920s and earlier), OCR'd with recent VLMs • 2 items • Updated Jul 14, 2025 • 5