wolosonovich's Collections: Research
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
• arXiv:2309.14509
LLM Augmented LLMs: Expanding Capabilities through Composition
• arXiv:2401.02412
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
• arXiv:2401.06066
Tuning Language Models by Proxy
• arXiv:2401.08565
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
• arXiv:2401.12954
ReFT: Reasoning with Reinforced Fine-Tuning
• arXiv:2401.08967
MambaByte: Token-free Selective State Space Model
• arXiv:2401.13660
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
• arXiv:2401.18059
Specialized Language Models with Cheap Inference from Limited Domain Data
• arXiv:2402.01093
Repeat After Me: Transformers are Better than State Space Models at Copying
• arXiv:2402.01032
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
• arXiv:2402.01739
Scaling Laws for Downstream Task Performance of Large Language Models
• arXiv:2402.04177
An Interactive Agent Foundation Model
• arXiv:2402.05929
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
• arXiv:2402.04291
Mixtures of Experts Unlock Parameter Scaling for Deep RL
• arXiv:2402.08609
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
• arXiv:2402.07827
Tandem Transformers for Inference Efficient LLMs
• arXiv:2402.08644
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
• arXiv:2402.11450
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
• arXiv:2402.10379
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
• arXiv:2402.13753
User-LLM: Efficient LLM Contextualization with User Embeddings
• arXiv:2402.13598
OmniPred: Language Models as Universal Regressors
• arXiv:2402.14547
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
• arXiv:2402.14083
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
• arXiv:2402.15627
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
• arXiv:2402.17764
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
• arXiv:2403.03950
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
• arXiv:2403.03507
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
• arXiv:2403.07816
Chronos: Learning the Language of Time Series
• arXiv:2403.07815
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
• arXiv:2403.06504
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
• arXiv:2403.10704
TnT-LLM: Text Mining at Scale with Large Language Models
• arXiv:2403.12173
RLHF Workflow: From Reward Modeling to Online RLHF
• arXiv:2405.07863
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
• arXiv:2405.11582
Towards Modular LLMs by Building and Reusing a Library of LoRAs
• arXiv:2405.11157
Block Transformer: Global-to-Local Language Modeling for Fast Inference
• arXiv:2406.02657
GEB-1.3B: Open Lightweight Large Language Model
• arXiv:2406.09900
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
• arXiv:2407.00402
Agentless: Demystifying LLM-based Software Engineering Agents
• arXiv:2407.01489
On Leakage of Code Generation Evaluation Datasets
• arXiv:2407.07565
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
• arXiv:2407.09025
E5-V: Universal Embeddings with Multimodal Large Language Models
• arXiv:2407.12580
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
• arXiv:2408.00690
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
• arXiv:2408.03910
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
• arXiv:2408.06195
OLMoE: Open Mixture-of-Experts Language Models
• arXiv:2409.02060
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
• arXiv:2409.16040
Large Language Models as Markov Chains
• arXiv:2410.02724
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
• arXiv:2410.08815
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
• arXiv:2410.10814
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
• arXiv:2411.02959
Star Attention: Efficient LLM Inference over Long Sequences
• arXiv:2411.17116
Personalized Graph-Based Retrieval for Large Language Models
• arXiv:2501.02157
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
• arXiv:2501.13200
Scaling Embedding Layers in Language Models
• arXiv:2502.01637
A Comprehensive Survey on Long Context Language Modeling
• arXiv:2503.17407
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
• arXiv:2503.24290
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
• arXiv:2505.01441