Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.03300

openchat/openchat-3.5-1210

Text Generation • 7B • Updated May 18, 2024 • 649 • 278
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8, 2024 • 73
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 137
Babelscape/rebel-large

0.4B • Updated Jun 20, 2023 • 33.9k • 230

Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

Paper • 2311.06720 • Published Nov 12, 2023 • 9
System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 44
TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 40
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 31

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 104
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 9
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 55

Moral Foundations of Large Language Models

Paper • 2310.15337 • Published Oct 23, 2023 • 1
Specific versus General Principles for Constitutional AI

Paper • 2310.13798 • Published Oct 20, 2023 • 3
Contrastive Prefence Learning: Learning from Human Feedback without RL

Paper • 2310.13639 • Published Oct 20, 2023 • 25
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 52

Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model

Paper • 2309.03550 • Published Sep 7, 2023 • 12
Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 19
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 241
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 15

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 20
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 11
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 14
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 49

Dataset curation

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

Paper • 2308.12032 • Published Aug 23, 2023 • 1
Know thy corpus! Robust methods for digital curation of Web corpora

Paper • 2003.06389 • Published Mar 13, 2020 • 1
Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 42
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

Paper • 2305.06156 • Published May 9, 2023 • 2

KwaiYiiMath: Technical Report

Paper • 2310.07488 • Published Oct 11, 2023 • 3
Forward-Backward Reasoning in Large Language Models for Mathematical Verification

Paper • 2308.07758 • Published Aug 15, 2023 • 4
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

Paper • 2309.10814 • Published Sep 19, 2023 • 3
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Paper • 2310.03731 • Published Oct 5, 2023 • 29

Text-to-3D using Gaussian Splatting

Paper • 2309.16585 • Published Sep 28, 2023 • 30
FP8-LM: Training FP8 Large Language Models

Paper • 2310.18313 • Published Oct 27, 2023 • 33
Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 122
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper • 2312.06585 • Published Dec 11, 2023 • 29

openchat/openchat-3.5-1210

Text Generation • 7B • Updated May 18, 2024 • 649 • 278
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8, 2024 • 73
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 137
Babelscape/rebel-large

0.4B • Updated Jun 20, 2023 • 33.9k • 230

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 20
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 11
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 14
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 49

Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

Paper • 2311.06720 • Published Nov 12, 2023 • 9
System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 44
TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 40
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 31

Dataset curation

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

Paper • 2308.12032 • Published Aug 23, 2023 • 1
Know thy corpus! Robust methods for digital curation of Web corpora

Paper • 2003.06389 • Published Mar 13, 2020 • 1
Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 42
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

Paper • 2305.06156 • Published May 9, 2023 • 2

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 104
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 9
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 55

KwaiYiiMath: Technical Report

Paper • 2310.07488 • Published Oct 11, 2023 • 3
Forward-Backward Reasoning in Large Language Models for Mathematical Verification

Paper • 2308.07758 • Published Aug 15, 2023 • 4
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

Paper • 2309.10814 • Published Sep 19, 2023 • 3
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Paper • 2310.03731 • Published Oct 5, 2023 • 29

Moral Foundations of Large Language Models

Paper • 2310.15337 • Published Oct 23, 2023 • 1
Specific versus General Principles for Constitutional AI

Paper • 2310.13798 • Published Oct 20, 2023 • 3
Contrastive Prefence Learning: Learning from Human Feedback without RL

Paper • 2310.13639 • Published Oct 20, 2023 • 25
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 52

Text-to-3D using Gaussian Splatting

Paper • 2309.16585 • Published Sep 28, 2023 • 30
FP8-LM: Training FP8 Large Language Models

Paper • 2310.18313 • Published Oct 27, 2023 • 33
Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 122
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper • 2312.06585 • Published Dec 11, 2023 • 29

Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model

Paper • 2309.03550 • Published Sep 7, 2023 • 12
Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 19
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 241
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 15

Previous
1
...
5
6
7
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs