NeuLab @ LTI/CMU

university

https://www.cs.cmu.edu/~neulab/

Recent Activity

Nyandwi updated a dataset 11 days ago

neulab/behavioral-lift

seungone authored a paper 26 days ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

seungone authored a paper 26 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

updated a dataset 11 days ago

neulab/behavioral-lift

Viewer • Updated 11 days ago • 15.3k • 107 • 1

authored 2 papers 26 days ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

Paper • 2605.26457 • Published May 26 • 7

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published about 1 month ago • 59

submitted a paper to Daily Papers 30 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published about 1 month ago • 59

authored a paper about 1 month ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published May 20 • 13

submitted a paper to Daily Papers about 1 month ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published May 20 • 13

authored 2 papers about 2 months ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Paper • 2603.18886 • Published Mar 19 • 6

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published May 9 • 82

published a dataset about 2 months ago

neulab/behavioral-lift

Viewer • Updated 11 days ago • 15.3k • 107 • 1

authored a paper 3 months ago

IDIOLEX: Unified and Continuous Representations for Idiolectal and Stylistic Variation

Paper • 2604.04704 • Published Apr 6

updated a model 3 months ago

neulab/codescout-14b-strict-k-turns-150

15B • Updated Mar 31 • 1

published a model 3 months ago

neulab/codescout-14b-strict-k-turns-150

15B • Updated Mar 31 • 1

updated a model 3 months ago

neulab/codescout-14b-strict-k-turns

15B • Updated Mar 31 • 2

published a model 3 months ago

neulab/codescout-14b-strict-k-turns

15B • Updated Mar 31 • 2

in neulab/VisualPuzzles 3 months ago

fix inconsistent data split name

#2 opened 3 months ago by

updated a dataset 3 months ago

neulab/VisualPuzzles

Viewer • Updated Mar 27 • 1.17k • 1.65k • 11

authored a paper 3 months ago

Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning

Paper • 2601.18722 • Published Jan 26

updated a dataset 4 months ago

neulab/agent-data-collection

Preview • Updated Mar 9 • 2.97k • 115

updated a model 4 months ago

neulab/adversarial-paraphraser-qwen3-8b

8B • Updated Mar 6 • 7 • 3

published a model 4 months ago

neulab/adversarial-paraphraser-qwen3-8b

8B • Updated Mar 6 • 7 • 3