Ziyin Zhang

Geralt-Targaryen

AI & ML interests

None yet

Recent Activity

updated a dataset 1 day ago

Geralt-Targaryen/openwebtext2

updated a dataset 14 days ago

Geralt-Targaryen/zh-bo-instruct

published a dataset 14 days ago

Geralt-Targaryen/zh-bo-instruct

Organizations

upvoted a paper about 2 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11 • 3

upvoted 2 papers 2 months ago

D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

Paper • 2406.17262 • Published Jun 25, 2024 • 5

F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data

Paper • 2510.02294 • Published Oct 2 • 45

upvoted a collection 2 months ago

Codefuse Embeddings

Collection

8 items • Updated Oct 3 • 6

upvoted 3 papers 3 months ago

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Paper • 2409.06679 • Published Sep 10, 2024 • 4

CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects

Paper • 2509.14856 • Published Sep 18 • 1

CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China

Paper • 2509.09990 • Published Sep 12 • 2

upvoted a paper 4 months ago

From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms

Paper • 2508.10860 • Published Aug 14 • 3

upvoted 2 collections 6 months ago

codefuse-papers

Collection

14 items • Updated Oct 3 • 4

DeepTheorem

Collection

A dataset and RL-zero pipeline for advanced mathematical reasoning of informal theorem proving. • 6 items • Updated Jun 11 • 2

upvoted 3 papers 6 months ago

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Paper • 2505.23754 • Published May 29 • 15

Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM

Paper • 2503.17793 • Published Mar 22 • 23

Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

Paper • 2505.16901 • Published May 22 • 48

upvoted a paper 10 months ago

Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages

Paper • 2502.10852 • Published Feb 15 • 2

upvoted a paper about 1 year ago

Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

Paper • 2411.18462 • Published Nov 27, 2024 • 6

upvoted a paper about 2 years ago

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023 • 26

Geralt-Targaryen's activity