Yicun Yang (maomaocun)
2 followers · 9 following
https://github.com/maomaocun
AI & ML interests: efficient AI, deep learning
Recent Activity
- Liked a Space about 1 month ago: Ruurd/tini-lad
- Reacted with 🔥 to Ruurd's post about 1 month ago:
The past year I have been trying to get diffusion models to work for language generation, without having to retrain an LLM from scratch. And recently, we finally succeeded: we introduce "LAD: LoRA-Adapted Denoiser", a method to convert a LLaMA model into a text diffusion model using LoRA finetuning and structured input corruption.

🎯 Try the demo and read the write-up here! https://ruurdkuiper.github.io/tini-lad/

Unlike autoregressive (word-by-word) models like ChatGPT, diffusion models iteratively refine a noised sequence. However, most current diffusion approaches rely on all-parameter retraining and repeated token remasking, which is costly and slow during both training and inference!

🔧 With LAD:
- We can finetune an autoregressive model for diffusive generation in just 10 hours on a single GPU.
- Test-time compute is fully adjustable: fewer steps mean faster outputs, while more steps improve output quality.
- Due to our unique noising schedule, remasking is not always needed during inference. All tokens are attended to in each iteration!

LAD is built using:
- A frozen LLaMA-8B backbone
- Structured noising: token swaps, duplications, replacements, span shifts
- Modified attention masks for bidirectional decoding

💡 We show that even small, fast-trained models can perform diffusive generation, with competitive benchmark performance and perplexity, and more flexible test-time behavior than traditional transformers.
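The post names four structured-corruption operations (token swaps, duplications, replacements, span shifts) but shows no code. Below is a minimal sketch of what such noising could look like over a list of token IDs; the per-token corruption probability, the uniform random replacement, and the fixed-length truncation are assumptions, not LAD's published schedule.

```python
import random

def corrupt(tokens: list[int], vocab_size: int, noise_level: float = 0.3) -> list[int]:
    """Structured corruption sketch: swap / duplicate / replace / span-shift.

    Hypothetical illustration of the operations named in the LAD post;
    the real noising schedule is described in the linked write-up.
    """
    toks = list(tokens)
    n = len(toks)
    for i in range(n):
        if random.random() > noise_level:
            continue
        op = random.choice(["swap", "duplicate", "replace", "shift"])
        if op == "swap" and i + 1 < n:
            toks[i], toks[i + 1] = toks[i + 1], toks[i]    # swap adjacent tokens
        elif op == "duplicate":
            toks.insert(i, toks[i])                        # duplicate this token
        elif op == "replace":
            toks[i] = random.randrange(vocab_size)         # random replacement
        elif op == "shift":
            j = min(i + random.randint(1, 3), n)           # take a short span
            span, rest = toks[i:j], toks[:i] + toks[j:]
            k = max(0, min(len(rest), i + random.randint(-2, 2)))
            toks = rest[:k] + span + rest[k:]              # reinsert it nearby
    return toks[:n]  # truncate so batch shapes stay fixed

noisy = corrupt([5, 12, 7, 7, 93, 41], vocab_size=128_256)
```

Presumably the adapter is then trained to map such corrupted sequences back to the clean originals (the standard denoising objective), which is what makes retraining from scratch unnecessary.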
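For the "frozen LLaMA-8B backbone plus LoRA adapters" recipe and the modified attention masks, a sketch using the transformers and peft libraries follows. The checkpoint ID, LoRA rank/alpha, and target modules are assumptions; only the overall shape (frozen base, trainable adapters, non-causal attention) comes from the post.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed backbone checkpoint; the post only says "a frozen LLaMA-8B backbone".
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

# LoRA adapters on the attention projections; rank, alpha, and targets are
# illustrative rather than LAD's published configuration.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)  # base weights stay frozen
model.print_trainable_parameters()

# "Modified attention masks for bidirectional decoding": recent transformers
# releases accept a custom 4D additive attention mask in place of the default
# causal one. An all-zeros mask lets every token attend to every other token,
# matching the post's claim that all tokens are attended to in each iteration.
seq_len = 64
input_ids = torch.randint(0, base.config.vocab_size, (1, seq_len))
bidir_mask = torch.zeros(1, 1, seq_len, seq_len, dtype=base.dtype)
logits = model(input_ids=input_ids, attention_mask=bidir_mask).logits
```

How LAD actually patches the mask inside the model may differ; the 4D-mask route is just one way to get bidirectional attention without touching the backbone code.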
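The adjustable test-time compute could then look like the loop below: each iteration re-predicts every position at once, so the step count trades speed for quality as the post describes. The greedy argmax update is a placeholder; LAD's actual sampling and remasking rule is in the write-up.

```python
import torch

@torch.no_grad()
def refine(model, noisy_ids: torch.Tensor, bidir_mask: torch.Tensor,
           num_steps: int = 8) -> torch.Tensor:
    """Iterative refinement sketch: fewer steps are faster, more improve quality."""
    ids = noisy_ids
    for _ in range(num_steps):       # the test-time compute knob
        logits = model(input_ids=ids, attention_mask=bidir_mask).logits
        ids = logits.argmax(dim=-1)  # re-predict all tokens in parallel
    return ids
```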
- Reacted a second time to the same post by Ruurd about 1 month ago.
Organizations
Models (5), sorted by recently updated:
- maomaocun/dLLM-Var · 8B params · Updated Oct 29 · 125 downloads · 4 likes
- maomaocun/LLaDA-MoE-7B-A1B-Instruct-fusemoe · 7B params · Updated Oct 15 · 3 downloads
- maomaocun/LLaDA-MoE-7B-A1B-Base-fusemoe · 7B params · Updated Oct 15 · 4 downloads
- maomaocun/dLLM-Var-no-template · 8B params · Updated Oct 7 · 6 downloads · 1 like
- maomaocun/NCFM · Updated Jul 15
Datasets (0): none public yet.