Yicun Yang (maomaocun)
2 followers · 9 following
https://github.com/maomaocun
AI & ML interests: efficient AI, deep learning
Recent Activity
- Liked a Space about 1 month ago: Ruurd/tini-lad
- Reacted with 🔥 to Ruurd's post about 1 month ago:
The past year I have been trying to get diffusion models to work for language generation, without having to retrain an LLM from scratch. And recently, we finally succeeded: we introduce "LAD: LoRA-Adapted Denoiser", a method to convert a LLaMA model into a text diffusion model using LoRA finetuning and structured input corruption.

🎯 Try the demo and read the write-up here! https://ruurdkuiper.github.io/tini-lad/

Unlike autoregressive (word-by-word) models like ChatGPT, diffusion models iteratively refine a noised sequence. However, most current diffusion approaches rely on all-parameter retraining and repeated token remasking, which is costly and slow during both training and inference!

🔧 With LAD:
- We can finetune an autoregressive model for diffusive generation in just 10 hours on a single GPU.
- Test-time compute is fully adjustable: fewer steps mean faster outputs, while more steps improve output quality.
- Due to our unique noising schedule, remasking is not always needed during inference. All tokens are attended to in each iteration!

LAD is built using:
- A frozen LLaMA-8B backbone
- Structured noising: token swaps, duplications, replacements, span shifts
- Modified attention masks for bidirectional decoding

💡 We show that even small, fast-trained models can perform diffusive generation, with competitive benchmark performance and perplexity, and more flexible test-time behavior than traditional transformers.
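The post names four structured-corruption operations (token swaps, duplications, replacements, span shifts) but shows no code. Below is a minimal sketch of what such noising could look like over a list of token IDs; the per-token corruption probability, the uniform random replacement, and the fixed-length truncation are assumptions, not LAD's published schedule.

```python
import random

def corrupt(tokens: list[int], vocab_size: int, noise_level: float = 0.3) -> list[int]:
    """Structured corruption sketch: swap / duplicate / replace / span-shift.

    Hypothetical illustration of the operations named in the LAD post;
    the real noising schedule is described in the linked write-up.
    """
    toks = list(tokens)
    n = len(toks)
    for i in range(n):
        if random.random() > noise_level:
            continue
        op = random.choice(["swap", "duplicate", "replace", "shift"])
        if op == "swap" and i + 1 < n:
            toks[i], toks[i + 1] = toks[i + 1], toks[i]    # swap adjacent tokens
        elif op == "duplicate":
            toks.insert(i, toks[i])                        # duplicate this token
        elif op == "replace":
            toks[i] = random.randrange(vocab_size)         # random replacement
        elif op == "shift":
            j = min(i + random.randint(1, 3), n)           # take a short span
            span, rest = toks[i:j], toks[:i] + toks[j:]
            k = max(0, min(len(rest), i + random.randint(-2, 2)))
            toks = rest[:k] + span + rest[k:]              # reinsert it nearby
    return toks[:n]  # truncate so batch shapes stay fixed

noisy = corrupt([5, 12, 7, 7, 93, 41], vocab_size=128_256)
```

Presumably the adapter is then trained to map such corrupted sequences back to the clean originals (the standard denoising objective), which is what makes retraining from scratch unnecessary.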
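For the "frozen LLaMA-8B backbone plus LoRA adapters" recipe and the modified attention masks, a sketch using the transformers and peft libraries follows. The checkpoint ID, LoRA rank/alpha, and target modules are assumptions; only the overall shape (frozen base, trainable adapters, non-causal attention) comes from the post.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed backbone checkpoint; the post only says "a frozen LLaMA-8B backbone".
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

# LoRA adapters on the attention projections; rank, alpha, and targets are
# illustrative rather than LAD's published configuration.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)  # base weights stay frozen
model.print_trainable_parameters()

# "Modified attention masks for bidirectional decoding": recent transformers
# releases accept a custom 4D additive attention mask in place of the default
# causal one. An all-zeros mask lets every token attend to every other token,
# matching the post's claim that all tokens are attended to in each iteration.
seq_len = 64
input_ids = torch.randint(0, base.config.vocab_size, (1, seq_len))
bidir_mask = torch.zeros(1, 1, seq_len, seq_len, dtype=base.dtype)
logits = model(input_ids=input_ids, attention_mask=bidir_mask).logits
```

How LAD actually patches the mask inside the model may differ; the 4D-mask route is just one way to get bidirectional attention without touching the backbone code.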
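The adjustable test-time compute could then look like the loop below: each iteration re-predicts every position at once, so the step count trades speed for quality as the post describes. The greedy argmax update is a placeholder; LAD's actual sampling and remasking rule is in the write-up.

```python
import torch

@torch.no_grad()
def refine(model, noisy_ids: torch.Tensor, bidir_mask: torch.Tensor,
           num_steps: int = 8) -> torch.Tensor:
    """Iterative refinement sketch: fewer steps are faster, more improve quality."""
    ids = noisy_ids
    for _ in range(num_steps):       # the test-time compute knob
        logits = model(input_ids=ids, attention_mask=bidir_mask).logits
        ids = logits.argmax(dim=-1)  # re-predict all tokens in parallel
    return ids
```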
- Reacted a second time to the same post by Ruurd about 1 month ago.
Organizations
Models (5), sorted by recently updated:
- maomaocun/dLLM-Var · 8B params · Updated Oct 29 · 125 downloads · 4 likes
- maomaocun/LLaDA-MoE-7B-A1B-Instruct-fusemoe · 7B params · Updated Oct 15 · 3 downloads
- maomaocun/LLaDA-MoE-7B-A1B-Base-fusemoe · 7B params · Updated Oct 15 · 4 downloads
- maomaocun/dLLM-Var-no-template · 8B params · Updated Oct 7 · 6 downloads · 1 like
- maomaocun/NCFM · Updated Jul 15
Datasets (0): none public yet.