---
license: mit
pipeline_tag: image-to-image
---

# Vector Quantization using Gaussian Variational Autoencoder

This repository contains the official implementation of **Gaussian Quant (GQ)**, a vector quantization method presented in the paper "[Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)".

GQ is a simple yet effective technique that converts a Gaussian Variational Autoencoder (VAE) into a VQ-VAE without additional training. It generates random Gaussian noise as a codebook and selects the noise vector closest to the posterior mean. Theoretically, the paper proves that a small quantization error is guaranteed when the logarithm of the codebook size exceeds the bits-back coding rate. Empirically, GQ combined with a heuristic called the target divergence constraint (TDC) outperforms previous VQ-VAEs such as VQGAN, FSQ, LFQ, and BSQ on both UNet and ViT architectures.

- 📚 **Paper on Hugging Face:** [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)
- 🌐 **Project Page:** [https://tongdaxu.github.io/pages/gq.html](https://tongdaxu.github.io/pages/gq.html)
- 💻 **GitHub Repository:** [https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)

## Quick Start & Usage

This section is a quick guide to installing the dependencies, downloading a pre-trained model, and running inference with it. For more details and training instructions, please refer to the [GitHub repository](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE).

### Install dependencies

* Install the dependencies listed in `environment.yaml`:

```bash
conda env create --file=environment.yaml
conda activate tokenizer
```

### Install this package

* From source:

```bash
pip install -e .
```

* [Optional] CUDA kernel for faster run time:

```bash
cd gq_cuda_extension
pip install --no-build-isolation -e .
```

### Download pre-trained model

* Download the model `sd3unet_gq_0.25.ckpt` from [Hugging Face](https://huggingface.co/xutongda/GQModel) and move it into `models_256`, the directory the inference snippets below load from:

```bash
mkdir -p models_256
mv "sd3unet_gq_0.25.ckpt" ./models_256
```

* This is a VQ-VAE with `codebook_size=2**16=65536` and `codebook_dim=16`.

### Infer the model as VQ-VAE

* Then use the model as follows:

```python
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config
import torch

# Resize to 256x256 and map pixel values from [0, 1] to [-1, 1].
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# convert("RGB") guards against grayscale or RGBA inputs,
# which would not match the 3-channel Normalize above.
img = transform(Image.open("demo.png").convert("RGB")).unsqueeze(0).cuda()

config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load(
        "models_256/sd3unet_gq_0.25.ckpt", map_location=torch.device("cpu")
    )["state_dict"],
    strict=False,
)
vae = vae.eval().cuda()

z, log = vae.encode(img, return_reg_log=True)

# Two alternatives; the second assignment overwrites the first.
img_hat = vae.dequant(log["indices"])  # discrete indices
img_hat = vae.decode(z)  # quantized latent
```

### Infer the model as Gaussian VAE

* Alternatively, the model can be used as a vanilla Gaussian VAE:

```python
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config
import torch

# Resize to 256x256 and map pixel values from [0, 1] to [-1, 1].
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# convert("RGB") guards against grayscale or RGBA inputs,
# which would not match the 3-channel Normalize above.
img = transform(Image.open("demo.png").convert("RGB")).unsqueeze(0).cuda()

config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load(
        "models_256/sd3unet_gq_0.25.ckpt", map_location=torch.device("cpu")
    )["state_dict"],
    strict=False,
)
vae = vae.eval().cuda()

# Take the unquantized latent from the regularization log.
z = vae.encode(img, return_reg_log=True)[1]["zhat_noquant"]  # Gaussian VAE latents
img_hat = vae.decode(z)
```

## Citation

If you find our work helpful or inspiring, please feel free to cite it:

```bibtex
@misc{xu2025vectorquantizationusinggaussian,
      title={Vector Quantization using Gaussian Variational Autoencoder},
      author={Tongda Xu and Wendi Zheng and Jiajun He and Jose Miguel Hernandez-Lobato and Yan Wang and Ya-Qin Zhang and Jie Tang},
      year={2025},
      eprint={2512.06609},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.06609},
}
```
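
## Illustrative sketch of the quantization step

The introduction describes GQ as drawing random Gaussian noise as a codebook and picking the entry closest to the posterior mean. The toy sketch below illustrates only that idea in plain Python. It is not the repository's implementation: the function names `make_gaussian_codebook`, `quantize`, and `dequantize`, the use of plain squared Euclidean distance, and the small sizes are all assumptions made for illustration, and the sketch ignores the posterior variance and the TDC heuristic.

```python
import random


def make_gaussian_codebook(codebook_size, dim, seed=0):
    """Draw `codebook_size` i.i.d. standard-normal vectors of length `dim`.

    Hypothetical helper: a fixed seed lets encoder and decoder
    regenerate the same codebook without storing it.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(codebook_size)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(mean, codebook):
    """Return (index, codeword) of the codebook entry nearest to `mean`."""
    index = min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], mean))
    return index, codebook[index]


def dequantize(index, codebook):
    """Look the codeword up again from its index."""
    return codebook[index]


if __name__ == "__main__":
    posterior_mean = [0.3, -1.2, 0.5, 0.1]  # made-up 4-dimensional latent mean
    for bits in (4, 8, 12):
        codebook = make_gaussian_codebook(2 ** bits, dim=4, seed=0)
        index, codeword = quantize(posterior_mean, codebook)
        error = squared_distance(codeword, posterior_mean)
        print(f"{bits} bits: index={index}, squared error={error:.4f}")
```

In this sketch only the integer index has to be stored, which costs `log2(codebook_size)` bits per latent vector (16 bits for a codebook of size `2**16`). Because the codebooks share a seed, each larger codebook contains the smaller one as a prefix, so the nearest-neighbour error can only stay the same or shrink as the codebook grows.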