How to use Anzhc/Qwen2D-VAE with Diffusers:
```
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("Anzhc/Qwen2D-VAE", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
A modification of Qwen-Image-VAE that removes the temporal dimension, collapsing it from 3D to 2D. Primarily meant for caching latents for training, but it can also be used in ComfyUI to save a second every so often.
Why?
Because you don't need the temporal dimension for image models (unless you're doing research on temporally consistent gens with them).
Collapsing the VAE to 2D does not alter its output, while reducing required VRAM 3x and making it 2.5x faster.
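To illustrate why a collapse like this can leave the output unchanged, here is a minimal pure-Python sketch. It assumes the 3D layers are causal convolutions that zero-pad only in front of the time axis; that is an assumption made for illustration, not a confirmed description of how this model was converted. Under that assumption, a single image (one frame) only ever meets the last temporal slice of each kernel, so a 2D convolution using that slice gives the same result. The helper names `conv2d_same` and `causal_conv3d` are hypothetical.

```python
import random

def conv2d_same(frame, k2d):
    """Zero-padded 'same' 2D cross-correlation of an H x W frame with a kh x kw kernel."""
    H, W = len(frame), len(frame[0])
    kh, kw = len(k2d), len(k2d[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < H and 0 <= xx < W:
                        acc += frame[yy][xx] * k2d[i][j]
            out[y][x] = acc
    return out

def causal_conv3d(frames, k3d):
    """Causal 3D conv (assumed layout): kt-1 zero frames padded in front of time, none behind."""
    kt = len(k3d)
    H, W = len(frames[0]), len(frames[0][0])
    zero = [[0.0] * W for _ in range(H)]
    padded = [zero] * (kt - 1) + list(frames)
    out = []
    for t in range(len(frames)):
        acc = [[0.0] * W for _ in range(H)]
        for dt in range(kt):
            part = conv2d_same(padded[t + dt], k3d[dt])
            for y in range(H):
                for x in range(W):
                    acc[y][x] += part[y][x]
        out.append(acc)
    return out

random.seed(0)
H, W, kt = 4, 5, 3
frame = [[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
k3d = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(kt)]

out3d = causal_conv3d([frame], k3d)[0]   # 3D path on a single frame
out2d = conv2d_same(frame, k3d[-1])      # 2D path using only the last temporal slice

max_diff = max(abs(a - b) for ra, rb in zip(out3d, out2d) for a, b in zip(ra, rb))
print("max abs difference:", max_diff)
```

In this toy setup the other temporal slices only ever multiply zero padding, which is where the memory and compute savings of a 2D version would come from.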
But what if I'm using the temporal dimension for control input?
Control inputs aren't encoded along the temporal dimension; they are separate latents. It works.
Fully compatible with image models using Qwen/Wan VAEs.
To use it in ComfyUI, install this node pack: https://github.com/Anzhc/anzhc-qwen2d-comfyui
Model tree for Anzhc/Qwen2D-VAE
Base model: Qwen/Qwen-Image