
OpenVAE


Open-source VAE family for medical imaging. Pretrained latent backbones for CT/MRI diffusion models.

2D and 3D autoencoders trained on up to 1M CT volumes with perceptual, adversarial, and segmentation-guided objectives.

Main paper: Distilling Photon-Counting CT into Routine Chest CT through Clinically Validated Degradation Modeling (PDF) — cite this work when you use OpenVAE weights, released CT latents, or related resources from this repository. Code: github.com/KumaKuma2002/OpenVAE.

OpenVAE Teaser

Release timeline

Hugging Face β€” SMILE-project/OpenVAE

  • Mar 15, 2026 — 2D OpenVAE weights uploaded to Hugging Face.
  • Apr 6, 2026 — First 3D 64³ patch checkpoint uploaded (OpenVAE-3D-4x-patch64-10K).

Models and CT reconstruction benchmark

Reconstruction metrics on the OpenVAE CT hold-out benchmark (12 cases); SSIM and PSNR are higher-is-better, LPIPS is lower-is-better.

| Model | Type | Patients | Latent | Resolution | SSIM | PSNR (dB) | LPIPS |
|---|---|---|---|---|---|---|---|
| stable-diffusion-v1-5 | KL-VAE | 0 | 8× | 512² RGB | N/A | N/A | N/A |
| stable-diffusion-3.5-large | KL-VAE | 0 | 8× | — | N/A | N/A | N/A |
| MAISI | KL-VAE | 0 | 4× | 64³ | — | — | — |
| OpenVAE-2D-4x-2K | KL-VAE | 2K | 4× | 512² | 0.8932 | 34.87 | 0.0868 |
| OpenVAE-2D-4x-10K | KL-VAE | 10K | 4× | 512² | 0.8867 | 34.91 | 0.0816 |
| OpenVAE-2D-4x-10K-pro | KL-VAE | 10K | 4× | 512² | 0.8880 | 34.51 | 0.0781 |
| OpenVAE-2D-4x-20K | KL-VAE | 20K | 4× | 512² | 0.8798 | 34.63 | 0.0782 |
| OpenVAE-2D-4x-100K | KL-VAE | 100K | 4× | 512² | 0.8835 | 34.24 | 0.0752 |
| OpenVAE-2D-4x-300K | KL-VAE | 300K | 4× | 512² | 0.8874 | 33.84 | 0.0852 |
| OpenVAE-2D-4x-PCCT_enhanced | KL-VAE | 300K | 4× | 512² | 0.8813 | 34.00 | 0.0756 |
| OpenVAE-3D-4x-patch64-10K | KL-VAE | 10K | 4× | 64³ | 0.8099 | 25.99 | 0.1565 |
| OpenVAE-3D-4x-20K | KL-VAE | 20K | 4× | 64³ | — | — | — |
| OpenVAE-3D-4x-20K | KL-VAE | 20K | 4× | 128³ | — | — | — |
| OpenVAE-3D-4x-100K | KL-VAE | 100K | 4× | 128³ | — | — | — |
| OpenVAE-3D-4x-1M | KL-VAE | 1M | 4× | 128³ | — | — | — |
| OpenVAE-3D-4x-100K-VQ | VQ-VAE | 100K | 4× | 64³ | — | — | — |
| OpenVAE-3D-8x-100K-VQ | VQ-VAE | 100K | 8× | 64³ | — | — | — |

Citation — please cite the primary paper below for OpenVAE. If you build on the earlier anatomy-aware contrast-enhancement line or compare against that benchmark framing, also cite the related work.

Primary (OpenVAE / SUMI, arXiv:2604.07329):

@misc{liu2026distillingphotoncountingctroutine,
  title={Distilling Photon-Counting CT into Routine Chest CT through Clinically Validated Degradation Modeling},
  author={Junqi Liu and Xinze Zhou and Wenxuan Li and Scott Ye and Arkadiusz Sitek and Xiaofeng Yang and Yucheng Tang and Daguang Xu and Kai Ding and Kang Wang and Yang Yang and Alan L. Yuille and Zongwei Zhou},
  year={2026},
  eprint={2604.07329},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.07329},
}

Related (anatomy-aware contrast enhancement, arXiv:2512.07251):

@article{liu2025see,
  title={See More, Change Less: Anatomy-Aware Diffusion for Contrast Enhancement},
  author={Liu, Junqi and Wu, Zejun and Bassi, Pedro RAS and Zhou, Xinze and Li, Wenxuan and Hamamci, Ibrahim E and Er, Sezgin and Lin, Tianyu and Luo, Yi and P{\l}otka, Szymon and others},
  journal={arXiv preprint arXiv:2512.07251},
  year={2025}
}

Notes for the benchmark table above: "N/A" marks general-domain VAEs not evaluated on the medical CT benchmark; "—" marks checkpoints listed for distribution whose benchmark rows have not yet been run in the current summary.csv.

Pretrained weights: huggingface.co/SMILE-project/OpenVAE

Quick Start (2D Diffusers / 3D MONAI)

2D VAE (Diffusers)

import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "SMILE-project/OpenVAE", subfolder="vae"
).to("cuda").eval()

x = torch.randn(1, 3, 512, 512, device="cuda")
with torch.no_grad():
    z = vae.encode(x).latent_dist.sample()   # (1, 4, 128, 128)
    x_hat = vae.decode(z).sample              # (1, 3, 512, 512)
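
The random tensor above only exercises shapes. For real CT slices, the repository states the training data uses HU clipped to [-1000, 1000]; here is a minimal preprocessing sketch under that assumption (the helper name and the replicate-to-3-channels choice are ours, not part of the repo):

```python
import numpy as np

def hu_to_vae_input(slice_hu: np.ndarray) -> np.ndarray:
    """Clip a CT slice to [-1000, 1000] HU, scale to [-1, 1],
    and replicate to 3 channels, returning shape (3, H, W)."""
    x = np.clip(slice_hu.astype(np.float32), -1000.0, 1000.0) / 1000.0
    return np.repeat(x[None, ...], 3, axis=0)

slice_hu = np.full((512, 512), 2000.0)   # out-of-range HU is clipped to 1000
x = hu_to_vae_input(slice_hu)
print(x.shape, x.min(), x.max())         # (3, 512, 512) 1.0 1.0
```

The resulting array can be wrapped with `torch.from_numpy(...)[None]` to form the `(1, 3, 512, 512)` batch the 2D example expects.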

3D VAE (MONAI)

import torch
from monai.apps.generation.maisi.networks.autoencoderkl_maisi import AutoencoderKlMaisi

model = AutoencoderKlMaisi(
    spatial_dims=3, in_channels=1, out_channels=1,
    num_channels=(64, 128, 256), latent_channels=4,
    num_res_blocks=(2, 2, 2), norm_num_groups=32,
    attention_levels=(False, False, False),
)
state = torch.load("ckpt/OpenVAE-3D-4x-100K/autoencoder_best.pt", map_location="cpu")
model.load_state_dict(state)
model.eval().to("cuda")

x = torch.randn(1, 1, 64, 64, 64, device="cuda")
with torch.no_grad():
    z, _, _ = model.encode(x)   # (1, 4, 16, 16, 16)
    x_hat = model.decode(z)     # (1, 1, 64, 64, 64)


CT reconstruction

# 2D slice-by-slice
python src/demo_medvae.py --input scan.nii.gz --checkpoint ckpt/OpenVAE-2D-4x-100K

# 3D sliding-window (example: 100K checkpoint; use OpenVAE-3D-4x-patch64-10K path for the 10K 64³ model)
python test/test_3dvae.py --input scan.nii.gz --checkpoint ckpt/OpenVAE-3D-4x-100K/autoencoder_best.pt
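
The 3D script reconstructs whole volumes by running the VAE over overlapping cubic patches and blending the overlaps. The aggregation idea can be sketched in plain NumPy (the function name and the identity stand-in "model" are illustrative, not the repo's API):

```python
from itertools import product

import numpy as np

def sliding_window_reconstruct(vol, model, patch=64, stride=32):
    """Run `model` on overlapping cubic patches and average overlapping outputs."""
    out = np.zeros_like(vol, dtype=np.float32)
    weight = np.zeros_like(vol, dtype=np.float32)

    def starts(n):
        # Patch start offsets along one axis; the last patch is shifted
        # back so it ends exactly at the volume boundary.
        return sorted({min(s, n - patch) for s in range(0, n, stride)})

    for i, j, k in product(*(starts(n) for n in vol.shape)):
        sl = (slice(i, i + patch), slice(j, j + patch), slice(k, k + patch))
        out[sl] += model(vol[sl])
        weight[sl] += 1.0
    return out / weight

vol = np.random.rand(96, 96, 96).astype(np.float32)
recon = sliding_window_reconstruct(vol, model=lambda p: p)  # identity "model"
print(np.abs(recon - vol).max())  # 0.0 for the identity model
```

In the real script the per-patch callable would be the 3D VAE's encode/decode round trip; uniform averaging is the simplest blend, and weighted (e.g. Gaussian) blending is a common refinement.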
Training
# 2D KL-VAE (multi-GPU, Accelerate)
accelerate launch src/train_klvae.py \
    --train_data_dir /data/AbdomenAtlasPro \
    --output_dir outputs/klvae \
    --resolution 512 \
    --train_batch_size 8 \
    --learning_rate 1e-4

# 3D VAE (single GPU, patch-based)
python src/train_3dvae.py \
    --train_data_dir /data/AbdomenAtlasPro \
    --patch_size 64 64 64 \
    --num_epochs 100 \
    --amp

# 3D VAE from NIfTI volumes (ct.nii.gz or ct.nii per subject folder)
python src/train_3dvae.py \
    --train_data_dir /data/AbdomenAtlasPro \
    --use-nifti \
    --patch_size 64 64 64 \
    --num_epochs 100 \
    --amp

Data format:

  • HDF5 (default): train_data_dir/<subject_id>/ct.h5 — key "image", shape (H, W, D), HU in [-1000, 1000].
  • NIfTI (--use-nifti): train_data_dir/<subject_id>/ct.nii.gz or ct.nii — same HU range and shape convention (H, W, D); requires nibabel.
Benchmark scripts
# Run reconstruction on all models
bash test/benchmark.sh /path/to/ct_dir

# Recompute metrics (no re-inference)
python test/direct_compute_metrics.py --benchmark_root outputs/vae_benchmark --skip-lpips

# Plot
python test/plot_benchmark_metrics.py
Metric definitions (MAE / Detail / SSIM / PSNR / LPIPS)
| Metric | Scale | Direction | Description |
|---|---|---|---|
| MAE_100 | 0–100 | higher = better | (1 - MAE) × 100 on 3D volumes in [0, 1] |
| Detail_100 | 0–100 | higher = better | Pearson correlation of 3D gradient magnitudes |
| SSIM | 0–1 | higher = better | Structural similarity |
| PSNR | dB | higher = better | Peak signal-to-noise ratio |
| LPIPS | 0–1 | lower = better | Learned perceptual similarity (AlexNet) |
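
SSIM, PSNR, and LPIPS are standard; MAE_100 and Detail_100 follow the definitions above. A NumPy sketch of the two custom metrics, assuming the ×100 scaling implied by the 0–100 scale column (the exact implementation lives in the test scripts):

```python
import numpy as np

def mae_100(gt, pred):
    """(1 - MAE) * 100 on volumes scaled to [0, 1]."""
    return (1.0 - np.mean(np.abs(gt - pred))) * 100.0

def detail_100(gt, pred):
    """Pearson correlation of 3D gradient magnitudes, scaled to 0-100."""
    def grad_mag(v):
        g = np.gradient(v.astype(np.float64))
        return np.sqrt(sum(gi ** 2 for gi in g))
    a, b = grad_mag(gt).ravel(), grad_mag(pred).ravel()
    return np.corrcoef(a, b)[0, 1] * 100.0

gt = np.random.rand(32, 32, 32)
print(mae_100(gt, gt))      # 100.0 for a perfect reconstruction
print(detail_100(gt, gt))   # 100.0 for a perfect reconstruction
```

Detail_100 rewards reconstructions that preserve edge structure even when absolute intensities drift, which MAE_100 alone would miss.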

Full per-metric dumps (including MAE_100 and Detail_100) live in outputs/vae_benchmark/summary.csv.

Upload checkpoints to Hugging Face

After hf auth login (token with write access to SMILE-project/OpenVAE):

bash scripts/upload_hf_checkpoints.sh

This uploads MAISI/maisi_autoencoder.pt and OpenVAE-3D-4x-patch64-10K/autoencoder_best.pt (and autoencoder_latest.pt). To refresh the Hub model card, upload README.md:

hf upload SMILE-project/OpenVAE README.md README.md \
  --commit-message "docs: merged models + benchmark table and timeline"
Project structure
OpenVAE/
├── scripts/
│   └── upload_hf_checkpoints.sh  # Push MAISI + 3D patch64-10K to Hugging Face Hub
├── src/
│   ├── train_klvae.py            # 2D KL-VAE training (Diffusers + Accelerate)
│   ├── train_3dvae.py            # 3D VAE training (MONAI MAISI)
│   ├── demo_medvae.py            # 2D inference demo
│   ├── utils_loss.py             # Segmentation + GAN loss utilities
│   ├── utils_discriminator.py    # PatchGAN / StyleGAN discriminators
│   ├── MIRA2D/                   # 2D SR / enhancement reference code
│   └── MIRA3D/                   # 3D SR (MONAI)
├── test/
│   ├── benchmark_vae.py          # Full benchmark (inference + metrics)
│   ├── direct_compute_metrics.py # Metrics-only recomputation
│   ├── plot_benchmark_metrics.py # Visualization
│   ├── test_2dvae.py             # 2D VAE inference
│   └── test_3dvae.py             # 3D VAE inference (sliding-window)
└── ckpt/                         # Model checkpoints (download from HF)
Citation

Same as the Citation block above: cite the primary OpenVAE paper (arXiv:2604.07329) and, where relevant, the related contrast-enhancement paper (arXiv:2512.07251).


License

MIT
