🎥 ViBT: Vision Bridge Transformer at Scale

This repository introduces the Vision Bridge Transformer (ViBT), a large-scale instantiation of Brownian Bridge Models designed for efficient conditional generation. ViBT directly models the trajectory between inputs and outputs, which yields an efficient data-to-data translation paradigm. The models are effective across a range of image and video translation tasks, including instruction-based image editing and complex video translation.
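To make the idea of modeling a trajectory between an input and an output more concrete, here is a minimal sketch of sampling an intermediate state from a standard Brownian bridge pinned at two endpoints. This illustrates the general textbook construction only. It is not ViBT's code, and the function name `brownian_bridge_sample`, the `sigma` parameter, and the unit time interval are assumptions made for this example.

```python
import numpy as np

def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Hypothetical helper for illustration; the actual ViBT formulation
    and noise schedule may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The mean moves linearly from the source (x0) to the target (x1).
    mean = (1.0 - t) * x0 + t * x1
    # The standard deviation vanishes at both endpoints and peaks at t = 0.5.
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)

# Example: a toy "source" and "target" with four values each.
source = np.zeros(4)
target = np.ones(4)
midpoint = brownian_bridge_sample(source, target, t=0.5,
                                  rng=np.random.default_rng(0))
```

Because the noise term is zero at `t = 0` and `t = 1`, the sampled path starts exactly at the source and ends exactly at the target, which is the sense in which a bridge connects data to data rather than noise to data.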


