🎥 ViBT: Vision Bridge Transformer at Scale


This repository introduces the Vision Bridge Transformer (ViBT), a large-scale instantiation of Brownian Bridge Models designed for efficient conditional generation. Rather than generating outputs from noise, ViBT directly models the trajectory between inputs and outputs, which yields an efficient data-to-data translation paradigm. The models are effective across a range of image and video translation tasks, including instruction-based image editing and complex video translation.
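To make the "trajectory between inputs and outputs" idea concrete, here is a minimal sketch of the textbook Brownian bridge between a source sample `x0` and a target sample `x1`. It is not taken from the ViBT code base: the function name `sample_bridge_state`, the `sigma` noise scale, and the use of plain Python lists in place of image or video latents are all assumptions made for illustration. The intermediate state at time `t` is the linear interpolation of the endpoints plus Gaussian noise whose variance `sigma**2 * t * (1 - t)` vanishes at both ends.

```python
import math
import random


def sample_bridge_state(x0, x1, t, sigma=1.0, rng=None):
    """Sample one intermediate state of a Brownian bridge pinned at x0 and x1.

    Hypothetical helper for illustration only, not part of the ViBT API.
    x0, x1: equal-length sequences of floats (stand-ins for latents).
    t: time in [0, 1]; t=0 returns x0 and t=1 returns x1.
    sigma: assumed noise scale of the underlying Brownian motion.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = rng or random.Random()
    # The standard deviation is zero at both endpoints and largest at t = 0.5.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, x1)
    ]


# Usage: a noisy midpoint between a "source" and a "target" vector.
source = [0.0, 2.0]
target = [1.0, 4.0]
midpoint = sample_bridge_state(source, target, 0.5, rng=random.Random(0))
```

Because the noise term disappears at `t = 0` and `t = 1`, every sampled path starts exactly at the source and ends exactly at the target, which is the property that distinguishes a bridge from a noise-to-data diffusion process.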
