WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 7 days ago • 229
WildDet3D Collection This is the collection of WildDet3D artifacts, including demos, model checkpoints and data. https://github.com/allenai/WildDet3D • 8 items • Updated 2 days ago • 17
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models Paper • 2603.24575 • Published 21 days ago • 18
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos Paper • 2602.23543 • Published Feb 26 • 9
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics Paper • 2602.19313 • Published Feb 22 • 26
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 15 items • Updated Mar 2 • 40
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20, 2025 • 64
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index Paper • 2506.12229 • Published Jun 13, 2025 • 3
DocRAG Datasets Collection Processed ("Unified") datasets used in DocRAG for training or inference purposes. • 12 items • Updated Jun 14, 2025 • 1
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs Paper • 2504.15280 • Published Apr 21, 2025 • 25
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 309
Synthetic Object Compositions for Det / Seg / Grounding Collection Dataset collections for the paper: https://github.com/weikaih04/Synthetic-Detection-Segmentation-Grounding-Data • 8 items • Updated Mar 2 • 2
CoTA Datasets Collection This collection contains all versions of the CoTA (Chain-of-Thought-and-Action) datasets. • 4 items • Updated Mar 2 • 7
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 101
TaskMeAnything Collection A collection of TaskMeAnything resources [https://github.com/JieyuZ2/TaskMeAnything] • 7 items • Updated Mar 2 • 3