Enhance model card for GRANT with metadata, links, authors, and installation guide
#1
by nielsr (HF Staff), opened

README.md CHANGED:
---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
datasets:
- H-EmbodVis/ORS3D-60K
base_model:
- Jiayi-Pan/Tiny-Vicuna-1B
---

# GRANT: Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution

This repository contains **GRANT**, an embodied multi-modal large language model for Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), presented in the paper [Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution](https://huggingface.co/papers/2511.19430).

**Authors:** Dingkang Liang, Cheng Zhang, Xiaopeng Xu, Jianzhong Ju, Zhenbo Luo, Xiang Bai

- [📚 Paper](https://huggingface.co/papers/2511.19430)
- [🌐 Project Page](https://h-embodvis.github.io/GRANT)
- [💻 Code on GitHub](https://github.com/H-EmbodVis/GRANT)
- [📊 Dataset (ORS3D-60K)](https://huggingface.co/datasets/H-EmbodVis/ORS3D-60K)

<div align="center">
<img src="https://huggingface.co/H-EmbodVis/GRANT/resolve/main/figures/teaser.png" width="888" alt="GRANT Teaser Image"/>
</div>

## Abstract

Task scheduling is critical for embodied AI, enabling agents to follow natural language instructions and execute actions efficiently in 3D physical worlds. However, existing datasets often simplify task planning by ignoring operations research (OR) knowledge and 3D spatial grounding. In this work, we propose Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), a new task that requires the synergy of language understanding, 3D grounding, and efficiency optimization. Unlike prior settings, ORS3D demands that agents minimize total completion time by leveraging parallelizable subtasks, e.g., cleaning the sink while the microwave operates. To facilitate research on ORS3D, we construct ORS3D-60K, a large-scale dataset comprising 60K composite tasks across 4K real-world scenes. Furthermore, we propose GRANT, an embodied multi-modal large language model equipped with a simple yet effective scheduling token mechanism to generate efficient task schedules and grounded actions. Extensive experiments on ORS3D-60K validate the effectiveness of GRANT across language understanding, 3D grounding, and scheduling efficiency.

## Installation & Data Preparation

This project is built upon [Grounded 3D-LLM](https://github.com/OpenRobotLab/Grounded_3D-LLM), and its preparation steps roughly follow those of Grounded 3D-LLM.

### Environment Setup

- Python: `3.10.16`
- PyTorch: `1.12.1+cu116`
- CUDA: `11.6`

```bash
# Create and activate the conda environment
conda create -n GRANT python=3.10.16
conda activate GRANT

# Conda-side dependencies (OpenBLAS headers are needed to build MinkowskiEngine below)
conda install openblas-devel -c anaconda
conda install openjdk=11

pip install -r requirements.txt

export LD_LIBRARY_PATH=/mnt/petrelfs/share/gcc/mpc-0.8.1/lib:/mnt/petrelfs/share/gcc/mpfr-2.4.2/lib:/mnt/petrelfs/share/gcc/gmp-4.3.2/lib:/mnt/petrelfs/share/gcc/gcc-9.4.0/lib64:$LD_LIBRARY_PATH
# Note: the path above is for a specific cluster environment; update it to match your system's GCC installation.

# Install the pinned PyTorch build and matching extensions
pip3 install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu116.html
pip install peft==0.8.2 --no-deps  # --no-deps sidesteps a PyTorch version conflict

# Build MinkowskiEngine from source at a pinned commit
mkdir -p third_party
cd third_party
git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas

# Build the pointnet2 CUDA ops
cd ../pointnet2
python setup.py install
```
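
As a quick sanity check (a minimal sketch, assuming the steps above completed without errors), the following should print the pinned PyTorch version with CUDA available and import the freshly built MinkowskiEngine:

```bash
# Expect "1.12.1+cu116 True" if the GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Should import without errors after the source build
python -c "import MinkowskiEngine as ME; print(ME.__version__)"
```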

### Data Preparation

Download the ORS3D-60K dataset and dataset splits from [HuggingFace](https://huggingface.co/datasets/H-EmbodVis/ORS3D-60K), and the 3D scenes from [SceneVerse](https://github.com/scene-verse/SceneVerse/blob/main/DATA.md). Arrange everything as follows:
```
GRANT
├── data
│   ├── langdata
│   │   └── ORS3D.json       # ORS3D-60K dataset
│   └── SceneVerse
│       ├── 3RScan
│       ├── ARKitScenes
│       ├── HM3D
│       ├── MultiScan
│       ├── ScanNet
│       └── splits           # ORS3D-60K dataset splits
```
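
The dataset files can be fetched with `huggingface-cli` (installed alongside `huggingface_hub`). This is a sketch that downloads into a staging directory, since the dataset repository's internal layout may not match the tree above; move `ORS3D.json` and `splits/` into place afterwards:

```bash
# Download the ORS3D-60K dataset repo into a staging directory,
# then arrange ORS3D.json and splits/ to match the tree above.
huggingface-cli download H-EmbodVis/ORS3D-60K --repo-type dataset --local-dir ors3d_staging
```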

### Pretrained Weights

#### 1. Download the pretrained LLM weights

Download the pretrained LLM weights ([Tiny-Vicuna-1B](https://huggingface.co/Jiayi-Pan/Tiny-Vicuna-1B)) and store them in `$ROOT_PATH/pretrained/llm_weight/Tiny-Vicuna-1B/`.

#### 2. Download the model weights

Download the point cloud encoder weights and the pretrained GRANT weights from [HuggingFace](https://huggingface.co/H-EmbodVis/GRANT).
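
Both checkpoints can be fetched the same way; note that `pretrained/GRANT` below is a placeholder target, so point it wherever your configs expect the GRANT and point cloud encoder weights:

```bash
huggingface-cli download Jiayi-Pan/Tiny-Vicuna-1B --local-dir pretrained/llm_weight/Tiny-Vicuna-1B
# Placeholder target directory; adjust to where your configs expect the GRANT weights.
huggingface-cli download H-EmbodVis/GRANT --local-dir pretrained/GRANT
```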

## Citation

If you find this repository useful in your research, please consider giving it a star ⭐ and a citation.

```bibtex
@inproceedings{liang2026cook,
  title={Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution},
  author={Liang, Dingkang and Zhang, Cheng and Xu, Xiaopeng and Ju, Jianzhong and Luo, Zhenbo and Bai, Xiang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```
|