Enhance model card for GRANT with metadata, links, authors, and installation guide

#1 opened by nielsr (HF Staff)
---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
datasets:
- H-EmbodVis/ORS3D-60K
base_model:
- Jiayi-Pan/Tiny-Vicuna-1B
---

# GRANT: Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution

This repository contains **GRANT**, an embodied multi-modal large language model for Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), presented in the paper [Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution](https://huggingface.co/papers/2511.19430).

**Authors:** Dingkang Liang, Cheng Zhang, Xiaopeng Xu, Jianzhong Ju, Zhenbo Luo, Xiang Bai

- [📚 Paper](https://huggingface.co/papers/2511.19430)
- [🌐 Project Page](https://h-embodvis.github.io/GRANT)
- [💻 Code on GitHub](https://github.com/H-EmbodVis/GRANT)
- [📊 Dataset (ORS3D-60K)](https://huggingface.co/datasets/H-EmbodVis/ORS3D-60K)

<div align="center">
<img src="https://huggingface.co/H-EmbodVis/GRANT/resolve/main/figures/teaser.png" width="888" alt="GRANT Teaser Image"/>
</div>

## Abstract

Task scheduling is critical for embodied AI, enabling agents to follow natural language instructions and execute actions efficiently in 3D physical worlds. However, existing datasets often simplify task planning by ignoring operations research (OR) knowledge and 3D spatial grounding. In this work, we propose Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), a new task that requires the synergy of language understanding, 3D grounding, and efficiency optimization. Unlike prior settings, ORS3D demands that agents minimize total completion time by leveraging parallelizable subtasks, e.g., cleaning the sink while the microwave operates. To facilitate research on ORS3D, we construct ORS3D-60K, a large-scale dataset comprising 60K composite tasks across 4K real-world scenes. Furthermore, we propose GRANT, an embodied multi-modal large language model equipped with a simple yet effective scheduling token mechanism to generate efficient task schedules and grounded actions. Extensive experiments on ORS3D-60K validate the effectiveness of GRANT across language understanding, 3D grounding, and scheduling efficiency.
28
+
29
+ ## Installation & Data Preparation
30
+ This project is built upon [Grounded 3D-LLM](https://github.com/OpenRobotLab/Grounded_3D-LLM), and the preparations roughly follow the Grounded 3D-LLM.
31
+
32
+ ### Environment Setup
33
+
34
+ Python: `3.10.16`
35
+ Pytorch: `1.12.1+cu116`
36
+ CUDA: 11.6
37
+
```bash
conda create -n GRANT python=3.10.16
conda activate GRANT

conda install openblas-devel -c anaconda
conda install openjdk=11

pip install -r requirements.txt

# The path below is for a specific cluster environment; update it to match
# your system's GCC/BLAS installation.
export LD_LIBRARY_PATH=/mnt/petrelfs/share/gcc/mpc-0.8.1/lib:/mnt/petrelfs/share/gcc/mpfr-2.4.2/lib:/mnt/petrelfs/share/gcc/gmp-4.3.2/lib:/mnt/petrelfs/share/gcc/gcc-9.4.0/lib64:$LD_LIBRARY_PATH

pip3 install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu116.html
pip install peft==0.8.2 --no-deps  # --no-deps avoids a PyTorch version conflict

# Build MinkowskiEngine from source at a pinned commit
mkdir -p third_party
cd third_party
git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas

# Build the PointNet++ ops shipped with the repository
cd ../pointnet2
python setup.py install
```
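After installation, a quick sanity check can confirm that the key packages resolve. This small helper is not part of the official repo, just a convenience sketch; the package list reflects the dependencies installed above:

```python
import importlib.util

def check_packages(names):
    """Map each package name to whether it can be imported in this environment."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

# Packages the GRANT environment relies on (see the install steps above)
expected = ["torch", "torchvision", "torch_scatter", "MinkowskiEngine", "peft"]
for name, ok in check_packages(expected).items():
    print(f"{name:20s} {'OK' if ok else 'MISSING'}")
```

Any `MISSING` entry points at the install step to re-run.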

### Data Preparation

Download the ORS3D-60K dataset and its splits from [HuggingFace](https://huggingface.co/datasets/H-EmbodVis/ORS3D-60K), and the 3D scenes from [SceneVerse](https://github.com/scene-verse/SceneVerse/blob/main/DATA.md). Arrange them as follows:

```
GRANT
├── data
│   ├── langdata
│   │   └── ORS3D.json        # ORS3D-60K dataset
│   └── SceneVerse
│       ├── 3RScan
│       ├── ARKitScenes
│       ├── HM3D
│       ├── MultiScan
│       ├── ScanNet
│       └── splits            # ORS3D-60K dataset splits
```
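Before training, a short script can flag anything missing from this layout. This is a hypothetical helper (not from the repo), checking only a representative subset of the tree above:

```python
from pathlib import Path

# A few paths the layout above expects, relative to the GRANT checkout root
EXPECTED = [
    "data/langdata/ORS3D.json",
    "data/SceneVerse/ScanNet",
    "data/SceneVerse/splits",
]

def missing_paths(root, rel_paths):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [p for p in rel_paths if not (root / p).exists()]

for p in missing_paths("GRANT", EXPECTED):
    print("missing:", p)
```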

### Pretrained Weights

#### 1. Download the pretrained LLM weights

Download the pretrained LLM weights ([Tiny-Vicuna-1B](https://huggingface.co/Jiayi-Pan/Tiny-Vicuna-1B)) and store them in `$ROOT_PATH/pretrained/llm_weight/Tiny-Vicuna-1B/`.

#### 2. Download the model weights

Download the point cloud encoder weights and the pretrained GRANT weights from [HuggingFace](https://huggingface.co/H-EmbodVis/GRANT).
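From the command line, both downloads can be done with `huggingface-cli`. The LLM target directory follows the path given above; the GRANT target directory is an assumption, so place those weights wherever your configs expect them:

```shell
# Fetch the Tiny-Vicuna-1B LLM weights into the expected directory
huggingface-cli download Jiayi-Pan/Tiny-Vicuna-1B \
    --local-dir "$ROOT_PATH/pretrained/llm_weight/Tiny-Vicuna-1B"

# Fetch the GRANT checkpoints (point cloud encoder + pretrained model);
# the target directory here is illustrative
huggingface-cli download H-EmbodVis/GRANT \
    --local-dir "$ROOT_PATH/pretrained/GRANT"
```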

## Citation

If you find this repository useful in your research, please consider giving it a star ⭐ and a citation.

```bibtex
@inproceedings{liang2026cook,
  title={Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution},
  author={Liang, Dingkang and Zhang, Cheng and Xu, Xiaopeng and Ju, Jianzhong and Luo, Zhenbo and Bai, Xiang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```