Improve model card for ReMoMask
This PR updates the model card for **ReMoMask: Retrieval-Augmented Masked Motion Generation** ([Paper](https://huggingface.co/papers/2508.02605)).
It adds the following improvements:
- **Corrected License**: The `license` tag is updated from `cc-by-sa-4.0` to `cc-by-nc-sa-4.0` as specified in the official GitHub repository.
- **Pipeline Tag**: The `pipeline_tag: text-to-3d` is added, enhancing discoverability for Text-to-Motion generation tasks on the Hugging Face Hub.
- **Paper Link**: A direct link to the Hugging Face paper page is included.
- **Project Page**: The official project website is linked for more information.
- **GitHub Repository Link**: A link to the source code repository is provided.
- **Abstract**: The paper's abstract is added to give a comprehensive overview of the model.
- **Visuals**: The demo video and framework image from the GitHub repository are embedded to illustrate the model's output and architecture.
- **Sample Usage**: A code snippet for running a local demo, taken directly from the GitHub README, is included to provide users with a quick start.
- **Citation**: The BibTeX entry is added for proper academic attribution.
Please review and merge if everything looks good.

---
license: cc-by-nc-sa-4.0
pipeline_tag: text-to-3d
---

# <img src="https://huggingface.co/AIGeeksGroup/ReMoMask/resolve/main/assets/remomask_logo.png" alt="logo" width="30"/> ReMoMask: Retrieval-Augmented Masked Motion Generation

This is the official repository for the paper [ReMoMask: Retrieval-Augmented Masked Motion Generation](https://huggingface.co/papers/2508.02605).

- 📚 [Paper](https://huggingface.co/papers/2508.02605)
- 🌐 [Project Page](https://aigeeksgroup.github.io/ReMoMask/)
- 💻 [Code](https://github.com/AIGeeksGroup/ReMoMask)

https://github.com/user-attachments/assets/3f29c0c5-abb8-4fd1-893c-48ac82b79532

## Abstract

Text-to-Motion (T2M) generation aims to synthesize realistic and semantically aligned human motion sequences from natural language descriptions. However, current approaches face dual challenges: Generative models (e.g., diffusion models) suffer from limited diversity, error accumulation, and physical implausibility, while Retrieval-Augmented Generation (RAG) methods exhibit diffusion inertia, partial-mode collapse, and asynchronous artifacts. To address these limitations, we propose ReMoMask, a unified framework integrating three key innovations: 1) A Bidirectional Momentum Text-Motion Model decouples negative sample scale from batch size via momentum queues, substantially improving cross-modal retrieval precision; 2) A Semantic Spatio-temporal Attention mechanism enforces biomechanical constraints during part-level fusion to eliminate asynchronous artifacts; 3) RAG-Classifier-Free Guidance incorporates minor unconditional generation to enhance generalization. Built upon MoMask's RVQ-VAE, ReMoMask efficiently generates temporally coherent motions in minimal steps. Extensive experiments on standard benchmarks demonstrate the state-of-the-art performance of ReMoMask, achieving a 3.88% and 10.97% improvement in FID scores on HumanML3D and KIT-ML, respectively, compared to the previous SOTA method RAG-T2M.
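
The "momentum queues" in point 1 are the mechanism popularized by MoCo-style contrastive learning: key embeddings from past batches are cached in a FIFO queue, so the number of negatives in the contrastive loss is the queue size rather than the batch size. The sketch below is a minimal, generic PyTorch illustration of that idea only, not the paper's implementation; the class name, dimensions, and hyperparameters are all illustrative and not taken from the ReMoMask codebase.

```python
import torch
import torch.nn.functional as F

class MomentumQueue:
    """FIFO queue of key embeddings, MoCo-style (illustrative only)."""

    def __init__(self, dim: int = 512, queue_size: int = 8192):
        # Random init stands in for embeddings accumulated during warm-up.
        self.queue = F.normalize(torch.randn(queue_size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, keys: torch.Tensor) -> None:
        """Overwrite the oldest entries with the newest batch of keys."""
        idx = (self.ptr + torch.arange(keys.shape[0])) % self.queue.shape[0]
        self.queue[idx] = F.normalize(keys, dim=1)
        self.ptr = int((self.ptr + keys.shape[0]) % self.queue.shape[0])

def info_nce(q: torch.Tensor, k_pos: torch.Tensor,
             queue: MomentumQueue, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE with one in-batch positive and all queued negatives."""
    q, k_pos = F.normalize(q, dim=1), F.normalize(k_pos, dim=1)
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)   # (B, 1) positive logits
    l_neg = q @ queue.queue.t()                    # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.shape[0], dtype=torch.long)  # positive at column 0
    return F.cross_entropy(logits, labels)

# Toy usage: text queries scored against queued motion keys.
queue = MomentumQueue()
text_emb, motion_emb = torch.randn(32, 512), torch.randn(32, 512)
loss = info_nce(text_emb, motion_emb, queue)
queue.enqueue(motion_emb)  # in practice, keys come from a momentum-updated encoder
```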

## Framework

An overview of the ReMoMask framework:



## Sample Usage

To run a local demo for motion generation, you can use the provided `demo.py` script from the GitHub repository.

First, ensure you have the environment set up as described in the [GitHub repository's Prerequisite section](https://github.com/AIGeeksGroup/ReMoMask#prerequisite).

Then, run the demo with a text prompt:

```bash
python demo.py --gpu_id 0 --ext exp1 --text_prompt "A person is walking on a circle." --checkpoints_dir logs --dataset_name humanml3d --mtrans_name pretrain_mtrans --rtrans_name pretrain_rtrans
# After training, replace pretrain_mtrans and pretrain_rtrans with the names of your own mtrans and rtrans checkpoints.
```

- `--repeat_times`: number of replications for generation, default `1`.
- `--motion_length`: specify the number of poses for generation.
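
For example, to generate several variations of a fixed-length motion in one run (the prompt and flag values below are illustrative; the checkpoint arguments are the same pretrained ones as above):

```bash
python demo.py --gpu_id 0 --ext exp2 --text_prompt "A person jumps forward." \
    --repeat_times 3 --motion_length 196 \
    --checkpoints_dir logs --dataset_name humanml3d \
    --mtrans_name pretrain_mtrans --rtrans_name pretrain_rtrans
```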

For the first command, the output will be saved in `./outputs/exp1/`.

## Citation

If you find our work helpful or inspiring, please feel free to cite it:

```bibtex
@article{li2025remomask,
  title={ReMoMask: Retrieval-Augmented Masked Motion Generation},
  author={Li, Zhengdao and Wang, Siheng and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2508.02605},
  year={2025}
}
```