Improve model card for MiniCPM-o 2.6: Add paper link and license metadata #56
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,10 +1,11 @@
 ---
-pipeline_tag: any-to-any
 datasets:
 - openbmb/RLAIF-V-Dataset
-library_name: transformers
 language:
 - multilingual
+library_name: transformers
+license: apache-2.0
+pipeline_tag: any-to-any
 tags:
 - minicpm-o
 - omni
@@ -22,9 +23,11 @@ tags:
 - tts
 ---
 
-
+# A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
 
-
+This model is part of the MiniCPM series, described in the paper [MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe](https://huggingface.co/papers/2509.18154).
+
+[GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Online Demo](https://minicpm-omni-webdemo-us.modelbest.cn) | [Technical Blog (MiniCPM-o 2.6)](https://openbmb.notion.site/MiniCPM-o-2-6-A-GPT-4o-Level-MLLM-for-Vision-Speech-and-Multimodal-Live-Streaming-on-Your-Phone-185ede1b7a558042b5d5e45e6b237da9) | [Join Us](https://mp.weixin.qq.com/mp/wappoc_appmsgcaptcha?poc_token=HAV8UWijqB3ImPSXecZHlOns7NRgpQw9y9EI2_fE&target_url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FKIhH2nCURBXuFXAtYRpuXg%3F)
 
 
 ### News
@@ -927,7 +930,6 @@ All results are from AudioEvals, and the evaluation methods along with further d
 </table>
 
 
-
 ### Examples <!-- omit in toc -->
 
 We deploy MiniCPM-o 2.6 on end devices. The demo video is the raw-speed recording on an iPad Pro and a Web demo.
@@ -1329,7 +1331,8 @@ For audio-to-text tasks, you can use the following prompts:
 - General Sound Scene Tagging: `Utilize one keyword to convey the audio's content or the associated scene.`
 
 ```python
-task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "\n" # can change to other prompts.
+task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "
+" # can change to other prompts.
 audio_input, _ = librosa.load('./assets/input_examples/audio_understanding.mp3', sr=16000, mono=True) # load the audio to be captioned
 
 msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]
@@ -1471,31 +1474,30 @@ Download the int4 quantized version for lower GPU memory (7GB) usage: [MiniCPM-
 
 
 ## License
-#### Model License
 * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
 * The usage of MiniCPM-o and MiniCPM-V series model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
 * The models and weights of MiniCPM are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, MiniCPM-o 2.6 weights are also available for free commercial use.
 
 
 #### Statement
-* As an
+* As an MLLM, MiniCPM-o 2.6 generates contents by learning a large amount of multimodal corpora, but it cannot comprehend, express personal opinions or make value judgement. Anything generated by MiniCPM-o 2.6 does not represent the views and positions of the model developers
 * We will not be liable for any problems arising from the use of the MinCPM-V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, dissemination or misuse of the model.
 
 ## Key Techniques and Other Multimodal Projects
 
-👏 Welcome to explore key techniques of MiniCPM-o
+👏 Welcome to explore key techniques of MiniCPM-o/V and other multimodal projects of our team:
 
-[VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD)
+[VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLPR](https://github.com/OpenBMB/RLPR) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)
 
 ## Citation
 
 If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️！
 
 ```bib
-@article{yao2024minicpm,
-title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
+@article{yao2025minicpmv45,
+title={MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe},
 author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
-journal={arXiv preprint arXiv:2408.01800},
-year={2024}
+journal={arXiv preprint arXiv:2509.18154},
+year={2025}
 }
-```
+```
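For context, the snippet touched in the audio-to-text hunk is only a fragment, and the new version splits the string literal across two physical lines, which a Python parser would reject. Below is a minimal runnable sketch of the same audio-understanding call with the newline written as an escaped `\n`. The `openbmb/MiniCPM-o-2_6` repo id is the model this card describes; the `AutoModel.from_pretrained` flags and the `model.chat` keyword arguments are reproduced from memory of the model card's usage examples and should be treated as assumptions, not changes made by this PR.

```python
# Minimal sketch of the audio-understanding example, assuming the
# `model.chat` API shown elsewhere in this model card.
import librosa
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-o-2_6',
    trust_remote_code=True,
    attn_implementation='sdpa',   # or 'flash_attention_2'
    torch_dtype=torch.bfloat16,
    init_vision=True,
    init_audio=True,
    init_tts=False,               # assumed flag: TTS weights not needed for audio-to-text
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True)

# Escaped "\n" here, rather than the literal line break shown in the diff.
task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "\n"  # can change to other prompts.

audio_input, _ = librosa.load('./assets/input_examples/audio_understanding.mp3', sr=16000, mono=True)  # load the audio to be captioned
msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,
    generate_audio=False,
    temperature=0.3,
)
print(res)  # transcription text
```

With `generate_audio=False` the call should return only the text transcription; the card's speech-output examples enable TTS initialization and audio generation instead.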