Improve model card for MiniCPM-o 2.6: Add paper link and license metadata

#56
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +17 -15
README.md CHANGED
@@ -1,10 +1,11 @@
  ---
- pipeline_tag: any-to-any
  datasets:
  - openbmb/RLAIF-V-Dataset
- library_name: transformers
  language:
  - multilingual
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: any-to-any
  tags:
  - minicpm-o
  - omni
@@ -22,9 +23,11 @@ tags:
  - tts
  ---

- <h1>A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone</h1>
+ # A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

- [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Online Demo](https://minicpm-omni-webdemo-us.modelbest.cn) | [Technical Blog](https://openbmb.notion.site/MiniCPM-o-2-6-A-GPT-4o-Level-MLLM-for-Vision-Speech-and-Multimodal-Live-Streaming-on-Your-Phone-185ede1b7a558042b5d5e45e6b237da9) | [Join Us](https://mp.weixin.qq.com/mp/wappoc_appmsgcaptcha?poc_token=HAV8UWijqB3ImPSXecZHlOns7NRgpQw9y9EI2_fE&target_url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FKIhH2nCURBXuFXAtYRpuXg%3F)
+ This model is part of the MiniCPM series, described in the paper [MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe](https://huggingface.co/papers/2509.18154).
+
+ [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Online Demo](https://minicpm-omni-webdemo-us.modelbest.cn) | [Technical Blog (MiniCPM-o 2.6)](https://openbmb.notion.site/MiniCPM-o-2-6-A-GPT-4o-Level-MLLM-for-Vision-Speech-and-Multimodal-Live-Streaming-on-Your-Phone-185ede1b7a558042b5d5e45e6b237da9) | [Join Us](https://mp.weixin.qq.com/mp/wappoc_appmsgcaptcha?poc_token=HAV8UWijqB3ImPSXecZHlOns7NRgpQw9y9EI2_fE&target_url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FKIhH2nCURBXuFXAtYRpuXg%3F)


  ### News
@@ -927,7 +930,6 @@ All results are from AudioEvals, and the evaluation methods along with further d
  </table>


-
  ### Examples <!-- omit in toc -->

  We deploy MiniCPM-o 2.6 on end devices. The demo video is the raw-speed recording on an iPad Pro and a Web demo.
@@ -1329,7 +1331,8 @@ For audio-to-text tasks, you can use the following prompts:
  - General Sound Scene Tagging: `Utilize one keyword to convey the audio's content or the associated scene.`

  ```python
- task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "\n" # can change to other prompts.
+ task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "
+ " # can change to other prompts.
  audio_input, _ = librosa.load('./assets/input_examples/audio_understanding.mp3', sr=16000, mono=True) # load the audio to be captioned

  msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]
@@ -1471,31 +1474,30 @@ Download the int4 quantized version for lower GPU memory (7GB) usage: [MiniCPM-


  ## License
- #### Model License
  * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
  * The usage of MiniCPM-o and MiniCPM-V series model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
  * The models and weights of MiniCPM are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, MiniCPM-o 2.6 weights are also available for free commercial use.


  #### Statement
- * As an LMM, MiniCPM-o 2.6 generates contents by learning a large mount of multimodal corpora, but it cannot comprehend, express personal opinions or make value judgement. Anything generated by MiniCPM-o 2.6 does not represent the views and positions of the model developers
+ * As an MLLM, MiniCPM-o 2.6 generates contents by learning a large amount of multimodal corpora, but it cannot comprehend, express personal opinions or make value judgement. Anything generated by MiniCPM-o 2.6 does not represent the views and positions of the model developers
  * We will not be liable for any problems arising from the use of the MiniCPM-V models, including but not limited to data security issues, risks of public opinion, or any risks and problems arising from the misdirection, misuse, or dissemination of the model.

  ## Key Techniques and Other Multimodal Projects

- 👏 Welcome to explore key techniques of MiniCPM-o 2.6 and other multimodal projects of our team:
+ 👏 Welcome to explore key techniques of MiniCPM-o/V and other multimodal projects of our team:

- [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)
+ [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLPR](https://github.com/OpenBMB/RLPR) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)

  ## Citation

  If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️！

  ```bib
- @article{yao2024minicpm,
- title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
+ @article{yao2025minicpmv45,
+ title={MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
- journal={arXiv preprint arXiv:2408.01800},
- year={2024}
+ journal={arXiv preprint arXiv:2509.18154},
+ year={2025}
  }
- ```
+ ```
 
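For readers trying the snippet touched by the fourth hunk, the fragment is used roughly as follows. This is a minimal sketch, not part of the PR: the checkpoint id, loading flags, and `model.chat` keyword arguments are assumed to match the examples shown elsewhere in this model card, and the prompt keeps the escaped `"\n"` on a single source line.

```python
# Minimal audio-understanding sketch. Assumptions: checkpoint id, loading flags,
# and chat() kwargs follow the examples in this model card; adjust to your setup.
import librosa
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-o-2_6',
    trust_remote_code=True,
    attn_implementation='sdpa',  # or 'flash_attention_2'
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True)
model.init_tts()  # needed when also generating speech output

# Note the escaped "\n": the prompt stays on one source line.
task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "\n"
audio_input, _ = librosa.load('./assets/input_examples/audio_understanding.mp3',
                              sr=16000, mono=True)  # 16 kHz mono input
msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,
    generate_audio=True,
    temperature=0.3,
    output_audio_path='result.wav',
)
print(res)
```

With `generate_audio=True` the call also writes the spoken response to `result.wav`; setting it to `False` keeps the run text-only for plain transcription.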