Improve model card for MiniCPM-o 2.6: Add paper link and license metadata

#56
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +17 -15
README.md CHANGED
@@ -1,10 +1,11 @@
  ---
- pipeline_tag: any-to-any
  datasets:
  - openbmb/RLAIF-V-Dataset
- library_name: transformers
  language:
  - multilingual
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: any-to-any
  tags:
  - minicpm-o
  - omni
@@ -22,9 +23,11 @@ tags:
  - tts
  ---

- <h1>A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone</h1>
+ # A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

- [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Online Demo](https://minicpm-omni-webdemo-us.modelbest.cn) | [Technical Blog](https://openbmb.notion.site/MiniCPM-o-2-6-A-GPT-4o-Level-MLLM-for-Vision-Speech-and-Multimodal-Live-Streaming-on-Your-Phone-185ede1b7a558042b5d5e45e6b237da9) | [Join Us](https://mp.weixin.qq.com/mp/wappoc_appmsgcaptcha?poc_token=HAV8UWijqB3ImPSXecZHlOns7NRgpQw9y9EI2_fE&target_url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FKIhH2nCURBXuFXAtYRpuXg%3F)
+ This model is part of the MiniCPM series, described in the paper [MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe](https://huggingface.co/papers/2509.18154).
+
+ [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Online Demo](https://minicpm-omni-webdemo-us.modelbest.cn) | [Technical Blog (MiniCPM-o 2.6)](https://openbmb.notion.site/MiniCPM-o-2-6-A-GPT-4o-Level-MLLM-for-Vision-Speech-and-Multimodal-Live-Streaming-on-Your-Phone-185ede1b7a558042b5d5e45e6b237da9) | [Join Us](https://mp.weixin.qq.com/mp/wappoc_appmsgcaptcha?poc_token=HAV8UWijqB3ImPSXecZHlOns7NRgpQw9y9EI2_fE&target_url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FKIhH2nCURBXuFXAtYRpuXg%3F)


  ### News
@@ -927,7 +930,6 @@ All results are from AudioEvals, and the evaluation methods along with further d
  </table>


-
  ### Examples <!-- omit in toc -->

  We deploy MiniCPM-o 2.6 on end devices. The demo video is the raw-speed recording on an iPad Pro and a Web demo.
@@ -1329,7 +1331,8 @@ For audio-to-text tasks, you can use the following prompts:
  - General Sound Scene Tagging: `Utilize one keyword to convey the audio's content or the associated scene.`

  ```python
- task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "\n" # can change to other prompts.
+ task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "
+ " # can change to other prompts.
  audio_input, _ = librosa.load('./assets/input_examples/audio_understanding.mp3', sr=16000, mono=True) # load the audio to be captioned

  msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]
@@ -1471,31 +1474,30 @@ Download the int4 quantized version for lower GPU memory (7GB) usage: [MiniCPM-


  ## License
- #### Model License
  * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
  * The usage of MiniCPM-o and MiniCPM-V series model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
  * The models and weights of MiniCPM are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, MiniCPM-o 2.6 weights are also available for free commercial use.


  #### Statement
- * As an LMM, MiniCPM-o 2.6 generates contents by learning a large mount of multimodal corpora, but it cannot comprehend, express personal opinions or make value judgement. Anything generated by MiniCPM-o 2.6 does not represent the views and positions of the model developers
+ * As an MLLM, MiniCPM-o 2.6 generates contents by learning a large amount of multimodal corpora, but it cannot comprehend, express personal opinions or make value judgement. Anything generated by MiniCPM-o 2.6 does not represent the views and positions of the model developers
  * We will not be liable for any problems arising from the use of the MiniCPM-V models, including but not limited to data security issues, risks of public opinion, or any risks and problems arising from the misdirection, misuse, or dissemination of the model.

  ## Key Techniques and Other Multimodal Projects

- 👏 Welcome to explore key techniques of MiniCPM-o 2.6 and other multimodal projects of our team:
+ 👏 Welcome to explore key techniques of MiniCPM-o/V and other multimodal projects of our team:

- [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)
+ [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLPR](https://github.com/OpenBMB/RLPR) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)

  ## Citation

  If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️！

  ```bib
- @article{yao2024minicpm,
- title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
+ @article{yao2025minicpmv45,
+ title={MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
- journal={arXiv preprint arXiv:2408.01800},
- year={2024}
+ journal={arXiv preprint arXiv:2509.18154},
+ year={2025}
  }
- ```
+ ```
 
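For readers trying the snippet touched by the fourth hunk, the fragment is used roughly as follows. This is a minimal sketch, not part of the PR: the checkpoint id, loading flags, and `model.chat` keyword arguments are assumed to match the examples shown elsewhere in this model card, and the prompt keeps the escaped `"\n"` on a single source line.

```python
# Minimal audio-understanding sketch. Assumptions: checkpoint id, loading flags,
# and chat() kwargs follow the examples in this model card; adjust to your setup.
import librosa
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-o-2_6',
    trust_remote_code=True,
    attn_implementation='sdpa',  # or 'flash_attention_2'
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True)
model.init_tts()  # needed when also generating speech output

# Note the escaped "\n": the prompt stays on one source line.
task_prompt = "Please listen to the audio snippet carefully and transcribe the content." + "\n"
audio_input, _ = librosa.load('./assets/input_examples/audio_understanding.mp3',
                              sr=16000, mono=True)  # 16 kHz mono input
msgs = [{'role': 'user', 'content': [task_prompt, audio_input]}]

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,
    generate_audio=True,
    temperature=0.3,
    output_audio_path='result.wav',
)
print(res)
```

With `generate_audio=True` the call also writes the spoken response to `result.wav`; setting it to `False` keeps the run text-only for plain transcription.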