Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model
Paper: arXiv:2411.12783
Med-2E3 is a 3D medical large vision-language model (LVLM) trained on 3D CT volumes and English medical texts (M3D-Cap and M3D-VQA). It supports tasks such as report generation and medical visual question answering (VQA).
| Config | Value |
|---|---|
| 3D Image encoder | GoodBaiBai88/M3D-CLIP |
| 2D Image encoder | google/siglip-large-patch16-256 |
| Connector | TG-IS scoring module |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 32×256×256 |
| Sequence length | 768 |
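
For readers who want the settings above in code, here is a minimal sketch that collects them in one place. The class name `Med2E3Config` and its field names are hypothetical and not part of any released API. The axis order (depth, height, width) for the image resolution is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Med2E3Config:
    """Hypothetical container mirroring the config table; not an official class."""

    encoder_3d: str = "GoodBaiBai88/M3D-CLIP"
    encoder_2d: str = "google/siglip-large-patch16-256"
    connector: str = "TG-IS scoring module"
    llm: str = "Qwen/Qwen2.5-3B-Instruct"
    # Assumed axis order: (depth, height, width).
    image_shape: tuple = (32, 256, 256)
    max_seq_len: int = 768

    @property
    def num_slices(self) -> int:
        # Number of slices along the assumed depth axis.
        return self.image_shape[0]

    @property
    def num_voxels(self) -> int:
        # Total voxels in one input volume.
        depth, height, width = self.image_shape
        return depth * height * width


cfg = Med2E3Config()
print(cfg.num_slices, cfg.num_voxels)
```

Under the assumed axis order, each input volume holds 32 slices of 256×256 pixels.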
Please refer to Med-2E3.
@article{shi2024med,
  title={{Med-2E3}: A {2D}-Enhanced {3D} Medical Multimodal Large Language Model},
  author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
  journal={arXiv preprint arXiv:2411.12783},
  year={2024}
}
Base model: GoodBaiBai88/M3D-CLIP