Med-2E3-M3D

Introduction

Med-2E3 is a 3D medical large vision-language model (LVLM) trained on 3D CT volumes and English medical texts (M3D-Cap and M3D-VQA). It supports tasks such as report generation and medical visual question answering (VQA).

	Config
3D Image encoder	GoodBaiBai88/M3D-CLIP
2D Image encoder	google/siglip-large-patch16-256
Connector	TG-IS scoring module
LLM	Qwen/Qwen2.5-3B-Instruct
Image resolution	32 × 256 × 256
Sequence length	768
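
The image resolution in the table above implies that each CT volume reaches the encoders as a 32 × 256 × 256 array (depth × height × width). The following is a minimal sketch of how a raw volume could be brought to that shape, not the preprocessing that Med-2E3 actually uses. The function names, the nearest-neighbour resampling, the min-max normalization, and the leading channel axis are all assumptions made for illustration; see the Med-2E3 repository for the real pipeline.

```python
import numpy as np

# Target shape taken from the config table: depth x height x width.
TARGET_SHAPE = (32, 256, 256)


def resize_volume_nearest(volume, target_shape=TARGET_SHAPE):
    """Resample a 3D array to target_shape with nearest-neighbour indexing.

    Hypothetical helper: the actual Med-2E3 pipeline may interpolate differently.
    """
    if volume.ndim != 3:
        raise ValueError(f"expected a 3D array, got {volume.ndim} dimensions")
    # For every axis, map each target index back to a source index.
    indices = [
        np.minimum((np.arange(t) * s / t).astype(np.int64), s - 1)
        for s, t in zip(volume.shape, target_shape)
    ]
    # np.ix_ builds an open mesh so the three index vectors select a 3D grid.
    return volume[np.ix_(*indices)]


def min_max_normalize(volume):
    """Scale intensities to [0, 1]; a constant volume maps to all zeros."""
    volume = volume.astype(np.float32)
    lo, hi = volume.min(), volume.max()
    if hi == lo:
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)


def prepare_ct_volume(volume):
    """Resize, normalize, and add a channel axis (assumed single-channel input)."""
    resized = resize_volume_nearest(volume)
    normalized = min_max_normalize(resized)
    return normalized[np.newaxis, ...]


if __name__ == "__main__":
    # Synthetic stand-in for a CT scan with 40 slices of 300 x 300 pixels.
    rng = np.random.default_rng(0)
    scan = rng.integers(-1000, 1000, size=(40, 300, 300)).astype(np.int16)
    print(prepare_ct_volume(scan).shape)  # (1, 32, 256, 256)
```

Nearest-neighbour indexing keeps the sketch dependency-free beyond NumPy; a real pipeline would more likely use trilinear interpolation and a CT-specific intensity window.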

Quickstart

Please refer to Med-2E3.

Citation

@article{shi2024med,
  title={Med-2e3: A 2d-enhanced 3d medical multimodal large language model},
  author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
  journal={arXiv preprint arXiv:2411.12783},
  year={2024}
}

Paper for shiym2000/Med-2E3-M3D

Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model

Paper • 2411.12783 • Published Nov 19, 2024