Vintern-1B-v2-ViTable-docvqa

Report Link👁️

Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for the Vietnamese DocVQA task on table data.

Benchmarks

| Model | ANLS | Semantic Similarity | MLLM-as-judge (Gemini) |
|---|---|---|---|
| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |
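
For readers unfamiliar with the first metric, ANLS (Average Normalized Levenshtein Similarity) scores each predicted answer by its edit-distance similarity to the closest reference answer, and zeroes out any prediction whose normalized distance reaches a threshold. The sketch below follows the commonly used DocVQA definition with a threshold of 0.5. It is not the evaluation script behind the numbers above; the function names, the lowercasing and whitespace stripping, and the handling of empty strings are assumptions made for illustration.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Mean over questions of the best thresholded similarity to any reference."""
    scores = []
    for pred, refs in zip(predictions, references):
        p = pred.strip().lower()  # assumption: case-insensitive comparison
        best = 0.0
        for ref in refs:
            r = ref.strip().lower()
            longest = max(len(p), len(r))
            nl = levenshtein(p, r) / longest if longest else 0.0
            # Answers at or beyond the threshold distance count as wrong.
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

Each question can carry several acceptable reference answers, so `references` is a list of lists, for example `anls(["12.5"], [["12,5", "12.5 %"]])`.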

Usage

Check out the 🤗 HF Demo, or open it in Colab.

Citation:

@misc{doan2024vintern1befficientmultimodallarge,
      title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese}, 
      author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
      year={2024},
      eprint={2408.12480},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.12480}, 
}

Model size: 0.9B params (Safetensors, tensor type BF16)