Instructions for using tsunghanwu/reverse_llava_v15 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use tsunghanwu/reverse_llava_v15 with Transformers:
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="tsunghanwu/reverse_llava_v15")
```

Or load the model directly:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tsunghanwu/reverse_llava_v15", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use tsunghanwu/reverse_llava_v15 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "tsunghanwu/reverse_llava_v15"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tsunghanwu/reverse_llava_v15",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/tsunghanwu/reverse_llava_v15
```
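The curl call above can also be made from Python's standard library. Below is a minimal sketch, assuming the vLLM server is running on localhost:8000 as started above; the helper names (`build_payload`, `complete`) are illustrative, not part of any API:

```python
import json
from urllib import request

# Build the same JSON body the curl example sends to /v1/completions.
def build_payload(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> dict:
    return {
        "model": "tsunghanwu/reverse_llava_v15",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

# POST the payload to the OpenAI-compatible completions endpoint.
def complete(prompt: str, url: str = "http://localhost:8000/v1/completions") -> str:
    req = request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Requires a running server:
# print(complete("Once upon a time,"))
```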
- SGLang
How to use tsunghanwu/reverse_llava_v15 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "tsunghanwu/reverse_llava_v15" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tsunghanwu/reverse_llava_v15",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "tsunghanwu/reverse_llava_v15" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tsunghanwu/reverse_llava_v15",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use tsunghanwu/reverse_llava_v15 with Docker Model Runner:
```shell
docker model run hf.co/tsunghanwu/reverse_llava_v15
```
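The vLLM and SGLang servers above both expose the standard OpenAI-compatible API, so chat-style requests work as well as plain completions. A sketch against the SGLang server on port 30000 (swap the port for vLLM); the `/v1/chat/completions` route is part of the generic OpenAI-compatible surface, not something specific to this model, and the helper names here are invented for illustration:

```python
import json
from urllib import request

# Chat-style body for the OpenAI-compatible /v1/chat/completions route.
def build_chat_payload(user_text: str) -> dict:
    return {
        "model": "tsunghanwu/reverse_llava_v15",
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 512,
        "temperature": 0.5,
    }

# POST the chat payload and return the assistant's reply text.
def chat(user_text: str, url: str = "http://localhost:30000/v1/chat/completions") -> str:
    req = request.Request(
        url,
        data=json.dumps(build_chat_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Requires a running server:
# print(chat("Describe a typical street scene."))
```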
REVERSE-v1.5-7B
Model Summary
REVERSE-v1.5-7B is a novel open-source vision-language model (VLM) that performs both next-token prediction and self-verification / self-correction during the generation process. Built on top of LLaVA-v1.5-7B, it is fine-tuned on the REVERSE Visual Instruct 1.3M dataset and equipped with a retrospective resampling mechanism that allows it to detect and correct hallucinations during generation. The model was trained in early March 2025.
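The retrospective resampling idea can be illustrated with a deliberately simplified toy loop: draft a span, let a verifier judge it, and rewind and resample on rejection. This sketches the control flow only; the function names are invented here, and the actual model implements verification with special tokens learned during fine-tuning, not a Python callback:

```python
import random

def generate_with_backtracking(sample_fn, verify_fn, max_len=10, max_retries=3, seed=0):
    """Toy generate-verify-resample loop (illustrative only)."""
    rng = random.Random(seed)
    out = []
    while len(out) < max_len:
        for _ in range(max_retries):
            candidate = sample_fn(out, rng)   # draft the next span
            if verify_fn(out, candidate):     # self-verification step
                out.append(candidate)
                break
        else:
            # Retries exhausted: keep the last draft rather than stall.
            out.append(candidate)
    return out
```

With a verifier that rejects a known-bad span, the loop resamples until an acceptable draft is produced or the retry budget runs out.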
Performance
REVERSE achieves state-of-the-art hallucination reduction across a wide range of captioning and open-ended visual question answering benchmarks:
| Benchmark | Metric | Best Baseline | REVERSE (τ=0.003) | REVERSE (τ=0.0003) |
|---|---|---|---|---|
| CHAIR-MSCOCO | CHAIR (↓) | HA-DPO (11.0) | 10.3 | 6.1 |
| CHAIR-MSCOCO | CHAIRs (↓) | EOS (38.2) | 37.0 | 13.6 |
| AMBER-G | Hallucination (↓) | EOS (5.1) | 6.0 | 4.0 |
| AMBER-G | Coverage (↑) | HALVA (53.0) | 52.2 | 26.9 |
| MMHal-Bench | Score (↑) | DoLA (2.33) | 2.56 | 3.28 |
| MMHal-Bench | Hallucination Rate (↓) | HACL (0.50) | 0.47 | 0.30 |
| HaloQuest | Avg. Accuracy (↑) | HALVA (23.9) | 30.7 | 32.3 |
| HaloQuest | False Premise Acc. (↑) | HALVA (21.1) | 31.8 | 29.4 |
| HaloQuest | Visually Challenging Acc. (↑) | DoLA (40.1) | 31.5 | 18.7 |
| HaloQuest | Insufficient Context Acc. (↑) | HALVA (10.7) | 26.9 | 58.8 |
It also performs competitively on discriminative tasks compared with the base VLM.
| Benchmark | Metric | LLaVA-v1.5-7B | REVERSE (τ=0.5) |
|---|---|---|---|
| AMBER-D | F1 Score (↑) | 74.7 | 74.2 |
| POPE | F1 Score (↑) | 85.9 | 85.9 |
| MME-Hall | Score (↑) | 648.3 | 601.6 |
Usage
Please refer to the installation guide on GitHub to get started:
Installation Guide
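For a quick local smoke test with Transformers before the full setup, the snippet below builds the chat-style message structure that image-text-to-text pipelines consume. The message shape is the generic Transformers chat format and the image URL is a placeholder; whether this exact model's processor accepts it is an assumption, so prefer the official installation guide above for supported usage:

```python
def build_messages(image_url: str, question: str) -> list:
    """Chat-style messages for an image-text-to-text pipeline."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_demo():
    # Heavy: downloads the 7B checkpoint on first run.
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="tsunghanwu/reverse_llava_v15")
    messages = build_messages("https://example.com/cat.jpg", "What is in this image?")
    return pipe(text=messages, max_new_tokens=64)
```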
Additional Resources
- Project Page: https://reverse-vlm.github.io/
- Dataset: REVERSE Visual Instruct 1.3M
- Ask Questions: GitHub Issues
Intended Use
Primary Use Cases:
- Reducing hallucination in image captioning and VQA tasks
- Benchmarking hallucination-aware generation
- Research on grounded vision-language generation and self-correction
Target Users:
Researchers, developers, and students working in computer vision, NLP, and multimodal AI.
Model tree for tsunghanwu/reverse_llava_v15
Base model
lmsys/vicuna-7b-v1.5