---
license: apache-2.0
---

# Casktalk-VLM (CaskTalk Vision Language Model)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63aac6866ed74f08b7af372b/XWjnl3ssNQmiG2XGlBb3X.png)

## Model Details

- **Developed by:** ToriLab (CasTalk)
- **Model type:** Vision-language model based on LLaVA, with a Mistral-7B language backbone

## Usage

### Prerequisites

```bash
pip install --upgrade pip
pip install "transformers>=4.39.0"
```

### Inference

```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = LlavaNextProcessor.from_pretrained("torilab/casktalk-vlm-v1.0")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "torilab/casktalk-vlm-v1.0",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model.to(device)
```

Pass the image and the text prompt to the processor, then pass the processed inputs to `generate`. Note the `<image>` placeholder in the prompt: it marks where the processor inserts the image tokens.

```python
from PIL import Image
import requests

url = ""  # set to the URL of the image you want to describe
image = Image.open(requests.get(url, stream=True).raw)

prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
inputs = processor(prompt, image, return_tensors="pt").to(device)

output = model.generate(**inputs, max_new_tokens=100)
```

Call `decode` to turn the output tokens back into text:

```python
print(processor.decode(output[0], skip_special_tokens=True))
```

## About ToriLab

ToriLab builds reliable, practical, and scalable AI solutions for the CasTalk app.

- phuongdv-VN (fenix@castalk.com)
- khanhvu-VN (kevin@castalk.com)
- hieptran-VN (halley@castalk.com)
- tanaka-JP (daniel@castalk.com)
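Because Mistral-style instruction prompts are easy to get wrong by hand, a small helper can keep the format consistent. This is a sketch, not part of the model repository: the `build_prompt` function is hypothetical, and it assumes the `[INST] <image>\n… [/INST]` template shown in the inference example above.

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the Mistral-style instruction template.

    The <image> placeholder marks where the processor injects image tokens
    (assumption: single-image prompts, as in the usage example above).
    """
    return f"[INST] <image>\n{question} [/INST]"


# Example: build the same prompt used in the inference snippet.
prompt = build_prompt("What is shown in this image?")
print(prompt)
```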