---
license: apache-2.0
---

# Casktalk-VLM (CaskTalk Vision Language Model)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63aac6866ed74f08b7af372b/XWjnl3ssNQmiG2XGlBb3X.png)

## Model Details

- **Developed by:** ToriLab (CasTalk)
- **Model type:** Vision-language model based on LLaVA, with a Mistral-7B language backbone

## Usage

### Prerequisites

```bash
pip install --upgrade pip
pip install "transformers>=4.39.0"
```

### Inference

```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = LlavaNextProcessor.from_pretrained("torilab/casktalk-vlm-v1.0")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "torilab/casktalk-vlm-v1.0",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model.to(device)
```

Pass the image and the text prompt to the processor, then pass the processed inputs to `generate`. Note the `<image>` placeholder in the prompt: it marks where the processor inserts the image tokens.

```python
from PIL import Image
import requests

url = ""  # set to the URL of the image you want to describe
image = Image.open(requests.get(url, stream=True).raw)

prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
inputs = processor(prompt, image, return_tensors="pt").to(device)

output = model.generate(**inputs, max_new_tokens=100)
```

Call `decode` to turn the output tokens back into text:

```python
print(processor.decode(output[0], skip_special_tokens=True))
```

## About ToriLab

ToriLab builds reliable, practical, and scalable AI solutions for the CasTalk app.

- phuongdv-VN (fenix@castalk.com)
- khanhvu-VN (kevin@castalk.com)
- hieptran-VN (halley@castalk.com)
- tanaka-JP (daniel@castalk.com)
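Because Mistral-style instruction prompts are easy to get wrong by hand, a small helper can keep the format consistent. This is a sketch, not part of the model repository: the `build_prompt` function is hypothetical, and it assumes the `[INST] <image>\n… [/INST]` template shown in the inference example above.

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the Mistral-style instruction template.

    The <image> placeholder marks where the processor injects image tokens
    (assumption: single-image prompts, as in the usage example above).
    """
    return f"[INST] <image>\n{question} [/INST]"


# Example: build the same prompt used in the inference snippet.
prompt = build_prompt("What is shown in this image?")
print(prompt)
```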