phronetic-ai/owlet-safety-3b-1 is a fine-tuned version of Qwen2.5-VL-3B-Instruct for multi-label safety event detection in video clips.
This model can identify the following safety-related events: fire, smoke, fall, assault, sos, theft, or none (if no concern is found).
It is suitable for video surveillance, incident detection, and safety monitoring tasks where multiple events may occur simultaneously.
Supported labels: assault, fall, fire, smoke, sos, theft, none
To run this model effectively, the following hardware and memory configurations are recommended:
You'll need:
pip install torch transformers accelerate qwen-vl-utils
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper from the qwen-vl-utils package
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForImageTextToText.from_pretrained(
    "phronetic-ai/owlet-safety-3b-1",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("phronetic-ai/owlet-safety-3b-1")
messages = [
    {
        "role": "system",
        "content": "You are an expert at analyzing safety-related activities. Given a video, identify all the safety concerns present. Respond with a comma-separated list of labels from this set: assault, fall, fire, smoke, sos, theft, none. If no safety concerns are present, respond with 'none'."
    },
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "/path/to/video/fire_0.mp4",  # Change to your video path
                "max_pixels": 360 * 420,
                "fps": 1.0
            },
            {
                "type": "text",
                "text": "Identify safety concerns in this video"
            }
        ]
    }
]
# Format inputs
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(device)
# Inference
torch.cuda.empty_cache()
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=128)
# Decode output
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(output_text)
Example Output:
['fire, smoke']
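Because the model replies with a single comma-separated string, downstream code usually needs to split it into individual labels. The snippet below is a minimal sketch of one way to do that, not part of the model's own tooling: `KNOWN_LABELS` and `parse_labels` are hypothetical names introduced here, and the label set is copied from the system prompt above. Anything outside that set is returned separately so unexpected replies can be logged rather than silently dropped.

```python
# Hypothetical post-processing helper; the label set mirrors the system prompt.
KNOWN_LABELS = ("assault", "fall", "fire", "smoke", "sos", "theft")


def parse_labels(reply: str) -> tuple[list[str], list[str]]:
    """Split a reply such as 'fire, smoke' into (valid labels, unknown tokens)."""
    # Normalise: split on commas, trim whitespace, lowercase, drop empty pieces.
    tokens = [tok.strip().lower() for tok in reply.split(",")]
    tokens = [tok for tok in tokens if tok]
    # Keep recognised labels in a fixed order; 'none' maps to an empty list.
    labels = [lab for lab in KNOWN_LABELS if lab in tokens]
    unknown = [tok for tok in tokens if tok not in KNOWN_LABELS and tok != "none"]
    return labels, unknown


print(parse_labels("fire, smoke"))  # (['fire', 'smoke'], [])
```

With the inference code above, this would be called as `parse_labels(output_text[0])`, since `batch_decode` returns one string per input.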