How can I disable streaming during inference? Also, can DeepSeek OCR handle multiple images at the same time?

#91

by Tizzzzy - opened Nov 21, 2025



Here is my current code:

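import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-OCR"
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = "your_image.jpg"      # placeholder: path to the image to OCR
output_path = "your/output/dir"    # placeholder: where save_results=True writes files

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation="flash_attention_2",
    trust_remote_code=True,
    use_safetensors=True,
)
# flash_attention_2 needs a CUDA device and half precision (bf16/fp16)
model = model.eval().cuda().to(torch.bfloat16)

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,      # Gundam mode (dynamic tiling), good for documents
    save_results=True,
    test_compress=True,
    eval_mode=True,      # turns off streamed output during generation
)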
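With the Transformers `model.infer` path, one straightforward way to handle several images is to call it once per file and collect the results. The sketch below is an assumption-laden illustration: `ocr_many` and `infer_one` are hypothetical names, it assumes `infer` returns the decoded text when `eval_mode=True`, and it processes images one after another rather than in parallel.

```python
from typing import Callable, Dict, Iterable


def ocr_many(image_files: Iterable[str], infer_one: Callable[[str], str]) -> Dict[str, str]:
    """Run OCR on each image path in turn and map path -> recognized text.

    `infer_one` is a hypothetical wrapper around model.infer(..., eval_mode=True).
    Images are processed sequentially: one call (and one generation) per file.
    """
    results: Dict[str, str] = {}
    for path in image_files:
        results[path] = infer_one(path)
    return results


# Usage with the model/tokenizer loaded above (assumes infer returns text in eval_mode):
# texts = ocr_many(
#     ["page1.png", "page2.png"],
#     lambda p: model.infer(tokenizer, prompt=prompt, image_file=p,
#                           output_path=output_path, base_size=1024, image_size=640,
#                           crop_mode=True, save_results=False, eval_mode=True),
# )
```

Passing the model call in as a callable keeps the loop independent of the model, so it can be checked with a stub before running it on a GPU.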

sameershaik78

Jan 29

Yes, DeepSeek-OCR can process multiple images in parallel if you run it through vLLM (batched inference). To turn off streamed output, set eval_mode=True in the model.infer() call; your code already does that.
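For the vLLM route, batching works by passing a list of requests, each carrying its own image, to a single `LLM.generate` call. The helper below only builds that list in vLLM's offline multimodal input format (`{"prompt": ..., "multi_modal_data": {"image": ...}}`). `build_requests` is a hypothetical name, and the `<image>` prompt follows the DeepSeek-OCR model card; treat both as a sketch to check against your installed versions.

```python
from typing import Any, Dict, List, Sequence


def build_requests(images: Sequence[Any], prompt: str = "<image>\nFree OCR.") -> List[Dict[str, Any]]:
    """Pair every image (e.g. a PIL.Image in RGB mode) with the same OCR prompt.

    Each dict uses vLLM's offline multimodal input format, so the whole list
    can go to LLM.generate() in one call and be scheduled as a batch.
    """
    if "<image>" not in prompt:
        raise ValueError("DeepSeek-OCR prompts need an <image> placeholder")
    return [{"prompt": prompt, "multi_modal_data": {"image": img}} for img in images]


# Usage sketch (needs a GPU and a vLLM release that supports DeepSeek-OCR):
# from PIL import Image
# from vllm import LLM, SamplingParams
# llm = LLM(model="deepseek-ai/DeepSeek-OCR")
# imgs = [Image.open(p).convert("RGB") for p in ["page1.png", "page2.png"]]
# outputs = llm.generate(build_requests(imgs), SamplingParams(temperature=0.0, max_tokens=8192))
# for out in outputs:
#     print(out.outputs[0].text)
```

The official DeepSeek-OCR/vLLM recipes pass further engine and sampling options, so take those from the documentation for your vLLM version rather than from this sketch.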
