# Video Generation Endpoint API (Universal Handler)

This repository is configured for deployment as a Hugging Face Inference Endpoint using a Universal Custom Handler. It supports both Text-to-Video and Image-to-Video workflows, returning results as GIF, WebM, or raw frames (ZIP).
## Endpoint URL

After deployment, your endpoint URL will look like:

```
https://<your-endpoint>.aws.endpoints.huggingface.cloud
```
## Authentication

All requests require a Hugging Face token with permission to call the endpoint. Pass it in the `Authorization` header:

```
Authorization: Bearer YOUR_HF_TOKEN
```
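In Python, the required headers can be assembled like this (a minimal sketch; `build_headers` is an illustrative helper, not part of this repo):

```python
# Illustrative helper: build the headers every request to the endpoint needs.
# YOUR_HF_TOKEN is a placeholder for your actual Hugging Face token.
def build_headers(token: str) -> dict:
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```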
## Request Format

Requests must be wrapped in a top-level `inputs` object.
### 1. Text-to-Video (T2V)

```json
{
  "inputs": {
    "prompt": "cinematic drone shot of a futuristic city",
    "num_frames": 32,
    "outputs": ["gif"]
  }
}
```
### 2. Image-to-Video (I2V)

To animate a static image, provide a base64-encoded string in the `image` field.

```json
{
  "inputs": {
    "prompt": "slow zoom in, volumetric fog",
    "image": "BASE64_STRING_HERE",
    "num_frames": 32,
    "outputs": ["gif"]
  }
}
```
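Both request bodies can be built programmatically. The sketch below mirrors the two JSON shapes above; the helper names (`t2v_payload`, `i2v_payload`) are illustrative, not part of the API:

```python
import base64
import json

def t2v_payload(prompt, num_frames=32, outputs=("gif",)):
    # Everything must be wrapped in a top-level "inputs" object.
    return {"inputs": {"prompt": prompt,
                       "num_frames": num_frames,
                       "outputs": list(outputs)}}

def i2v_payload(prompt, image_bytes, **kwargs):
    payload = t2v_payload(prompt, **kwargs)
    # The "image" field carries the raw image bytes as a base64 string.
    payload["inputs"]["image"] = base64.b64encode(image_bytes).decode("ascii")
    return payload

body = json.dumps(t2v_payload("cinematic drone shot of a futuristic city"))
```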
## API Parameters

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Description of the video or motion. |
| `image` | string | null | (New) Base64-encoded input image for I2V. |
| `negative_prompt` | string | `""` | Elements to avoid in the generation. |
| `num_frames` | int | 32 | Total frames to generate. |
| `fps` | int | 12 | Playback frame rate. |
| `height` | int | 512 | Video height (must be divisible by 32). |
| `width` | int | 512 | Video width (must be divisible by 32). |
| `seed` | int | null | Random seed for reproducibility. |
| `num_inference_steps` | int | 30 | Higher = better quality, slower generation. |
| `guidance_scale` | float | 7.5 | How strictly to follow the prompt. |
| `outputs` | array | `["gif"]` | Output formats: `["gif", "webm", "zip"]`. |
| `return_base64` | bool | true | Return file content as a base64 string. |
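Before sending a request, the constraints in the table can be checked client-side. This is a sketch of that validation under the documented defaults; the handler's own validation logic may differ:

```python
# Defaults taken from the parameter table; the handler may apply them server-side.
DEFAULTS = {
    "negative_prompt": "",
    "num_frames": 32,
    "fps": 12,
    "height": 512,
    "width": 512,
    "num_inference_steps": 30,
    "guidance_scale": 7.5,
    "outputs": ["gif"],
    "return_base64": True,
}

def validate_inputs(inputs):
    if "prompt" not in inputs:
        raise ValueError("prompt is required")
    merged = {**DEFAULTS, **inputs}
    # height and width must each be divisible by 32.
    for dim in ("height", "width"):
        if merged[dim] % 32 != 0:
            raise ValueError(f"{dim} must be divisible by 32, got {merged[dim]}")
    return merged
```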
## Output Configuration

You can customize specific output formats by adding a matching key to `inputs`.

```json
"inputs": {
  "outputs": ["webm"],
  "webm": {
    "quality": "best",
    "fps": 24
  }
}
```

Per-format options:

- GIF: `{ "fps": int }`
- WebM: `{ "fps": int, "quality": "fast" | "good" | "best" }`
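One plausible way this layering could resolve is sketched below: the top-level `fps` applies to all formats, and the format-specific key overrides it. The default `quality` value (`"good"`) is an assumption, not documented:

```python
# Assumed per-format defaults; "good" as the default quality is a guess.
FORMAT_DEFAULTS = {
    "gif": {"fps": 12},
    "webm": {"fps": 12, "quality": "good"},
}

def resolve_output_config(inputs, fmt):
    cfg = dict(FORMAT_DEFAULTS[fmt])
    if "fps" in inputs:
        cfg["fps"] = inputs["fps"]        # top-level fps applies to all formats
    cfg.update(inputs.get(fmt, {}))       # the format-specific key wins
    return cfg
```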
## Response Format

Success response:

```json
{
  "ok": true,
  "outputs": {
    "gif_base64": "R0lGODlh...",
    "webm_base64": "..."
  },
  "diagnostics": {
    "timing_ms": { ... },
    "mode": "i2v"
  }
}
```

The `mode` field is `"i2v"` or `"t2v"`, depending on whether an input image was provided.
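Each entry under `outputs` is named `<format>_base64`, so the response can be decoded generically. A minimal sketch (`decode_outputs` is an illustrative helper):

```python
import base64

def decode_outputs(response):
    decoded = {}
    for key, value in response.get("outputs", {}).items():
        # Keys look like "gif_base64"; strip the suffix to recover the format.
        fmt = key.removesuffix("_base64")
        decoded[fmt] = base64.b64decode(value)
    return decoded

# Each entry can then be written to disk,
# e.g. open("output.gif", "wb").write(decoded["gif"])
```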
## Usage Examples (curl)
### 1. Simple Text-to-Video (GIF)

```bash
curl -sS -X POST "https://<ENDPOINT>.aws.endpoints.huggingface.cloud" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "prompt": "a cyberpunk street in rain, neon lights",
      "num_frames": 24,
      "outputs": ["gif"]
    }
  }' \
  | jq -er '.outputs.gif_base64' | base64 --decode > output.gif
```
### 2. Image-to-Video (GIF)

```bash
# Convert the image to base64. On macOS use `base64 -i FILE`; GNU coreutils
# wraps output at 76 columns, so strip newlines to keep the JSON string valid
# (or use `base64 -w 0 FILE` on Linux).
IMG_B64=$(base64 -i my_input.jpg | tr -d '\n')

curl -sS -X POST "https://<ENDPOINT>.aws.endpoints.huggingface.cloud" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d "{
    \"inputs\": {
      \"prompt\": \"waves crashing on the shore, moving water\",
      \"image\": \"$IMG_B64\",
      \"num_frames\": 32,
      \"outputs\": [\"gif\"]
    }
  }" \
  | jq -er '.outputs.gif_base64' | base64 --decode > animated.gif
```
### 3. High-Quality Video (WebM)

```bash
curl -sS -X POST "https://<ENDPOINT>.aws.endpoints.huggingface.cloud" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "prompt": "slow pan over a mars landscape",
      "num_frames": 48,
      "fps": 24,
      "outputs": ["webm"],
      "webm": { "quality": "best" }
    }
  }' \
  | jq -er '.outputs.webm_base64' | base64 --decode > output.webm
```