# Video Generation Endpoint API (Universal Handler)

This repository is configured for deployment as a Hugging Face Inference Endpoint using a Universal Custom Handler. It supports both Text-to-Video and Image-to-Video workflows, returning results as GIF, WebM, or raw frames (ZIP).
## Endpoint URL

After deployment, your endpoint URL will look like:

```
https://<your-endpoint>.aws.endpoints.huggingface.cloud
```
## Authentication

All requests require a Hugging Face token with permission to call the endpoint. Pass it in the `Authorization` header:

```
Authorization: Bearer YOUR_HF_TOKEN
```
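In Python, the required headers can be assembled like this (a minimal sketch; `build_headers` is an illustrative helper, not part of this repo):

```python
# Illustrative helper: build the headers every request to the endpoint needs.
# YOUR_HF_TOKEN is a placeholder for your actual Hugging Face token.
def build_headers(token: str) -> dict:
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```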
## Request Format

Requests must be wrapped in a top-level `inputs` object.
### 1. Text-to-Video (T2V)

```json
{
  "inputs": {
    "prompt": "cinematic drone shot of a futuristic city",
    "num_frames": 32,
    "outputs": ["gif"]
  }
}
```
### 2. Image-to-Video (I2V)

To animate a static image, provide a base64-encoded string in the `image` field.

```json
{
  "inputs": {
    "prompt": "slow zoom in, volumetric fog",
    "image": "BASE64_STRING_HERE",
    "num_frames": 32,
    "outputs": ["gif"]
  }
}
```
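Both request bodies can be built programmatically. The sketch below mirrors the two JSON shapes above; the helper names (`t2v_payload`, `i2v_payload`) are illustrative, not part of the API:

```python
import base64
import json

def t2v_payload(prompt, num_frames=32, outputs=("gif",)):
    # Everything must be wrapped in a top-level "inputs" object.
    return {"inputs": {"prompt": prompt,
                       "num_frames": num_frames,
                       "outputs": list(outputs)}}

def i2v_payload(prompt, image_bytes, **kwargs):
    payload = t2v_payload(prompt, **kwargs)
    # The "image" field carries the raw image bytes as a base64 string.
    payload["inputs"]["image"] = base64.b64encode(image_bytes).decode("ascii")
    return payload

body = json.dumps(t2v_payload("cinematic drone shot of a futuristic city"))
```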
## API Parameters

| Field | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Description of the video or motion. |
| `image` | string | null | (New) Base64-encoded input image for I2V. |
| `negative_prompt` | string | `""` | Elements to avoid in the generation. |
| `num_frames` | int | 32 | Total frames to generate. |
| `fps` | int | 12 | Playback frame rate. |
| `height` | int | 512 | Video height (must be divisible by 32). |
| `width` | int | 512 | Video width (must be divisible by 32). |
| `seed` | int | null | Random seed for reproducibility. |
| `num_inference_steps` | int | 30 | Higher = better quality, slower generation. |
| `guidance_scale` | float | 7.5 | How strictly to follow the prompt. |
| `outputs` | array | `["gif"]` | Output formats: `["gif", "webm", "zip"]`. |
| `return_base64` | bool | true | Return file content as a base64 string. |
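Before sending a request, the constraints in the table can be checked client-side. This is a sketch of that validation under the documented defaults; the handler's own validation logic may differ:

```python
# Defaults taken from the parameter table; the handler may apply them server-side.
DEFAULTS = {
    "negative_prompt": "",
    "num_frames": 32,
    "fps": 12,
    "height": 512,
    "width": 512,
    "num_inference_steps": 30,
    "guidance_scale": 7.5,
    "outputs": ["gif"],
    "return_base64": True,
}

def validate_inputs(inputs):
    if "prompt" not in inputs:
        raise ValueError("prompt is required")
    merged = {**DEFAULTS, **inputs}
    # height and width must each be divisible by 32.
    for dim in ("height", "width"):
        if merged[dim] % 32 != 0:
            raise ValueError(f"{dim} must be divisible by 32, got {merged[dim]}")
    return merged
```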
## Output Configuration

You can customize specific output formats by adding a matching key to `inputs`.

```json
"inputs": {
  "outputs": ["webm"],
  "webm": {
    "quality": "best",
    "fps": 24
  }
}
```

Per-format options:

- GIF: `{ "fps": int }`
- WebM: `{ "fps": int, "quality": "fast" | "good" | "best" }`
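One plausible way this layering could resolve is sketched below: the top-level `fps` applies to all formats, and the format-specific key overrides it. The default `quality` value (`"good"`) is an assumption, not documented:

```python
# Assumed per-format defaults; "good" as the default quality is a guess.
FORMAT_DEFAULTS = {
    "gif": {"fps": 12},
    "webm": {"fps": 12, "quality": "good"},
}

def resolve_output_config(inputs, fmt):
    cfg = dict(FORMAT_DEFAULTS[fmt])
    if "fps" in inputs:
        cfg["fps"] = inputs["fps"]        # top-level fps applies to all formats
    cfg.update(inputs.get(fmt, {}))       # the format-specific key wins
    return cfg
```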
## Response Format

Success response:

```json
{
  "ok": true,
  "outputs": {
    "gif_base64": "R0lGODlh...",
    "webm_base64": "..."
  },
  "diagnostics": {
    "timing_ms": { ... },
    "mode": "i2v"
  }
}
```

The `mode` field is `"i2v"` or `"t2v"`, depending on whether an input image was provided.
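Each entry under `outputs` is named `<format>_base64`, so the response can be decoded generically. A minimal sketch (`decode_outputs` is an illustrative helper):

```python
import base64

def decode_outputs(response):
    decoded = {}
    for key, value in response.get("outputs", {}).items():
        # Keys look like "gif_base64"; strip the suffix to recover the format.
        fmt = key.removesuffix("_base64")
        decoded[fmt] = base64.b64decode(value)
    return decoded

# Each entry can then be written to disk,
# e.g. open("output.gif", "wb").write(decoded["gif"])
```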
## Usage Examples (curl)
### 1. Simple Text-to-Video (GIF)

```bash
curl -sS -X POST "https://<ENDPOINT>.aws.endpoints.huggingface.cloud" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "prompt": "a cyberpunk street in rain, neon lights",
      "num_frames": 24,
      "outputs": ["gif"]
    }
  }' \
  | jq -er '.outputs.gif_base64' | base64 --decode > output.gif
```
### 2. Image-to-Video (GIF)

```bash
# Convert the image to base64. On macOS use `base64 -i FILE`; GNU coreutils
# wraps output at 76 columns, so strip newlines to keep the JSON string valid
# (or use `base64 -w 0 FILE` on Linux).
IMG_B64=$(base64 -i my_input.jpg | tr -d '\n')

curl -sS -X POST "https://<ENDPOINT>.aws.endpoints.huggingface.cloud" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d "{
    \"inputs\": {
      \"prompt\": \"waves crashing on the shore, moving water\",
      \"image\": \"$IMG_B64\",
      \"num_frames\": 32,
      \"outputs\": [\"gif\"]
    }
  }" \
  | jq -er '.outputs.gif_base64' | base64 --decode > animated.gif
```
### 3. High-Quality Video (WebM)

```bash
curl -sS -X POST "https://<ENDPOINT>.aws.endpoints.huggingface.cloud" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "prompt": "slow pan over a mars landscape",
      "num_frames": 48,
      "fps": 24,
      "outputs": ["webm"],
      "webm": { "quality": "best" }
    }
  }' \
  | jq -er '.outputs.webm_base64' | base64 --decode > output.webm
```