lucataco/stable-avatar

206 runs | Aug 2025 | Cog 0.16.2
audio-to-video lipsync

About

An end-to-end video diffusion transformer that synthesizes infinite-length, high-quality, audio-driven avatar videos without any post-processing.

Example Output

Output

Performance Metrics

437.81s Prediction Time
570.92s Total Time
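
The gap between the two figures above can be worked out directly. The short sketch below computes it; reading that gap as queueing and startup overhead is an assumption, since the page does not say what the total includes beyond the prediction itself.

```python
# Timings copied from the Performance Metrics above.
prediction_time = 437.81  # seconds spent in the prediction itself
total_time = 570.92       # seconds from request to completion

# Time not spent in the prediction (assumed here to be queueing/startup).
overhead = total_time - prediction_time
share = prediction_time / total_time

print(f"overhead: {overhead:.2f}s")      # overhead: 133.11s
print(f"prediction share: {share:.1%}")  # prediction share: 76.7%
```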
All Input Parameters
{
  "fps": 24,
  "audio": "https://replicate.delivery/pbxt/NXtuYizaYVpc0ZqGw3b4wTpj8bXLpmCGrsHSwzAGpqRUO8cJ/audio-5.wav",
  "image": "https://replicate.delivery/pbxt/NXtuYBainl9p3csysuFQFZxTrHw9U0Qec4TpPloAiQzoAWHC/reference%20%285%29.png",
  "prompt": "",
  "go_fast": false,
  "aspect_ratio": "auto",
  "motion_frame": 24,
  "guidance_scale": 6,
  "gpu_memory_mode": "model_cpu_offload_and_qfloat8",
  "negative_prompt": "Vibrant colors, overexposure, static, blurred details, subtitles, style, artwork, painting, still image,Overall gray, worst quality, low quality, JPEG compression residue, ugly, mutilated, extra fingers,Poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers,Still image, cluttered background, three legs, crowded background, walking backwards",
  "text_guide_scale": 3,
  "audio_guide_scale": 5,
  "num_inference_steps": 50,
  "overlap_window_length": 5
}
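
A minimal sketch of submitting the parameters above from Python, assuming the third-party `replicate` client is installed and `REPLICATE_API_TOKEN` is set in the environment. `build_input` and `run_prediction` are hypothetical helper names introduced for illustration; `negative_prompt` is left out so the model's default applies.

```python
# Model reference built from the owner/name and the Version ID listed on this page.
MODEL_VERSION = (
    "lucataco/stable-avatar:"
    "4b3bd758c59166c12d9b46eee3565b9d67f2f4330909bf500a5c70ade3b46709"
)


def build_input(audio_url, image_url, **overrides):
    """Return an input payload mirroring the example run (hypothetical helper)."""
    payload = {
        "fps": 24,
        "audio": audio_url,
        "image": image_url,
        "prompt": "",
        "go_fast": False,
        "aspect_ratio": "auto",
        "motion_frame": 24,
        "guidance_scale": 6,
        "gpu_memory_mode": "model_cpu_offload_and_qfloat8",
        "text_guide_scale": 3,
        "audio_guide_scale": 5,
        "num_inference_steps": 50,
        "overlap_window_length": 5,
    }
    payload.update(overrides)  # caller-supplied values win over the example values
    return payload


def run_prediction(payload):
    """Submit the payload; needs network access and an API token."""
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL_VERSION, input=payload)


# Usage (not executed here, since it performs a paid remote call):
# output = run_prediction(build_input("https://example.com/a.wav",
#                                     "https://example.com/face.png"))
```

Keeping payload construction separate from the network call makes it possible to inspect or validate the inputs before spending several minutes of GPU time.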
Input Parameters
fps Type: integer, Default: 24, Range: 1 - 60
Frames per second for output video
seed Type: integer
Random seed for reproducibility
audio (required) Type: string
Audio file to drive the avatar animation
image (required) Type: string
Reference image for avatar generation
prompt Type: string, Default: (empty string)
Text prompt describing the scene
go_fast Type: boolean, Default: false
Enable fast mode with optimizations (TeaCache acceleration)
aspect_ratio Default: auto
Output video aspect ratio
motion_frame Type: integer, Default: 24, Range: 1 - 50
Motion frame parameter
guidance_scale Type: number, Default: 6, Range: 1 - 10
Guidance scale for generation
gpu_memory_mode Default: model_cpu_offload
GPU memory optimization mode
negative_prompt Type: string, Default: Vibrant colors, overexposure, static, blurred details, subtitles, style, artwork, painting, still image,Overall gray, worst quality, low quality, JPEG compression residue, ugly, mutilated, extra fingers,Poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers,Still image, cluttered background, three legs, crowded background, walking backwards
Negative prompt to avoid unwanted elements
text_guide_scale Type: number, Default: 3, Range: 1 - 10
Text guidance scale
audio_guide_scale Type: number, Default: 5, Range: 1 - 10
Audio guidance scale
num_inference_steps Type: integer, Default: 50, Range: 1 - 100
Number of inference steps
overlap_window_length Type: integer, Default: 5, Range: 1 - 20
Overlap window length for long video generation
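
The required inputs and numeric ranges listed above can be checked on the client side before a request is sent. The following is a sketch only: `validate_input` is a hypothetical helper, not part of the model or of any Replicate library, and the server may apply further checks that are not documented here.

```python
# Ranges copied from the Input Parameters list above (inclusive bounds).
RANGES = {
    "fps": (1, 60),
    "motion_frame": (1, 50),
    "guidance_scale": (1, 10),
    "text_guide_scale": (1, 10),
    "audio_guide_scale": (1, 10),
    "num_inference_steps": (1, 100),
    "overlap_window_length": (1, 20),
}
REQUIRED = ("audio", "image")


def validate_input(payload):
    """Return a list of problems found in the payload (empty if none)."""
    errors = []
    for name in REQUIRED:
        if not payload.get(name):
            errors.append(f"missing required input: {name}")
    for name, (low, high) in RANGES.items():
        # Only check parameters the caller actually set; others use defaults.
        if name in payload and not low <= payload[name] <= high:
            errors.append(f"{name}={payload[name]} is outside {low} - {high}")
    return errors


ok = validate_input({"audio": "a.wav", "image": "face.png", "fps": 24})
bad = validate_input({"audio": "a.wav", "fps": 120, "num_inference_steps": 0})
print(ok)   # []
print(bad)  # three problems: missing image, fps, num_inference_steps
```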
Output Schema

Output

Type: string, Format: uri
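
Because the output is a URI string, the generated file has to be fetched separately. The sketch below uses only the Python standard library; `local_name` and `download` are hypothetical helpers, and the `.mp4` fallback name is an assumption, since the schema does not state the container format.

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse


def local_name(uri, default="output.mp4"):
    """Derive a local file name from the last path segment of the URI."""
    name = os.path.basename(unquote(urlparse(uri).path))
    return name or default  # fall back when the path has no file segment


def download(uri, directory="."):
    """Fetch the output URI into `directory` and return the saved path."""
    path = os.path.join(directory, local_name(uri))
    urllib.request.urlretrieve(uri, path)  # performs the network request
    return path


# Usage (not executed here, since it needs a real output URI):
# saved = download(output_uri, directory="results")
```

Some client libraries may wrap the output in a file-like object rather than returning a bare string; in that case the object's own read or URL accessor would replace `download`.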

Example Execution Logs
Using Seed: 428890369
Auto-detected aspect ratio: portrait (input image: 576x1024, ratio: 0.56)
Using portrait aspect ratio: 480x832
  0%|          | 0/50 [00:00<?, ?it/s]
/src/wan/models/wan_fantasy_transformer3d_1B.py:191: UserWarning: Padding mask is disabled when using scaled_dot_product_attention. It can have a significant impact on performance.
  warnings.warn(
  2%|▏         | 1/50 [00:10<08:30, 10.43s/it]
  4%|▍         | 2/50 [00:18<07:16,  9.08s/it]
  6%|▌         | 3/50 [00:26<06:46,  8.66s/it]
  8%|▊         | 4/50 [00:34<06:29,  8.46s/it]
 10%|█         | 5/50 [00:43<06:15,  8.35s/it]
 12%|█▏        | 6/50 [00:51<06:04,  8.29s/it]
 14%|█▍        | 7/50 [00:59<05:55,  8.26s/it]
 16%|█▌        | 8/50 [01:07<05:46,  8.24s/it]
 18%|█▊        | 9/50 [01:15<05:36,  8.21s/it]
 20%|██        | 10/50 [01:23<05:27,  8.18s/it]
 22%|██▏       | 11/50 [01:31<05:18,  8.16s/it]
 24%|██▍       | 12/50 [01:40<05:09,  8.15s/it]
 26%|██▌       | 13/50 [01:48<05:01,  8.14s/it]
 28%|██▊       | 14/50 [01:56<04:53,  8.14s/it]
 30%|███       | 15/50 [02:04<04:44,  8.14s/it]
 32%|███▏      | 16/50 [02:12<04:36,  8.14s/it]
 34%|███▍      | 17/50 [02:20<04:28,  8.14s/it]
 36%|███▌      | 18/50 [02:28<04:20,  8.14s/it]
 38%|███▊      | 19/50 [02:37<04:12,  8.14s/it]
 40%|████      | 20/50 [02:45<04:03,  8.13s/it]
 42%|████▏     | 21/50 [02:53<03:55,  8.13s/it]
 44%|████▍     | 22/50 [03:01<03:47,  8.13s/it]
 46%|████▌     | 23/50 [03:09<03:39,  8.13s/it]
 48%|████▊     | 24/50 [03:17<03:31,  8.13s/it]
 50%|█████     | 25/50 [03:25<03:23,  8.13s/it]
 52%|█████▏    | 26/50 [03:33<03:15,  8.13s/it]
 54%|█████▍    | 27/50 [03:42<03:06,  8.13s/it]
 56%|█████▌    | 28/50 [03:50<02:58,  8.13s/it]
 58%|█████▊    | 29/50 [03:58<02:50,  8.12s/it]
 60%|██████    | 30/50 [04:06<02:42,  8.12s/it]
 62%|██████▏   | 31/50 [04:14<02:34,  8.12s/it]
 64%|██████▍   | 32/50 [04:22<02:26,  8.12s/it]
 66%|██████▌   | 33/50 [04:30<02:18,  8.12s/it]
 68%|██████▊   | 34/50 [04:38<02:10,  8.13s/it]
 70%|███████   | 35/50 [04:47<02:01,  8.13s/it]
 72%|███████▏  | 36/50 [04:55<01:53,  8.13s/it]
 74%|███████▍  | 37/50 [05:03<01:45,  8.13s/it]
 76%|███████▌  | 38/50 [05:11<01:37,  8.13s/it]
 78%|███████▊  | 39/50 [05:19<01:29,  8.13s/it]
 80%|████████  | 40/50 [05:27<01:21,  8.12s/it]
 82%|████████▏ | 41/50 [05:35<01:13,  8.12s/it]
 84%|████████▍ | 42/50 [05:43<01:04,  8.12s/it]
 86%|████████▌ | 43/50 [05:52<00:56,  8.12s/it]
 88%|████████▊ | 44/50 [06:00<00:48,  8.12s/it]
 90%|█████████ | 45/50 [06:08<00:40,  8.12s/it]
 92%|█████████▏| 46/50 [06:16<00:32,  8.12s/it]
 94%|█████████▍| 47/50 [06:24<00:24,  8.12s/it]
 96%|█████████▌| 48/50 [06:32<00:16,  8.11s/it]
 98%|█████████▊| 49/50 [06:40<00:08,  8.11s/it]
100%|██████████| 50/50 [06:48<00:00,  8.11s/it]
100%|██████████| 50/50 [06:48<00:00,  8.18s/it]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
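
The first lines of the log show `aspect_ratio: "auto"` resolving a 576x1024 reference image (ratio 0.56) to portrait at 480x832. The sketch below illustrates one way such a classification could work. It is not the model's actual code: the 10% tolerance around a square ratio is an assumption, and only the portrait resolution comes from the log, while the landscape and square sizes are placeholders.

```python
# Only the portrait size is taken from the log; the other two are assumed.
RESOLUTIONS = {
    "portrait": (480, 832),
    "landscape": (832, 480),  # assumption: portrait size transposed
    "square": (512, 512),     # assumption: placeholder value
}


def detect_aspect_ratio(width, height, tolerance=0.1):
    """Classify an image by its width/height ratio (hypothetical rule)."""
    ratio = width / height
    if ratio < 1 - tolerance:
        label = "portrait"
    elif ratio > 1 + tolerance:
        label = "landscape"
    else:
        label = "square"
    return label, ratio


label, ratio = detect_aspect_ratio(576, 1024)
out_w, out_h = RESOLUTIONS[label]
print(f"Auto-detected aspect ratio: {label} (ratio: {ratio:.2f})")
print(f"Using {label} aspect ratio: {out_w}x{out_h}")
```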
Version Details
Version ID
4b3bd758c59166c12d9b46eee3565b9d67f2f4330909bf500a5c70ade3b46709
Version Created
August 16, 2025