lucataco/stable-avatar 🔢🖼️📝✓❓ → 🖼️

▶️ 288 runs 📅 Aug 2025 ⚙️ Cog 0.16.2

audio-to-video image-to-video-with-audio lipsync video-consistent-character-generation

About

An end-to-end video diffusion transformer that synthesizes infinite-length, high-quality, audio-driven avatar videos without any post-processing.

Example Output

Output

Performance Metrics

437.81s Prediction Time

570.92s Total Time
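
The two timings above can be cross-checked against the progress bar in the execution logs further down, which ends at 50 steps averaging 8.18 s/it. The sketch below does only that arithmetic. The function name `time_breakdown` is hypothetical, and reading the gap between total and prediction time as queueing and setup is an assumption, not something this page states.

```python
def time_breakdown(prediction_s, total_s, steps, s_per_step):
    """Split the reported timings into rough components (all in seconds)."""
    denoise_s = steps * s_per_step  # time spent in the sampling loop
    return {
        # Time not counted as prediction time (assumed: queueing and setup).
        "outside_prediction_s": round(total_s - prediction_s, 2),
        # Sampling loop estimate from the tqdm average.
        "denoising_s": round(denoise_s, 2),
        # Remainder of prediction time (assumed: loading inputs, encoding
        # and decoding, writing the video).
        "other_prediction_s": round(prediction_s - denoise_s, 2),
    }


if __name__ == "__main__":
    # Values copied from the metrics above and the final progress-bar line.
    for name, seconds in time_breakdown(437.81, 570.92, 50, 8.18).items():
        print(f"{name}: {seconds}")
```

On these numbers the sampling loop accounts for most of the prediction time, so lowering num_inference_steps is the most direct lever on runtime, at some likely cost in quality.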

All Input Parameters

{
  "fps": 24,
  "audio": "https://replicate.delivery/pbxt/NXtuYizaYVpc0ZqGw3b4wTpj8bXLpmCGrsHSwzAGpqRUO8cJ/audio-5.wav",
  "image": "https://replicate.delivery/pbxt/NXtuYBainl9p3csysuFQFZxTrHw9U0Qec4TpPloAiQzoAWHC/reference%20%285%29.png",
  "prompt": "",
  "go_fast": false,
  "aspect_ratio": "auto",
  "motion_frame": 24,
  "guidance_scale": 6,
  "gpu_memory_mode": "model_cpu_offload_and_qfloat8",
  "negative_prompt": "Vibrant colors, overexposure, static, blurred details, subtitles, style, artwork, painting, still image,Overall gray, worst quality, low quality, JPEG compression residue, ugly, mutilated, extra fingers,Poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers,Still image, cluttered background, three legs, crowded background, walking backwards",
  "text_guide_scale": 3,
  "audio_guide_scale": 5,
  "num_inference_steps": 50,
  "overlap_window_length": 5
}

Input Parameters

fps (Type: integer, Default: 24, Range: 1 - 60): Frames per second for output video
seed (Type: integer): Random seed for reproducibility
audio (required, Type: string): Audio file to drive the avatar animation
image (required, Type: string): Reference image for avatar generation
prompt (Type: string, Default: ""): Text prompt describing the scene
go_fast (Type: boolean, Default: false): Enable fast mode with optimizations (TeaCache acceleration)
aspect_ratio (Default: auto): Output video aspect ratio
motion_frame (Type: integer, Default: 24, Range: 1 - 50): Motion frame parameter
guidance_scale (Type: number, Default: 6, Range: 1 - 10): Guidance scale for generation
gpu_memory_mode (Default: model_cpu_offload): GPU memory optimization mode
negative_prompt (Type: string, Default: "Vibrant colors, overexposure, static, blurred details, subtitles, style, artwork, painting, still image,Overall gray, worst quality, low quality, JPEG compression residue, ugly, mutilated, extra fingers,Poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers,Still image, cluttered background, three legs, crowded background, walking backwards"): Negative prompt to avoid unwanted elements
text_guide_scale (Type: number, Default: 3, Range: 1 - 10): Text guidance scale
audio_guide_scale (Type: number, Default: 5, Range: 1 - 10): Audio guidance scale
num_inference_steps (Type: integer, Default: 50, Range: 1 - 100): Number of inference steps
overlap_window_length (Type: integer, Default: 5, Range: 1 - 20): Overlap window length for long video generation
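
As a rough illustration of how these parameters might be used from code, the sketch below builds an input payload and checks numeric values against the ranges listed above before submitting anything. The helper names `build_input` and `run_prediction` are hypothetical. The call itself assumes the third-party `replicate` Python client and a `REPLICATE_API_TOKEN` environment variable, and the version hash is the one listed under Version Details.

```python
MODEL = (
    "lucataco/stable-avatar:"
    "4b3bd758c59166c12d9b46eee3565b9d67f2f4330909bf500a5c70ade3b46709"
)

# (min, max) pairs copied from the Input Parameters list.
RANGES = {
    "fps": (1, 60),
    "motion_frame": (1, 50),
    "guidance_scale": (1, 10),
    "text_guide_scale": (1, 10),
    "audio_guide_scale": (1, 10),
    "num_inference_steps": (1, 100),
    "overlap_window_length": (1, 20),
}


def build_input(audio, image, **overrides):
    """Return an input dict, rejecting values outside the documented ranges."""
    if not audio or not image:
        raise ValueError("audio and image are both required")
    payload = {"audio": audio, "image": image}
    for name, value in overrides.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} - {high}")
        payload[name] = value
    return payload


def run_prediction(payload):
    """Submit the payload; requires network access and an API token."""
    import replicate  # imported lazily so the validation above works offline

    return replicate.run(MODEL, input=payload)


if __name__ == "__main__":
    # Placeholder URLs, not real assets.
    payload = build_input(
        "https://example.com/speech.wav",
        "https://example.com/portrait.png",
        fps=24,
        num_inference_steps=50,
    )
    print(payload)
```

Parameters left out of the payload fall back to the defaults listed above. Depending on the client version, the return value of `run_prediction` may be a URL string or a file-like object wrapping that URL.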

Output Schema

Output

Type: string • Format: uri

Example Execution Logs

Using Seed: 428890369
Auto-detected aspect ratio: portrait (input image: 576x1024, ratio: 0.56)
Using portrait aspect ratio: 480x832
  0%|          | 0/50 [00:00<?, ?it/s]/src/wan/models/wan_fantasy_transformer3d_1B.py:191: UserWarning: Padding mask is disabled when using scaled_dot_product_attention. It can have a significant impact on performance.
warnings.warn(
  2%|▏         | 1/50 [00:10<08:30, 10.43s/it]
  4%|▍         | 2/50 [00:18<07:16,  9.08s/it]
  6%|▌         | 3/50 [00:26<06:46,  8.66s/it]
  8%|▊         | 4/50 [00:34<06:29,  8.46s/it]
 10%|█         | 5/50 [00:43<06:15,  8.35s/it]
 12%|█▏        | 6/50 [00:51<06:04,  8.29s/it]
 14%|█▍        | 7/50 [00:59<05:55,  8.26s/it]
 16%|█▌        | 8/50 [01:07<05:46,  8.24s/it]
 18%|█▊        | 9/50 [01:15<05:36,  8.21s/it]
 20%|██        | 10/50 [01:23<05:27,  8.18s/it]
 22%|██▏       | 11/50 [01:31<05:18,  8.16s/it]
 24%|██▍       | 12/50 [01:40<05:09,  8.15s/it]
 26%|██▌       | 13/50 [01:48<05:01,  8.14s/it]
 28%|██▊       | 14/50 [01:56<04:53,  8.14s/it]
 30%|███       | 15/50 [02:04<04:44,  8.14s/it]
 32%|███▏      | 16/50 [02:12<04:36,  8.14s/it]
 34%|███▍      | 17/50 [02:20<04:28,  8.14s/it]
 36%|███▌      | 18/50 [02:28<04:20,  8.14s/it]
 38%|███▊      | 19/50 [02:37<04:12,  8.14s/it]
 40%|████      | 20/50 [02:45<04:03,  8.13s/it]
 42%|████▏     | 21/50 [02:53<03:55,  8.13s/it]
 44%|████▍     | 22/50 [03:01<03:47,  8.13s/it]
 46%|████▌     | 23/50 [03:09<03:39,  8.13s/it]
 48%|████▊     | 24/50 [03:17<03:31,  8.13s/it]
 50%|█████     | 25/50 [03:25<03:23,  8.13s/it]
 52%|█████▏    | 26/50 [03:33<03:15,  8.13s/it]
 54%|█████▍    | 27/50 [03:42<03:06,  8.13s/it]
 56%|█████▌    | 28/50 [03:50<02:58,  8.13s/it]
 58%|█████▊    | 29/50 [03:58<02:50,  8.12s/it]
 60%|██████    | 30/50 [04:06<02:42,  8.12s/it]
 62%|██████▏   | 31/50 [04:14<02:34,  8.12s/it]
 64%|██████▍   | 32/50 [04:22<02:26,  8.12s/it]
 66%|██████▌   | 33/50 [04:30<02:18,  8.12s/it]
 68%|██████▊   | 34/50 [04:38<02:10,  8.13s/it]
 70%|███████   | 35/50 [04:47<02:01,  8.13s/it]
 72%|███████▏  | 36/50 [04:55<01:53,  8.13s/it]
 74%|███████▍  | 37/50 [05:03<01:45,  8.13s/it]
 76%|███████▌  | 38/50 [05:11<01:37,  8.13s/it]
 78%|███████▊  | 39/50 [05:19<01:29,  8.13s/it]
 80%|████████  | 40/50 [05:27<01:21,  8.12s/it]
 82%|████████▏ | 41/50 [05:35<01:13,  8.12s/it]
 84%|████████▍ | 42/50 [05:43<01:04,  8.12s/it]
 86%|████████▌ | 43/50 [05:52<00:56,  8.12s/it]
 88%|████████▊ | 44/50 [06:00<00:48,  8.12s/it]
 90%|█████████ | 45/50 [06:08<00:40,  8.12s/it]
 92%|█████████▏| 46/50 [06:16<00:32,  8.12s/it]
 94%|█████████▍| 47/50 [06:24<00:24,  8.12s/it]
 96%|█████████▌| 48/50 [06:32<00:16,  8.11s/it]
 98%|█████████▊| 49/50 [06:40<00:08,  8.11s/it]
100%|██████████| 50/50 [06:48<00:00,  8.11s/it]
100%|██████████| 50/50 [06:48<00:00,  8.18s/it]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
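
The first log lines show the "auto" aspect ratio resolving a 576x1024 input (ratio 0.56) to a 480x832 portrait output. The model's actual selection rule is not shown on this page. The sketch below is one plausible reading of it, in which the function name `pick_resolution`, the square tolerance, and the 512x512 and 832x480 sizes are all assumptions. Only the portrait case is confirmed by the logs.

```python
def pick_resolution(width, height, square_tolerance=0.1):
    """Map input image dimensions to an orientation and output size.

    Hypothetical rule: near-square inputs map to a square output,
    otherwise the orientation follows the width/height ratio.
    """
    ratio = width / height
    if abs(ratio - 1.0) <= square_tolerance:
        return "square", (512, 512)  # assumed size
    if ratio < 1.0:
        return "portrait", (480, 832)  # matches the logged run
    return "landscape", (832, 480)  # assumed size


if __name__ == "__main__":
    name, size = pick_resolution(576, 1024)
    print(f"ratio: {576 / 1024:.2f} -> {name} {size[0]}x{size[1]}")
```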

Version Details

Version ID: 4b3bd758c59166c12d9b46eee3565b9d67f2f4330909bf500a5c70ade3b46709
Version Created: August 16, 2025
