lucataco/stable-avatar
About
An end-to-end video diffusion transformer that synthesizes infinite-length, high-quality, audio-driven avatar videos without any post-processing.
Example Output
Output
Performance Metrics
- Prediction Time: 437.81s
- Total Time: 570.92s
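As a quick reading of the two figures above, the gap between total time and prediction time can be computed directly. This is only a sketch: the page does not say what the gap consists of, so attributing it to queueing and container start-up is an assumption.

```python
# Figures reported on this page, in seconds.
prediction_time_s = 437.81
total_time_s = 570.92

# Time not spent in the prediction itself.
# Assumption: this covers queueing and cold-start setup; the page does not say.
overhead_s = round(total_time_s - prediction_time_s, 2)
overhead_share = overhead_s / total_time_s

print(f"overhead: {overhead_s}s ({overhead_share:.0%} of total)")
# prints: overhead: 133.11s (23% of total)
```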
All Input Parameters
{
  "fps": 24,
  "audio": "https://replicate.delivery/pbxt/NXtuYizaYVpc0ZqGw3b4wTpj8bXLpmCGrsHSwzAGpqRUO8cJ/audio-5.wav",
  "image": "https://replicate.delivery/pbxt/NXtuYBainl9p3csysuFQFZxTrHw9U0Qec4TpPloAiQzoAWHC/reference%20%285%29.png",
  "prompt": "",
  "go_fast": false,
  "aspect_ratio": "auto",
  "motion_frame": 24,
  "guidance_scale": 6,
  "gpu_memory_mode": "model_cpu_offload_and_qfloat8",
  "negative_prompt": "Vibrant colors, overexposure, static, blurred details, subtitles, style, artwork, painting, still image,Overall gray, worst quality, low quality, JPEG compression residue, ugly, mutilated, extra fingers,Poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers,Still image, cluttered background, three legs, crowded background, walking backwards",
  "text_guide_scale": 3,
  "audio_guide_scale": 5,
  "num_inference_steps": 50,
  "overlap_window_length": 5
}
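The parameters above could be submitted with the Replicate Python client roughly as follows. This is a minimal sketch, not the model author's code: it assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set, and the helper names `build_input` and `run_prediction` are hypothetical. The default values are copied from the example on this page; the shape of the returned output depends on the client version.

```python
# Model reference built from the owner/name and the version ID listed on this page.
MODEL_REF = (
    "lucataco/stable-avatar:"
    "4b3bd758c59166c12d9b46eee3565b9d67f2f4330909bf500a5c70ade3b46709"
)

# Values taken from the example input on this page (not necessarily the model defaults).
EXAMPLE_PARAMS = {
    "fps": 24,
    "prompt": "",
    "go_fast": False,
    "aspect_ratio": "auto",
    "motion_frame": 24,
    "guidance_scale": 6,
    "gpu_memory_mode": "model_cpu_offload_and_qfloat8",
    "text_guide_scale": 3,
    "audio_guide_scale": 5,
    "num_inference_steps": 50,
    "overlap_window_length": 5,
}

# Parameters listed on this page that the example dict above leaves out.
OPTIONAL_KEYS = {"seed", "negative_prompt"}


def build_input(audio_url, image_url, **overrides):
    """Hypothetical helper: merge overrides into the example parameters."""
    unknown = set(overrides) - set(EXAMPLE_PARAMS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = dict(EXAMPLE_PARAMS)
    params.update(overrides)
    params["audio"] = audio_url  # required
    params["image"] = image_url  # required
    return params


def run_prediction(audio_url, image_url, **overrides):
    """Hypothetical helper: submit the prediction and return the client's output."""
    import replicate  # third-party client; imported lazily so build_input works without it

    return replicate.run(MODEL_REF, input=build_input(audio_url, image_url, **overrides))
```

Calling `run_prediction(audio_url, image_url, seed=428890369)` would block until the prediction finishes, which took several minutes in the run recorded on this page.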
Input Parameters
- fps: Frames per second for output video
- seed: Random seed for reproducibility
- audio (required): Audio file to drive the avatar animation
- image (required): Reference image for avatar generation
- prompt: Text prompt describing the scene
- go_fast: Enable fast mode with optimizations (TeaCache acceleration)
- aspect_ratio: Output video aspect ratio
- motion_frame: Motion frame parameter
- guidance_scale: Guidance scale for generation
- gpu_memory_mode: GPU memory optimization mode
- negative_prompt: Negative prompt to avoid unwanted elements
- text_guide_scale: Text guidance scale
- audio_guide_scale: Audio guidance scale
- num_inference_steps: Number of inference steps
- overlap_window_length: Overlap window length for long video generation
Output Schema
Output
Example Execution Logs
Using Seed: 428890369
Auto-detected aspect ratio: portrait (input image: 576x1024, ratio: 0.56)
Using portrait aspect ratio: 480x832
0%| | 0/50 [00:00<?, ?it/s]
/src/wan/models/wan_fantasy_transformer3d_1B.py:191: UserWarning: Padding mask is disabled when using scaled_dot_product_attention. It can have a significant impact on performance.
warnings.warn(
2%|▏ | 1/50 [00:10<08:30, 10.43s/it]
4%|▍ | 2/50 [00:18<07:16, 9.08s/it]
6%|▌ | 3/50 [00:26<06:46, 8.66s/it]
8%|▊ | 4/50 [00:34<06:29, 8.46s/it]
10%|█ | 5/50 [00:43<06:15, 8.35s/it]
12%|█▏ | 6/50 [00:51<06:04, 8.29s/it]
14%|█▍ | 7/50 [00:59<05:55, 8.26s/it]
16%|█▌ | 8/50 [01:07<05:46, 8.24s/it]
18%|█▊ | 9/50 [01:15<05:36, 8.21s/it]
20%|██ | 10/50 [01:23<05:27, 8.18s/it]
22%|██▏ | 11/50 [01:31<05:18, 8.16s/it]
24%|██▍ | 12/50 [01:40<05:09, 8.15s/it]
26%|██▌ | 13/50 [01:48<05:01, 8.14s/it]
28%|██▊ | 14/50 [01:56<04:53, 8.14s/it]
30%|███ | 15/50 [02:04<04:44, 8.14s/it]
32%|███▏ | 16/50 [02:12<04:36, 8.14s/it]
34%|███▍ | 17/50 [02:20<04:28, 8.14s/it]
36%|███▌ | 18/50 [02:28<04:20, 8.14s/it]
38%|███▊ | 19/50 [02:37<04:12, 8.14s/it]
40%|████ | 20/50 [02:45<04:03, 8.13s/it]
42%|████▏ | 21/50 [02:53<03:55, 8.13s/it]
44%|████▍ | 22/50 [03:01<03:47, 8.13s/it]
46%|████▌ | 23/50 [03:09<03:39, 8.13s/it]
48%|████▊ | 24/50 [03:17<03:31, 8.13s/it]
50%|█████ | 25/50 [03:25<03:23, 8.13s/it]
52%|█████▏ | 26/50 [03:33<03:15, 8.13s/it]
54%|█████▍ | 27/50 [03:42<03:06, 8.13s/it]
56%|█████▌ | 28/50 [03:50<02:58, 8.13s/it]
58%|█████▊ | 29/50 [03:58<02:50, 8.12s/it]
60%|██████ | 30/50 [04:06<02:42, 8.12s/it]
62%|██████▏ | 31/50 [04:14<02:34, 8.12s/it]
64%|██████▍ | 32/50 [04:22<02:26, 8.12s/it]
66%|██████▌ | 33/50 [04:30<02:18, 8.12s/it]
68%|██████▊ | 34/50 [04:38<02:10, 8.13s/it]
70%|███████ | 35/50 [04:47<02:01, 8.13s/it]
72%|███████▏ | 36/50 [04:55<01:53, 8.13s/it]
74%|███████▍ | 37/50 [05:03<01:45, 8.13s/it]
76%|███████▌ | 38/50 [05:11<01:37, 8.13s/it]
78%|███████▊ | 39/50 [05:19<01:29, 8.13s/it]
80%|████████ | 40/50 [05:27<01:21, 8.12s/it]
82%|████████▏ | 41/50 [05:35<01:13, 8.12s/it]
84%|████████▍ | 42/50 [05:43<01:04, 8.12s/it]
86%|████████▌ | 43/50 [05:52<00:56, 8.12s/it]
88%|████████▊ | 44/50 [06:00<00:48, 8.12s/it]
90%|█████████ | 45/50 [06:08<00:40, 8.12s/it]
92%|█████████▏| 46/50 [06:16<00:32, 8.12s/it]
94%|█████████▍| 47/50 [06:24<00:24, 8.12s/it]
96%|█████████▌| 48/50 [06:32<00:16, 8.11s/it]
98%|█████████▊| 49/50 [06:40<00:08, 8.11s/it]
100%|██████████| 50/50 [06:48<00:00, 8.11s/it]
100%|██████████| 50/50 [06:48<00:00, 8.18s/it]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
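The progress lines in the log above can be read programmatically to see how much of the prediction time went to the sampling loop. The sketch below is illustrative only: `parse_progress` is a hypothetical helper, the regular expression handles only the `MM:SS` elapsed format seen in this log, and attributing the remaining time to model loading, preprocessing, and video encoding is an assumption.

```python
import re

# Matches e.g. "50/50 [06:48<00:00, 8.18s/it]" (step/total, elapsed MM:SS, seconds per step).
PROGRESS = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<[^,]*,\s*([\d.]+)s/it\]")


def parse_progress(line):
    """Return step, total, elapsed seconds, and s/it for one progress line, or None."""
    match = PROGRESS.search(line)
    if match is None:
        return None  # e.g. the initial "0/50 [00:00<?, ?it/s]" line has no rate yet
    step, total, minutes, seconds, rate = match.groups()
    return {
        "step": int(step),
        "total": int(total),
        "elapsed_s": int(minutes) * 60 + int(seconds),
        "s_per_it": float(rate),
    }


final = parse_progress("100%|██████████| 50/50 [06:48<00:00, 8.18s/it]")

# Prediction time reported on this page, minus the time spent in the 50 sampling steps.
prediction_time_s = 437.81
outside_loop_s = round(prediction_time_s - final["elapsed_s"], 2)

print(final["elapsed_s"], outside_loop_s)
```

In this run the loop accounts for 408 of the 437.81 seconds, which agrees with 50 steps at the reported average of 8.18 s/it.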
Version Details
- Version ID: 4b3bd758c59166c12d9b46eee3565b9d67f2f4330909bf500a5c70ade3b46709
- Version Created: August 16, 2025