Audio-to-video AI Models

wan-video/wan-2.5-t2v

Generate videos with synchronized audio from text prompts using Alibaba's WAN 2.5 model. Creates fully synchronized vide...

📝 → 🎥 • text-to-video-with-audio • lipsync • multilingual • 35.7K runs

wan-video/wan-2.2-s2v

Generates cinematic videos synchronized to audio from a reference image, text prompt, and audio file. Built on the Wan2....

🖼️ → 🎥 • audio-to-video • image-to-video • lipsync • 115.5K runs

fofr/audio-to-waveform

Generate a waveform video from an audio file. Convert music, podcasts, or voice tracks into a bar-style audio visualizer...

🎥 • audio-to-video • audio-visualizer • 383.6K runs
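A bar-style visualizer has to reduce the audio signal to one height per bar. The minimal sketch below illustrates that general idea, assuming mono samples already decoded to floats: it computes the RMS of each equal-length chunk and normalises the result to the loudest bar. The helper name `bar_heights` is hypothetical, and this is not how fofr/audio-to-waveform is actually implemented.

```python
import math
from typing import List, Sequence


def bar_heights(samples: Sequence[float], num_bars: int) -> List[float]:
    """Hypothetical helper: one normalised RMS value per bar.

    Samples are split into num_bars equal chunks; any leftover samples
    that do not fill a whole chunk are ignored.
    """
    if num_bars <= 0:
        raise ValueError("num_bars must be positive")
    chunk = max(1, len(samples) // num_bars)
    heights = []
    for i in range(num_bars):
        seg = samples[i * chunk:(i + 1) * chunk]
        if not seg:
            # Fewer samples than bars: pad the remaining bars with silence.
            heights.append(0.0)
            continue
        # Root-mean-square loudness of this chunk.
        heights.append(math.sqrt(sum(s * s for s in seg) / len(seg)))
    peak = max(heights, default=0.0)
    # Scale so the loudest bar is 1.0; an all-silent input stays at 0.0.
    return [h / peak if peak > 0 else 0.0 for h in heights]
```

For example, `bar_heights([0.0, 0.0, 1.0, 1.0, 0.5, 0.5], 3)` returns `[0.0, 1.0, 0.5]`: the three chunks have RMS values of 0, 1, and 0.5, and the peak is already 1.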

wan-video/wan-2.5-t2v-fast

Generates videos with synchronized audio from text prompts, optimized for faster generation times compared to the standa...

📝 → 🎥 • text-to-video-with-audio • multilingual • 49.9K runs

lucataco/talking-avatar

Generate talking avatars by combining an input image of a person and an audio file for lip synchronization. The model in...

🖼️ → 🎥 • talking-avatar • lip-sync • video-generation • 291 runs

lucataco/stable-avatar

Generate audio-driven talking avatar videos from a single reference image and an input audio track. Synchronize lip move...

🖼️ → 🎥 • lipsync • image-to-video-with-audio • video-consistent-character-generation • 206 runs

cjwbw/aniportrait-audio2vid

Generate a talking-head video with audio from a single portrait image and an audio track. Animate the face with audio-dr...

🖼️ → 🎥 • lipsync • image-to-video-with-audio • 14.7K runs

camenduru/emage

Generate co-speech gesture animations from audio input using expressive masked audio gesture modeling. This model output...

🎥 • gesture-generation • audio-to-video • co-speech-gestures • 109 runs

pipeline-examples/talking-avatar

Generate lip-synced talking avatar videos from an input image and audio file, suitable for UGC, TikTok, and Reels conten...

🎥 • lip-sync • talking-avatar • video-generation • 1 run

ddvinh1/audio-lip

Generate lip-synced talking-head video from a face image or face video and an audio track. Align mouth movements to spee...

lipsync • 86 runs

zsxkib/multitalk

Generate conversational talking-head videos from a reference image and one or two audio tracks. Provide an image with on...

🖼️ → 🎥 • lipsync • image-to-video-with-audio • audio-to-video • 2.3K runs

stefan-st/face-diffuser

Generate lip-synced talking-head video from speech audio. Animate a preset face identity (F1–F8, M1–M6) with an optional...

🎥 • audio-to-video • lipsync • 205 runs
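The preset identities named in the face-diffuser description (F1–F8 and M1–M6) can be enumerated and checked before a request is built. This is a hypothetical client-side helper that assumes the IDs are passed as plain strings such as `"F3"`; the model's actual input field name and format are not shown in this listing.

```python
# Preset identities named in the listing: F1-F8 and M1-M6 (14 in total).
PRESET_IDENTITIES = [f"F{i}" for i in range(1, 9)] + [f"M{i}" for i in range(1, 7)]


def normalize_identity(identity: str) -> str:
    """Hypothetical helper: return the canonical preset ID, e.g. ' f3 ' -> 'F3'.

    Raises ValueError for anything outside the preset list.
    """
    candidate = identity.strip().upper()
    if candidate not in PRESET_IDENTITIES:
        raise ValueError(
            f"unknown identity {identity!r}; expected one of {', '.join(PRESET_IDENTITIES)}"
        )
    return candidate
```

Validating locally catches a typo such as `"M7"` before any request is sent.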

bytedance/omni-human

Generate talking or singing videos from a single image and an audio clip. Provide a portrait, half-body, or full-body im...

🖼️ → 🎥 • lipsync • image-to-video • 144.2K runs

yuanxunlu/livespeechportraits

Generate lip-synced talking-head videos from an input audio clip. Provide a driving audio file (only the first 20 second...

🎥 • lipsync • audio-to-video • 9.8K runs
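The livespeechportraits description indicates that only the first 20 seconds of the driving audio are used. That sentence is truncated in the listing, so treat this as an assumption. Under that assumption, trimming longer clips locally makes the cut point explicit. The minimal sketch below uses Python's standard `wave` module and assumes uncompressed PCM WAV input. `trim_wav` and `MAX_SECONDS` are hypothetical names.

```python
import io
import wave

# Assumption based on the (truncated) listing: only the first 20 s are used.
MAX_SECONDS = 20


def trim_wav(data: bytes, max_seconds: float = MAX_SECONDS) -> bytes:
    """Hypothetical helper: return WAV bytes holding at most max_seconds of audio."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        max_frames = int(src.getframerate() * max_seconds)
        # Read no more frames than the clip actually contains.
        frames = src.readframes(min(src.getnframes(), max_frames))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected automatically once the frames are written.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()
```

Clips shorter than the limit pass through with their length unchanged.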

lucataco/img-and-audio2video

Create a video from a still image and an audio file. Takes one image and an audio track and outputs a video with the ima...

🖼️ → 🎥 • image-to-video • audio-to-video • 11.0K runs

zsxkib/memo

Generate talking head videos from a single image and an audio clip. Animate lip movements and emotion-aligned facial exp...

🎥 • lipsync • audio-to-video • talking-head • 986 runs

zsxkib/humo

Generate short videos from a text prompt, optionally syncing lip movements and body motion to an input audio track. Acce...

📝 → 🎥 • text-to-video • lipsync • video-consistent-character-generation • 61 runs

cjwbw/sadtalker

Animate a single image into a lip-synced talking-head video from an audio clip. Provide a source portrait and speech aud...

🎥 • lipsync • audio-to-video • 145.8K runs

cyberdude/stable-diffusion-dance

Generate audio-reactive video frames from a text prompt and an audio file. Use a Stable Diffusion pipeline to synthesize...

🎥 • audio-to-video • music-visualization • 7 runs

voodoohop/stable-diffusion-dance

Generate audio-reactive video from text prompts and an input audio file. Provide one or multiple newline-separated promp...

📝 → 🎥 • text-to-video • audio-to-video • audio-reactive • 21 runs
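The voodoohop/stable-diffusion-dance description says that multiple prompts are supplied as a single newline-separated string. The small sketch below builds that string from a list and parses it back. The helper names are hypothetical. The listing does not state how the model schedules the prompts across the audio, so this sketch covers only formatting.

```python
from typing import List


def join_prompts(prompts: List[str]) -> str:
    """Hypothetical helper: combine prompts into one newline-separated string."""
    # Trim whitespace and drop blank entries so no empty prompt lines are sent.
    cleaned = [p.strip() for p in prompts if p.strip()]
    if any("\n" in p for p in cleaned):
        raise ValueError("an individual prompt must not contain a newline")
    return "\n".join(cleaned)


def split_prompts(text: str) -> List[str]:
    """Hypothetical helper: recover the individual prompts, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Rejecting embedded newlines matters because a newline inside one prompt would silently turn it into two prompts.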