lightricks/ltx-2-distilled 🔢🖼️📝❓✓ → 🖼️

⭐ Official ▶️ 1.6K runs 📅 Jan 2026 ⚙️ Cog 0.16.9
image-to-video-with-audio text-to-video-with-audio

About

The first open-source audio-video model

Example Output

Prompt:

"A cinematic close-up of Wednesday Addams frozen mid-dance on a dark, blue-lit ballroom floor as students move indistinctly behind her, their footsteps and muffled music reduced to a distant, underwater thrum; the audio foregrounds her steady breathing and the faint rustle of fabric as she slowly raises one arm, never breaking eye contact with the camera, then after a deliberately long silence she speaks in a flat, dry, perfectly controlled voice, “LTX 2 distilled is now on Replicate,” each word crisp and unemotional, followed by an abrupt cutoff of her voice as the background sound swells slightly, reinforcing the deadpan humor, with precise lip sync, minimal facial movement, stark gothic lighting, and cinematic realism."

Performance Metrics

19.78s Prediction Time
19.79s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/ON78eSHXxpd7JEocDuI1cVbmgVUlSi0BJ8zWI8kPHxxgrOQG/wednesday.png",
  "prompt": "A cinematic close-up of Wednesday Addams frozen mid-dance on a dark, blue-lit ballroom floor as students move indistinctly behind her, their footsteps and muffled music reduced to a distant, underwater thrum; the audio foregrounds her steady breathing and the faint rustle of fabric as she slowly raises one arm, never breaking eye contact with the camera, then after a deliberately long silence she speaks in a flat, dry, perfectly controlled voice, “LTX 2 distilled is now on Replicate,” each word crisp and unemotional, followed by an abrupt cutoff of her voice as the background sound swells slightly, reinforcing the deadpan humor, with precise lip sync, minimal facial movement, stark gothic lighting, and cinematic realism.",
  "num_frames": 121,
  "aspect_ratio": "16:9",
  "enhance_prompt": false,
  "image_strength": 1
}
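The inputs above could be sent from Python roughly as follows. This is a minimal sketch, not the official usage: it assumes the third-party `replicate` client package is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `generate` are hypothetical; only the parameter names and defaults come from this page.

```python
MODEL = "lightricks/ltx-2-distilled"


def build_input(prompt, image=None, num_frames=121, aspect_ratio="16:9",
                enhance_prompt=False, image_strength=1.0, seed=None):
    """Assemble the input payload using the parameter names listed on this page."""
    payload = {
        "prompt": prompt,
        "num_frames": num_frames,
        "aspect_ratio": aspect_ratio,
        "enhance_prompt": enhance_prompt,
        "image_strength": image_strength,
    }
    # Optional parameters are only sent when the caller supplies them.
    if image is not None:
        payload["image"] = image
    if seed is not None:
        payload["seed"] = seed
    return payload


def generate(payload):
    """Run the model (assumption: `replicate` is installed and a token is configured)."""
    import replicate  # imported lazily so build_input works without the package
    # Depending on the client version, the result may be a URL string or a
    # file-like object; check the behaviour of your installed version.
    return replicate.run(MODEL, input=payload)
```

Calling `generate(build_input("your prompt", image="https://..."))` would start an image-to-video prediction. Omitting `image` would request text-to-video instead.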
Input Parameters
seed Type: integer
Random seed for reproducibility.
image Type: string
Optional input image for image-to-video generation.
prompt (required) Type: string
Text prompt for video generation.
num_frames Type: integer, Default: 121, Range: 25 - 241
Number of frames to generate. Must follow formula: 8*k + 1 (e.g., 81, 97, 113, 121).
aspect_ratio Default: 16:9
Aspect ratio of the generated video. Ignored if an image is provided.
enhance_prompt Type: boolean, Default: false
Use the model's prompt enhancement to expand and improve your prompt.
image_strength Type: number, Default: 1, Range: 0 - 1
Strength of image conditioning for image-to-video generation (0.0-1.0). Higher values follow the image more closely.
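The `num_frames` constraint (8*k + 1, within 25 to 241) can be checked before a request is submitted. The sketch below uses hypothetical helper names. The 24 fps default in `duration_seconds` is taken from the example execution log on this page, so treat it as an assumption rather than a documented guarantee.

```python
MIN_FRAMES, MAX_FRAMES = 25, 241


def is_valid_num_frames(n):
    """True if n is in the documented range and has the form 8*k + 1."""
    return MIN_FRAMES <= n <= MAX_FRAMES and (n - 1) % 8 == 0


def nearest_valid_num_frames(n):
    """Snap n to the closest value of the form 8*k + 1, clamped to the range."""
    k = round((n - 1) / 8)
    candidate = 8 * k + 1
    # Both bounds are themselves of the form 8*k + 1, so clamping stays valid.
    return max(MIN_FRAMES, min(MAX_FRAMES, candidate))


def duration_seconds(num_frames, fps=24):
    """Approximate clip length; fps=24 is assumed from the example log."""
    return num_frames / fps
```

With the default of 121 frames, `duration_seconds(121)` gives roughly 5.04 seconds at 24 fps.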
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
Using seed: 1337573818
Using image dimensions: 1408x896
Generating 121 frames at 24 fps...
Prompt: A cinematic close-up of Wednesday Addams frozen mid-dance on a dark, blue-lit ballroom floor as students move indistinctly behind her, their footsteps and muffled music reduced to a distant, underwater thrum; the audio foregrounds her steady breathing and the faint rustle of fabric as she slowly raises one arm, never breaking eye contact with the camera, then after a deliberately long silence she speaks in a flat, dry, perfectly controlled voice, “LTX 2 distilled is now on Replicate,” each word crisp and unemotional, followed by an abrupt cutoff of her voice as the background sound swells slightly, reinforcing the deadpan humor, with precise lip sync, minimal facial movement, stark gothic lighting, and cinematic realism.
  0%|          | 0/8 [00:00<?, ?it/s]
 12%|█▎        | 1/8 [00:00<00:02,  2.63it/s]
 25%|██▌       | 2/8 [00:00<00:02,  2.63it/s]
 38%|███▊      | 3/8 [00:01<00:01,  2.58it/s]
 50%|█████     | 4/8 [00:01<00:01,  2.59it/s]
 62%|██████▎   | 5/8 [00:01<00:01,  2.58it/s]
 75%|███████▌  | 6/8 [00:02<00:00,  2.58it/s]
 88%|████████▊ | 7/8 [00:02<00:00,  2.58it/s]
100%|██████████| 8/8 [00:03<00:00,  2.57it/s]
100%|██████████| 8/8 [00:03<00:00,  2.58it/s]
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:02<00:04,  2.08s/it]
 67%|██████▋   | 2/3 [00:04<00:02,  2.09s/it]
100%|██████████| 3/3 [00:06<00:00,  2.09s/it]
100%|██████████| 3/3 [00:06<00:00,  2.09s/it]
Generation complete! Got 3 video chunks
Encoding video...
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:00<00:00,  4.26it/s]
 67%|██████▋   | 2/3 [00:00<00:00,  2.04it/s]
100%|██████████| 3/3 [00:01<00:00,  1.55it/s]
100%|██████████| 3/3 [00:01<00:00,  1.73it/s]
Video saved to: /tmp/tmpnyld5qv1/output.mp4
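The log above reports the seed that was actually used, which helps when no `seed` was passed and the run needs to be reproduced. The following sketch pulls that value and a few others out of the log text. `parse_log` is a hypothetical helper, and the line formats it matches are copied from this one example, so they may differ in other versions of the model.

```python
import re


def parse_log(log_text):
    """Extract seed, image dimensions, frame count and fps from a log like the one above."""
    info = {}
    m = re.search(r"Using seed: (\d+)", log_text)
    if m:
        info["seed"] = int(m.group(1))
    m = re.search(r"Using image dimensions: (\d+)x(\d+)", log_text)
    if m:
        info["width"] = int(m.group(1))
        info["height"] = int(m.group(2))
    m = re.search(r"Generating (\d+) frames at (\d+) fps", log_text)
    if m:
        info["num_frames"] = int(m.group(1))
        info["fps"] = int(m.group(2))
    return info
```

Passing the extracted `seed` back as the `seed` input, together with the same prompt and image, is the documented route to reproducibility.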
Version Details
Version ID
6707072d3b2a513cee6ab771021d355b5aa52997dded5e1c97d1bebe8d93f920
Version Created
January 7, 2026