lightricks/ltx-2-distilled 🔢🖼️📝❓✓ → 🖼️

⭐ Official ▶️ 27.4K runs 📅 Jan 2026 ⚙️ Cog 0.16.9 📄 Paper ⚖️ License

image-to-video-with-audio text-to-video-with-audio

Performance

19.8s Typical run time

27.4K Total runs

About

LTX-2: The first open-source audio-video model

Example Output

Prompt:

"A cinematic close-up of Wednesday Addams frozen mid-dance on a dark, blue-lit ballroom floor as students move indistinctly behind her, their footsteps and muffled music reduced to a distant, underwater thrum; the audio foregrounds her steady breathing and the faint rustle of fabric as she slowly raises one arm, never breaking eye contact with the camera, then after a deliberately long silence she speaks in a flat, dry, perfectly controlled voice, “LTX 2 distilled is now on Replicate,” each word crisp and unemotional, followed by an abrupt cutoff of her voice as the background sound swells slightly, reinforcing the deadpan humor, with precise lip sync, minimal facial movement, stark gothic lighting, and cinematic realism."

Output

Performance Metrics

19.78s Prediction Time

19.79s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/ON78eSHXxpd7JEocDuI1cVbmgVUlSi0BJ8zWI8kPHxxgrOQG/wednesday.png",
  "prompt": "A cinematic close-up of Wednesday Addams frozen mid-dance on a dark, blue-lit ballroom floor as students move indistinctly behind her, their footsteps and muffled music reduced to a distant, underwater thrum; the audio foregrounds her steady breathing and the faint rustle of fabric as she slowly raises one arm, never breaking eye contact with the camera, then after a deliberately long silence she speaks in a flat, dry, perfectly controlled voice, “LTX 2 distilled is now on Replicate,” each word crisp and unemotional, followed by an abrupt cutoff of her voice as the background sound swells slightly, reinforcing the deadpan humor, with precise lip sync, minimal facial movement, stark gothic lighting, and cinematic realism.",
  "num_frames": 121,
  "aspect_ratio": "16:9",
  "enhance_prompt": false,
  "image_strength": 1
}
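
The input above could be submitted from Python roughly as follows. This is a minimal sketch, assuming the third-party `replicate` client is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are hypothetical and not part of the model page.

```python
from typing import Any, Optional

MODEL = "lightricks/ltx-2-distilled"


def build_input(
    prompt: str,
    image: Optional[str] = None,
    num_frames: int = 121,
    aspect_ratio: str = "16:9",
    enhance_prompt: bool = False,
    image_strength: float = 1.0,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble the input payload, leaving out optional fields that are unset."""
    payload: dict[str, Any] = {
        "prompt": prompt,
        "num_frames": num_frames,
        "aspect_ratio": aspect_ratio,
        "enhance_prompt": enhance_prompt,
        "image_strength": image_strength,
    }
    if image is not None:
        payload["image"] = image  # URL of the conditioning image (image-to-video)
    if seed is not None:
        payload["seed"] = seed
    return payload


def run_prediction(payload: dict[str, Any]) -> Any:
    """Submit the payload. Needs network access and an API token."""
    import replicate  # assumption: installed via `pip install replicate`

    # The return type depends on the client version (a URL string or a
    # file-like object), so it is passed through unchanged here.
    return replicate.run(MODEL, input=payload)
```

Calling `run_prediction(build_input("A cinematic close-up ...", image="https://.../wednesday.png"))` would correspond to the example run shown here, with a shortened prompt and image URL.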

Input Parameters

seed (Type: integer): Random seed for reproducibility.
image (Type: string): Optional input image for image-to-video generation.
prompt (required; Type: string): Text prompt for video generation.
num_frames (Type: integer; Default: 121; Range: 25 - 241): Number of frames to generate. Must have the form 8*k + 1 (e.g., 81, 97, 113, 121).
aspect_ratio (Default: 16:9): Aspect ratio of the generated video. Ignored if an image is provided.
enhance_prompt (Type: boolean; Default: false): Use the model's prompt enhancement to expand and improve your prompt.
image_strength (Type: number; Default: 1; Range: 0 - 1): Strength of image conditioning for image-to-video (0.0-1.0). Higher values follow the image more closely.
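
The num_frames constraint (8*k + 1, within 25 to 241) is easy to get wrong, so a client-side check may help before submitting a run. The sketch below is an illustration only: the function names are hypothetical, and rounding to the nearest valid value is an assumption about what a caller might want, not documented model behavior.

```python
MIN_FRAMES = 25
MAX_FRAMES = 241


def is_valid_num_frames(n: int) -> bool:
    """True if n is within the documented range and has the form 8*k + 1."""
    return MIN_FRAMES <= n <= MAX_FRAMES and (n - 1) % 8 == 0


def nearest_valid_num_frames(n: int) -> int:
    """Clamp n to the documented range, then snap to the closest 8*k + 1."""
    clamped = max(MIN_FRAMES, min(MAX_FRAMES, n))
    # Integer rounding of (clamped - 1) / 8 to the nearest k; ties round up.
    k = (clamped - 1 + 4) // 8
    return 8 * k + 1


print(is_valid_num_frames(121))       # True, since 121 = 8*15 + 1
print(nearest_valid_num_frames(100))  # 97, since 97 = 8*12 + 1
```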

Output Schema

Output

Type: string • Format: uri
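
Because the output is a URI string, the generated video has to be fetched separately. A minimal sketch using only the Python standard library follows. The function name `save_output` and the fallback filename are assumptions for illustration.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def save_output(uri: str, dest_dir: str = ".") -> str:
    """Download the file at `uri` into `dest_dir` and return the local path."""
    # Reuse the last path segment as the filename; fall back to a default
    # if the URI has no usable path component.
    name = os.path.basename(urlparse(uri).path) or "output.mp4"
    dest = os.path.join(dest_dir, name)
    with urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)  # stream to disk in chunks
    return dest
```

Streaming with `shutil.copyfileobj` avoids holding the whole video in memory.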

Example Execution Logs

Using seed: 1337573818
Using image dimensions: 1408x896
Generating 121 frames at 24 fps...
Prompt: A cinematic close-up of Wednesday Addams frozen mid-dance on a dark, blue-lit ballroom floor as students move indistinctly behind her, their footsteps and muffled music reduced to a distant, underwater thrum; the audio foregrounds her steady breathing and the faint rustle of fabric as she slowly raises one arm, never breaking eye contact with the camera, then after a deliberately long silence she speaks in a flat, dry, perfectly controlled voice, “LTX 2 distilled is now on Replicate,” each word crisp and unemotional, followed by an abrupt cutoff of her voice as the background sound swells slightly, reinforcing the deadpan humor, with precise lip sync, minimal facial movement, stark gothic lighting, and cinematic realism.
  0%|          | 0/8 [00:00<?, ?it/s]
 12%|█▎        | 1/8 [00:00<00:02,  2.63it/s]
 25%|██▌       | 2/8 [00:00<00:02,  2.63it/s]
 38%|███▊      | 3/8 [00:01<00:01,  2.58it/s]
 50%|█████     | 4/8 [00:01<00:01,  2.59it/s]
 62%|██████▎   | 5/8 [00:01<00:01,  2.58it/s]
 75%|███████▌  | 6/8 [00:02<00:00,  2.58it/s]
 88%|████████▊ | 7/8 [00:02<00:00,  2.58it/s]
100%|██████████| 8/8 [00:03<00:00,  2.57it/s]
100%|██████████| 8/8 [00:03<00:00,  2.58it/s]
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:02<00:04,  2.08s/it]
 67%|██████▋   | 2/3 [00:04<00:02,  2.09s/it]
100%|██████████| 3/3 [00:06<00:00,  2.09s/it]
100%|██████████| 3/3 [00:06<00:00,  2.09s/it]
Generation complete! Got 3 video chunks
Encoding video...
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:00<00:00,  4.26it/s]
 67%|██████▋   | 2/3 [00:00<00:00,  2.04it/s]
100%|██████████| 3/3 [00:01<00:00,  1.55it/s]
100%|██████████| 3/3 [00:01<00:00,  1.73it/s]
Video saved to: /tmp/tmpnyld5qv1/output.mp4
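
The log reports the seed, the image dimensions, and the frame count and frame rate, which is enough to recover the clip length (121 frames at 24 fps is roughly 5 seconds) and to reuse the seed in a later run. The sketch below extracts these values. It assumes the line formats shown in this single example, which may differ in other versions, and `parse_run_log` is a hypothetical helper.

```python
import re
from typing import Any


def parse_run_log(log: str) -> dict[str, Any]:
    """Extract seed, image size, frame count, fps, and duration from a run log."""
    info: dict[str, Any] = {}
    m = re.search(r"Using seed: (\d+)", log)
    if m:
        info["seed"] = int(m.group(1))
    m = re.search(r"Using image dimensions: (\d+)x(\d+)", log)
    if m:
        info["width"], info["height"] = int(m.group(1)), int(m.group(2))
    m = re.search(r"Generating (\d+) frames at (\d+) fps", log)
    if m:
        frames, fps = int(m.group(1)), int(m.group(2))
        info["num_frames"], info["fps"] = frames, fps
        info["duration_s"] = frames / fps  # clip length in seconds
    return info
```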

Version Details

Version ID: 6707072d3b2a513cee6ab771021d355b5aa52997dded5e1c97d1bebe8d93f920
Version Created: January 7, 2026
