lucataco/ltx-video-0.9.8-distilled

20.8K runs | Jul 2025 | Cog 0.15.11
image-to-video, text-to-video, video-to-video

About

Generate controllable, native long-form video.

Example Output

Prompt:

"The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim"

Output

Performance Metrics

Prediction Time: 13.61s
Total Time: 95.24s
All Input Parameters
{
  "fps": 24,
  "prompt": "The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim",
  "go_fast": true,
  "num_frames": 121,
  "resolution": 480,
  "aspect_ratio": "match_input_image",
  "guidance_scale": 3,
  "negative_prompt": "worst quality, inconsistent motion, blurry, jittery, distorted",
  "denoise_strength": 0.4,
  "downscale_factor": 0.667,
  "conditioning_frames": 21,
  "num_inference_steps": 24,
  "max_duration_seconds": 5,
  "final_inference_steps": 10
}
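The parameter set above can also be submitted programmatically. The following is a minimal sketch, assuming the official `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_prediction` are hypothetical, the version hash is the one listed under Version Details on this page, and the type of the returned output may vary with the client version.

```python
# Model reference: owner/name plus the version ID shown under "Version Details".
MODEL_REF = (
    "lucataco/ltx-video-0.9.8-distilled:"
    "6757cbcee0253dca9e6c4df0e026c009b58673bbaaf1d88d3f4058cfc692fba5"
)

# Optional inputs that are documented on this page but absent from the example.
OPTIONAL_KEYS = {"seed", "image", "video"}


def build_input(prompt, **overrides):
    """Return the example payload from this page, with optional overrides.

    Hypothetical helper: it only assembles a dict and makes no network call.
    """
    payload = {
        "fps": 24,
        "prompt": prompt,
        "go_fast": True,
        "num_frames": 121,
        "resolution": 480,
        "aspect_ratio": "match_input_image",
        "guidance_scale": 3,
        "negative_prompt": (
            "worst quality, inconsistent motion, blurry, jittery, distorted"
        ),
        "denoise_strength": 0.4,
        "downscale_factor": 0.667,
        "conditioning_frames": 21,
        "num_inference_steps": 24,
        "max_duration_seconds": 5,
        "final_inference_steps": 10,
    }
    unknown = set(overrides) - set(payload) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown input parameter(s): {sorted(unknown)}")
    payload.update(overrides)
    return payload


def run_prediction(payload):
    """Submit the payload to Replicate (needs network access and an API token)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_prediction(build_input("The turquoise waves crash against the dark, jagged rocks of the shore"))` would start a text-to-video run; passing `image=` or `video=` as an override selects the other modes described under Input Parameters.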
Input Parameters
fps Type: integer | Default: 24 | Range: 1 - 60
Frames per second for the output video.
seed Type: integer
Random seed for reproducible results. Leave blank for a random seed.
image Type: string
Input image for image-to-video generation. If provided, the video is generated from this image.
video Type: string
Input video for video-to-video generation. If provided, the video is generated from this video. Takes precedence over image if both are provided.
prompt (required) Type: string
Text prompt for video generation.
go_fast Type: boolean | Default: true
Enable fast mode with skip-layer strategies (20-40% faster, with slightly lower quality).
num_frames Type: integer | Default: 121 | Range: 9 - 257
Number of frames per segment. Use 257 for ~10.7s segments at 24fps (recommended for long videos).
resolution Default: 720
Resolution for the output video (height in pixels).
aspect_ratio Default: match_input_image
Aspect ratio for the output video.
guidance_scale Type: number | Default: 3 | Range: 1 - 10
Guidance scale. Recommended range: 3.0-3.5.
negative_prompt Type: string | Default: "worst quality, inconsistent motion, blurry, jittery, distorted"
Negative prompt for video generation.
denoise_strength Type: number | Default: 0.4 | Range: 0 - 1
Denoising strength for the final refinement step.
downscale_factor Type: number | Default: 0.667 | Range: 0.1 - 1
Factor by which to downscale the initial generation (recommended: 2/3 for better quality).
conditioning_frames Type: integer | Default: 21 | Range: 1 - 50
Number of frames to use for video-to-video conditioning (only used when a video input is provided).
num_inference_steps Type: integer | Default: 24 | Range: 2 - 50
Number of denoising steps for the initial generation.
max_duration_seconds Type: integer | Default: 5 | Range: 5 - 60
Target video duration in seconds. Videos longer than 10s use autoregressive generation.
final_inference_steps Type: integer | Default: 10 | Range: 1 - 50
Number of inference steps for the final denoising.
Output Schema

Output

Type: string | Format: uri

Example Execution Logs
Using seed: 41686
Warning: match_input_image selected but no image provided. Using 16:9 aspect ratio.
Rounded dimensions for optimal processing: 853x480 -> 864x480
Using resolution: 864x480 (16:9)
Step 1: Text-to-Video generation at downscaled resolution: 576x320
Using fast mode with optimized parameters
  0%|          | 0/12 [00:00<?, ?it/s]
  8%|▊         | 1/12 [00:00<00:04,  2.26it/s]
 17%|█▋        | 2/12 [00:00<00:03,  2.73it/s]
 25%|██▌       | 3/12 [00:01<00:03,  2.60it/s]
 33%|███▎      | 4/12 [00:01<00:03,  2.54it/s]
 42%|████▏     | 5/12 [00:01<00:02,  2.52it/s]
 50%|█████     | 6/12 [00:02<00:02,  2.50it/s]
 58%|█████▊    | 7/12 [00:02<00:02,  2.49it/s]
 67%|██████▋   | 8/12 [00:03<00:01,  2.47it/s]
 75%|███████▌  | 9/12 [00:03<00:01,  2.47it/s]
 83%|████████▎ | 10/12 [00:04<00:00,  2.47it/s]
 92%|█████████▏| 11/12 [00:04<00:00,  2.46it/s]
100%|██████████| 12/12 [00:04<00:00,  2.46it/s]
100%|██████████| 12/12 [00:04<00:00,  2.49it/s]
Step 2: Upsampling to: 1152x640
Step 3: Final denoising with 10 steps
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01,  1.97s/it]
100%|██████████| 2/2 [00:03<00:00,  1.73s/it]
100%|██████████| 2/2 [00:03<00:00,  1.76s/it]
Step 4: Final resize to: 853x480
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (853, 480) to (864, 480) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
[+] Text-to-Video generation complete: /tmp/output.mp4
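The resolutions printed in the log above can be reproduced with simple arithmetic. The sketch below is inferred from the logged numbers only, not taken from the model's source: it assumes dimensions are rounded to multiples of 32 (a multiple of 16 would give the same values here), that the padded size is rounded up while the downscaled size is rounded to the nearest multiple, and that the upsampling step doubles each dimension. `plan_resolutions` is a hypothetical name.

```python
import math


def round_up(value, multiple):
    """Round value up to the next multiple."""
    return math.ceil(value / multiple) * multiple


def round_nearest(value, multiple):
    """Round value to the nearest multiple."""
    return round(value / multiple) * multiple


def plan_resolutions(height=480, aspect=(16, 9), downscale_factor=0.667,
                     multiple=32):
    """Reconstruct the (width, height) sizes seen in the example log.

    All rounding rules here are assumptions chosen to match the log.
    """
    target_w = round(height * aspect[0] / aspect[1])
    target = (target_w, height)
    padded = (round_up(target_w, multiple), round_up(height, multiple))
    low = (
        round_nearest(padded[0] * downscale_factor, multiple),
        round_nearest(padded[1] * downscale_factor, multiple),
    )
    upsampled = (low[0] * 2, low[1] * 2)  # assumed 2x upsampling
    return {
        "target": target,        # final resize
        "padded": padded,        # rounded dimensions
        "downscaled": low,       # step 1 generation size
        "upsampled": upsampled,  # step 2 size
    }


for stage, size in plan_resolutions().items():
    print(f"{stage}: {size[0]}x{size[1]}")
```

With the defaults shown, the four stages come out as 853x480, 864x480, 576x320 and 1152x640, matching the values in the log.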
Version Details
Version ID
6757cbcee0253dca9e6c4df0e026c009b58673bbaaf1d88d3f4058cfc692fba5
Version Created
July 24, 2025