lightricks/ltx-video-0.9.7-distilled

▶️ 12.3K runs 📅 Jul 2025 ⚙️ Cog 0.15.10

image-to-video text-to-video video-to-video

About

A distilled model that runs faster than LTX-Video 13b, with a slight reduction in quality.

Example Output

Prompt:

"A woman with light skin, wearing a blue jacket and a black hat with a veil, looks down and to her right, then back up as she speaks; she has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her jacket; the camera remains stationary on her face as she speaks; the background is out of focus, but shows trees and people in period clothing; the scene is captured in real-life footage."

Performance Metrics

13.53s Prediction Time

79.07s Total Time

All Input Parameters

{
  "fps": 24,
  "prompt": "A woman with light skin, wearing a blue jacket and a black hat with a veil, looks down and to her right, then back up as she speaks; she has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her jacket; the camera remains stationary on her face as she speaks; the background is out of focus, but shows trees and people in period clothing; the scene is captured in real-life footage.",
  "go_fast": true,
  "num_frames": 121,
  "resolution": 480,
  "aspect_ratio": "match_input_image",
  "guidance_scale": 3,
  "negative_prompt": "worst quality, inconsistent motion, blurry, jittery, distorted",
  "denoise_strength": 0.4,
  "downscale_factor": 0.667,
  "conditioning_frames": 21,
  "num_inference_steps": 24,
  "final_inference_steps": 10
}
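
The parameters above can be submitted programmatically. The following is a minimal sketch, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are hypothetical, the short prompt is an illustrative stand-in for the full example prompt, and the model reference is assembled from the model name and the Version ID listed on this page.

```python
import json

# Model reference: "<owner>/<name>:<version id>", both taken from this page.
MODEL_REF = (
    "lightricks/ltx-video-0.9.7-distilled:"
    "e7f2778ec419047c564a6620b2d9bf7d6c64673411bf2ae13e628ee2b2c0b5b1"
)

# Optional inputs that the example run above did not set.
OPTIONAL_KEYS = {"seed", "image", "video"}


def build_input(prompt, **overrides):
    """Return the example run's inputs, with optional per-call overrides."""
    inputs = {
        "fps": 24,
        "prompt": prompt,
        "go_fast": True,
        "num_frames": 121,
        "resolution": 480,
        "aspect_ratio": "match_input_image",
        "guidance_scale": 3,
        "negative_prompt": "worst quality, inconsistent motion, blurry, jittery, distorted",
        "denoise_strength": 0.4,
        "downscale_factor": 0.667,
        "conditioning_frames": 21,
        "num_inference_steps": 24,
        "final_inference_steps": 10,
    }
    unknown = set(overrides) - set(inputs) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    inputs.update(overrides)
    return inputs


def run_prediction(inputs):
    """Submit the inputs to Replicate (needs network access and an API token)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_REF, input=inputs)


# Hypothetical short prompt; the seed is the one reported in the example logs.
payload = build_input("A woman in a blue jacket speaks to the camera", seed=23248)
print(json.dumps(payload, indent=2))
```

Calling `run_prediction(payload)` would start a prediction; per the Output Schema on this page, the result refers to the generated video by URI.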

Input Parameters

fps (Type: integer; Default: 24; Range: 1 - 60): Frames per second for the output video.
seed (Type: integer): Random seed for reproducible results. Leave blank for a random seed.
image (Type: string): Input image for image-to-video generation. If provided, the video is generated from this image.
video (Type: string): Input video for video-to-video generation. If provided, the video is generated from this video. Takes precedence over image if both are provided.
prompt (required; Type: string): Text prompt for video generation.
go_fast (Type: boolean; Default: true): Enable fast mode with skip layer strategies (20-40% faster but slightly lower quality).
num_frames (Type: integer; Default: 121; Range: 9 - 257): Number of frames to generate.
resolution (Default: 720): Resolution for the output video (height in pixels).
aspect_ratio (Default: match_input_image): Aspect ratio for the output video.
guidance_scale (Type: number; Default: 3; Range: 1 - 10): Guidance scale. Recommended range: 3.0-3.5.
negative_prompt (Type: string; Default: "worst quality, inconsistent motion, blurry, jittery, distorted"): Negative prompt for video generation.
denoise_strength (Type: number; Default: 0.4; Range: 0 - 1): Denoising strength for the final refinement step.
downscale_factor (Type: number; Default: 0.667; Range: 0.1 - 1): Factor by which to downscale the initial generation (recommended: 2/3 for better quality).
conditioning_frames (Type: integer; Default: 21; Range: 1 - 50): Number of frames to use for video-to-video conditioning (only used when a video input is provided).
num_inference_steps (Type: integer; Default: 24; Range: 2 - 50): Number of denoising steps for the initial generation.
final_inference_steps (Type: integer; Default: 10; Range: 1 - 50): Number of inference steps for the final denoising.
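
The documented ranges lend themselves to a client-side check before a request is sent. The sketch below is one possible way to do that, not the model's own validation logic: the `RANGES` table copies the limits listed above, and the `validate` function and its error messages are hypothetical.

```python
from typing import List

# (expected kind, minimum, maximum) for each parameter with a documented range.
RANGES = {
    "fps": (int, 1, 60),
    "num_frames": (int, 9, 257),
    "guidance_scale": (float, 1, 10),
    "denoise_strength": (float, 0, 1),
    "downscale_factor": (float, 0.1, 1),
    "conditioning_frames": (int, 1, 50),
    "num_inference_steps": (int, 2, 50),
    "final_inference_steps": (int, 1, 50),
}


def validate(inputs: dict) -> List[str]:
    """Return a list of problems found in `inputs` (empty if none)."""
    errors = []
    prompt = inputs.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required")
    for name, (kind, low, high) in RANGES.items():
        if name not in inputs:
            continue  # omitted parameters fall back to the documented defaults
        value = inputs[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name} must be a number")
        elif kind is int and not isinstance(value, int):
            errors.append(f"{name} must be an integer")
        elif not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    return errors


print(validate({"prompt": "a cat walking", "fps": 24, "num_frames": 121}))
print(validate({"prompt": "", "fps": 0, "guidance_scale": 12}))
```

The first call reports no problems; the second reports the missing prompt and the two out-of-range values.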

Output Schema

Output

Type: string • Format: uri

Example Execution Logs

Using seed: 23248
Warning: match_input_image selected but no image provided. Using 16:9 aspect ratio.
Rounded dimensions for optimal processing: 853x480 -> 864x480
Using resolution: 864x480 (16:9)
Step 1: Text-to-Video generation at downscaled resolution: 576x320
Using fast mode with optimized parameters
  0%|          | 0/12 [00:00<?, ?it/s]
  8%|▊         | 1/12 [00:00<00:04,  2.28it/s]
 17%|█▋        | 2/12 [00:00<00:03,  2.73it/s]
 25%|██▌       | 3/12 [00:01<00:03,  2.61it/s]
 33%|███▎      | 4/12 [00:01<00:03,  2.55it/s]
 42%|████▏     | 5/12 [00:01<00:02,  2.52it/s]
 50%|█████     | 6/12 [00:02<00:02,  2.50it/s]
 58%|█████▊    | 7/12 [00:02<00:02,  2.49it/s]
 67%|██████▋   | 8/12 [00:03<00:01,  2.48it/s]
 75%|███████▌  | 9/12 [00:03<00:01,  2.48it/s]
 83%|████████▎ | 10/12 [00:03<00:00,  2.47it/s]
 92%|█████████▏| 11/12 [00:04<00:00,  2.47it/s]
100%|██████████| 12/12 [00:04<00:00,  2.47it/s]
100%|██████████| 12/12 [00:04<00:00,  2.49it/s]
Step 2: Upsampling to: 1152x640
Step 3: Final denoising with 10 steps
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01,  1.97s/it]
100%|██████████| 2/2 [00:03<00:00,  1.73s/it]
100%|██████████| 2/2 [00:03<00:00,  1.77s/it]
Step 4: Final resize to: 853x480
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (853, 480) to (864, 480) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
[+] Text-to-Video generation complete: output.mp4
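
The resolutions reported in the logs can be reproduced with a little arithmetic. The sketch below is a reconstruction from the logged numbers, not the model's actual code: rounding to a multiple of 32 and the 2x upsampling factor are assumptions that happen to match this run (rounding to a multiple of 16 would give the same values here), and the helper names are hypothetical.

```python
import math


def round_up(value, multiple):
    """Round up to the next multiple (assumed rule for the working size)."""
    return math.ceil(value / multiple) * multiple


def round_nearest(value, multiple):
    """Round to the nearest multiple (assumed rule for the downscaled size)."""
    return round(value / multiple) * multiple


height = 480                      # "resolution" input
downscale_factor = 0.667          # "downscale_factor" input

# No image was provided, so the logs fall back to a 16:9 aspect ratio.
target_w = round(height * 16 / 9)

# "Rounded dimensions for optimal processing: 853x480 -> 864x480"
work_w, work_h = round_up(target_w, 32), round_up(height, 32)

# "Step 1: ... generation at downscaled resolution: 576x320"
low_w = round_nearest(work_w * downscale_factor, 32)
low_h = round_nearest(work_h * downscale_factor, 32)

# "Step 2: Upsampling to: 1152x640" (2x in each dimension, per the logs)
up_w, up_h = low_w * 2, low_h * 2

# The ffmpeg writer warning: 853 is not divisible by macro_block_size=16,
# so the written frame width is padded up to the next multiple.
written_w = round_up(target_w, 16)

# Clip length implied by num_frames=121 at fps=24.
duration_s = 121 / 24

print(f"target    {target_w}x{height}")
print(f"working   {work_w}x{work_h}")
print(f"low-res   {low_w}x{low_h}")
print(f"upsampled {up_w}x{up_h}")
print(f"written   {written_w}x{height}")
print(f"duration  {duration_s:.2f} s")
```

This also explains why the file is 864 pixels wide even though Step 4 resizes to 853x480: the video writer pads the width back up to a multiple of 16.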

Version Details

Version ID: e7f2778ec419047c564a6620b2d9bf7d6c64673411bf2ae13e628ee2b2c0b5b1
Version Created: July 11, 2025
