chenxwh/nova-t2v 🔢🖼️📝 → 🖼️

▶️ 40 runs 📅 Dec 2024 ⚙️ Cog 0.9.23

image-to-video text-to-video

Performance

Typical run time: 123.5s

Total runs: 40

About

Autoregressive Video Generation without Vector Quantization

Example Output

Prompt:

"The camera slowly rotates around a massive stack of vintage televisions that are placed within a large New York museum gallery. Each of the televisions is showing a different program. There are 1950s sci-fi movies with their distinctive visuals, horror movies with their creepy scenes, news broadcasts with moving images and words, static on some screens, and a 1970s sitcom with its characteristic look. The televisions are of various sizes and designs, some with rounded edges and others with more angular shapes. The gallery is well-lit, with light falling on the stack of televisions and highlighting the different programs being shown. There are no people visible in the immediate vicinity, only the stack of televisions and the surrounding gallery space."

Output: (generated video, not reproduced here)

Performance Metrics

123.48s Prediction Time

214.72s Total Time
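
The two figures above differ by a wide margin. A minimal sketch of the arithmetic follows; the page does not say what the remaining time covers, so reading it as queueing or model setup is an assumption.

```python
# Timings copied from the performance metrics above, in seconds.
prediction_time = 123.48
total_time = 214.72

# Time spent outside the prediction itself (cause not stated on the page).
overhead = total_time - prediction_time
overhead_share = overhead / total_time

print(f"time outside prediction: {overhead:.2f}s ({overhead_share:.0%} of total)")
# prints: time outside prediction: 91.24s (42% of total)
```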

All Input Parameters

{
  "fps": 12,
  "prompt": "The camera slowly rotates around a massive stack of vintage televisions that are placed within a large New York museum gallery. Each of the televisions is showing a different program. There are 1950s sci-fi movies with their distinctive visuals, horror movies with their creepy scenes, news broadcasts with moving images and words, static on some screens, and a 1970s sitcom with its characteristic look. The televisions are of various sizes and designs, some with rounded edges and others with more angular shapes. The gallery is well-lit, with light falling on the stack of televisions and highlighting the different programs being shown. There are no people visible in the immediate vicinity, only the stack of televisions and the surrounding gallery space.",
  "motion_flow": 5,
  "guidance_scale": 7,
  "negative_prompt": "low quality, deformed, distorted, disfigured, fused fingers, bad anatomy, weird hand",
  "num_diffusion_steps": 100,
  "num_inference_steps": 128
}
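
As a rough illustration, the parameters above could be submitted through the Replicate Python client. This sketch assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_nova` are hypothetical, the prompt is cut to its first sentence, and the seed is the one shown in the example logs.

```python
"""Sketch: build the example payload and (optionally) submit it."""

# Owner/name and version ID as listed on this page.
MODEL_REF = (
    "chenxwh/nova-t2v:"
    "efe91027f017e9b32e1d458c59139dc5ab783955d111a2c72c8e7063e6f38261"
)


def build_input(prompt, **overrides):
    """Return a payload filled with the defaults listed on this page."""
    payload = {
        "fps": 12,
        "prompt": prompt,
        "motion_flow": 5,
        "guidance_scale": 7,
        "negative_prompt": (
            "low quality, deformed, distorted, disfigured, "
            "fused fingers, bad anatomy, weird hand"
        ),
        "num_diffusion_steps": 100,
        "num_inference_steps": 128,
    }
    payload.update(overrides)  # e.g. seed=..., image=...
    return payload


def run_nova(payload):
    """Submit the payload. Needs network access and an API token."""
    import replicate  # imported lazily so build_input works offline

    return replicate.run(MODEL_REF, input=payload)


inputs = build_input(
    "The camera slowly rotates around a massive stack of vintage televisions "
    "that are placed within a large New York museum gallery.",
    seed=22625,
)
# output = run_nova(inputs)  # uncomment to start a prediction
```

Keeping the payload builder separate from the network call makes it possible to inspect or log the inputs before spending a run that typically takes about two minutes.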

Input Parameters

fps (integer, default: 12): FPS for the output video
seed (integer): Random seed. Leave blank to randomize the seed
image (string): Input image prompt, optional
prompt (string): Input prompt. Default: "The camera slowly rotates around a massive stack of vintage televisions that are placed within a large New York museum gallery. Each of the televisions is showing a different program. There are 1950s sci-fi movies with their distinctive visuals, horror movies with their creepy scenes, news broadcasts with moving images and words, static on some screens, and a 1970s sitcom with its characteristic look. The televisions are of various sizes and designs, some with rounded edges and others with more angular shapes. The gallery is well-lit, with light falling on the stack of televisions and highlighting the different programs being shown. There are no people visible in the immediate vicinity, only the stack of televisions and the surrounding gallery space."
motion_flow (integer, default: 5, range: 1 - 10): Motion flow
guidance_scale (number, default: 7, range: 1 - 10): Scale for classifier-free guidance
negative_prompt (string, default: "low quality, deformed, distorted, disfigured, fused fingers, bad anatomy, weird hand"): Things that should not appear in the output
num_diffusion_steps (integer, default: 100, range: 1 - 100): Number of diffusion steps
num_inference_steps (integer, default: 128, range: 1 - 128): Number of inference steps
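
The types and ranges listed above can be checked locally before a run is submitted. The following is a minimal sketch, not the model's own validation: the function name `validate_input` is hypothetical, and treating both ends of each range as inclusive is an assumption.

```python
"""Sketch: check a payload against the ranges listed above."""

# (min, max) bounds copied from the parameter list; inclusive by assumption.
RANGES = {
    "motion_flow": (1, 10),
    "guidance_scale": (1, 10),
    "num_diffusion_steps": (1, 100),
    "num_inference_steps": (1, 128),
}
INTEGER_FIELDS = (
    "fps", "seed", "motion_flow", "num_diffusion_steps", "num_inference_steps",
)


def validate_input(payload):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for name in INTEGER_FIELDS:
        if name in payload:
            value = payload[name]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{name} must be an integer, got {value!r}")
    for name, (low, high) in RANGES.items():
        value = payload.get(name)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside {low}-{high}")
    return problems


print(validate_input({"motion_flow": 11, "guidance_scale": 7.5}))
```

Returning a list rather than raising on the first error lets a caller report every out-of-range field at once.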

Output Schema

Output

Type: string • Format: uri
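
Since the output is a URI string, one way to keep the result is to stream it to a local file. This sketch uses only the Python standard library; the helper name `save_output` and the file name in the usage comment are hypothetical, and the page does not state the video container format. Depending on the client version, the value returned by a run may be a file-like object rather than a plain string, in which case its URL would need to be extracted first.

```python
"""Sketch: download the resource behind an output URI."""
import shutil
import urllib.request
from pathlib import Path


def save_output(uri, dest):
    """Stream the resource at `uri` into `dest` and return the path written."""
    dest = Path(dest)
    with urllib.request.urlopen(uri) as response, dest.open("wb") as fh:
        shutil.copyfileobj(response, fh)  # copies in chunks, not all at once
    return dest


# Usage, with `output_uri` holding the string returned by a prediction:
# save_output(output_uri, "nova_output.mp4")
```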

Example Execution Logs

Using seed: 22625
  0%|          | 0/9 [00:00<?, ?it/s]
 11%|█         | 1/9 [00:13<01:47, 13.42s/it]
 22%|██▏       | 2/9 [00:26<01:33, 13.42s/it]
 33%|███▎      | 3/9 [00:40<01:20, 13.34s/it]
 44%|████▍     | 4/9 [00:53<01:06, 13.32s/it]
 56%|█████▌    | 5/9 [01:06<00:53, 13.31s/it]
 67%|██████▋   | 6/9 [01:20<00:39, 13.32s/it]
 78%|███████▊  | 7/9 [01:33<00:26, 13.34s/it]
 89%|████████▉ | 8/9 [01:46<00:13, 13.32s/it]
100%|██████████| 9/9 [01:59<00:00, 13.31s/it]
100%|██████████| 9/9 [01:59<00:00, 13.33s/it]
<class 'diffnext.pipelines.pipeline_nova.NOVAPipelineOutput'>
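
The progress lines above can be parsed to compare the logged step time with the reported prediction time. The sketch below is illustrative only: it copies three of the log lines, and the regular expression is written for this tqdm layout rather than taken from any library.

```python
"""Sketch: extract step counts and timings from tqdm progress lines."""
import re

LOG = """\
 11%|█         | 1/9 [00:13<01:47, 13.42s/it]
 89%|████████▉ | 8/9 [01:46<00:13, 13.32s/it]
100%|██████████| 9/9 [01:59<00:00, 13.33s/it]
"""

# Matches "STEP/TOTAL [MM:SS<REMAINING, RATEs/it]".
PATTERN = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<[^,]*, ([\d.]+)s/it\]")


def parse_progress(log):
    """Return (step, total, elapsed_seconds, seconds_per_step) tuples."""
    rows = []
    for match in PATTERN.finditer(log):
        step, total, minutes, seconds, rate = match.groups()
        elapsed = int(minutes) * 60 + int(seconds)
        rows.append((int(step), int(total), elapsed, float(rate)))
    return rows


rows = parse_progress(LOG)
step, total, elapsed, rate = rows[-1]
print(f"{step}/{total} steps, {elapsed}s elapsed, {rate}s per step")
```

The final line reports 119 seconds for the 9 steps, a few seconds short of the 123.48s prediction time. What fills the remainder is not shown in the logs.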

Version Details

Version ID: efe91027f017e9b32e1d458c59139dc5ab783955d111a2c72c8e7063e6f38261
Version Created: December 27, 2024
