chenxwh/nova-t2v 🔢🖼️📝 → 🖼️

▶️ 39 runs 📅 Dec 2024 ⚙️ Cog 0.9.23 🔗 GitHub 📄 Paper ⚖️ License
image-to-video text-to-video

About

Autoregressive Video Generation without Vector Quantization

Example Output

Prompt:

"The camera slowly rotates around a massive stack of vintage televisions that are placed within a large New York museum gallery. Each of the televisions is showing a different program. There are 1950s sci-fi movies with their distinctive visuals, horror movies with their creepy scenes, news broadcasts with moving images and words, static on some screens, and a 1970s sitcom with its characteristic look. The televisions are of various sizes and designs, some with rounded edges and others with more angular shapes. The gallery is well-lit, with light falling on the stack of televisions and highlighting the different programs being shown. There are no people visible in the immediate vicinity, only the stack of televisions and the surrounding gallery space."

Output

Performance Metrics

123.48s Prediction Time
214.72s Total Time
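The gap between the two timings above can be worked out directly. A minimal sketch, assuming the total time includes the prediction time plus non-compute overhead such as queueing or model startup (the page does not say what the remainder covers); `overhead_seconds` is a hypothetical helper name:

```python
# Timings copied from the Performance Metrics above.
PREDICTION_TIME_S = 123.48
TOTAL_TIME_S = 214.72


def overhead_seconds(total_s: float, prediction_s: float) -> float:
    """Return the time not spent in the prediction itself, in seconds."""
    if prediction_s > total_s:
        raise ValueError("prediction time cannot exceed total time")
    return round(total_s - prediction_s, 2)


overhead = overhead_seconds(TOTAL_TIME_S, PREDICTION_TIME_S)
share = overhead / TOTAL_TIME_S  # fraction of the total spent outside prediction
print(f"overhead: {overhead:.2f}s ({share:.1%} of total)")
```

For this run the difference is 91.24 s, so a sizeable part of the wall-clock time was spent outside the prediction.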
All Input Parameters
{
  "fps": 12,
  "prompt": "The camera slowly rotates around a massive stack of vintage televisions that are placed within a large New York museum gallery. Each of the televisions is showing a different program. There are 1950s sci-fi movies with their distinctive visuals, horror movies with their creepy scenes, news broadcasts with moving images and words, static on some screens, and a 1970s sitcom with its characteristic look. The televisions are of various sizes and designs, some with rounded edges and others with more angular shapes. The gallery is well-lit, with light falling on the stack of televisions and highlighting the different programs being shown. There are no people visible in the immediate vicinity, only the stack of televisions and the surrounding gallery space.",
  "motion_flow": 5,
  "guidance_scale": 7,
  "negative_prompt": "low quality, deformed, distorted, disfigured, fused fingers, bad anatomy, weird hand",
  "num_diffusion_steps": 100,
  "num_inference_steps": 128
}
Input Parameters
fps Type: integer, Default: 12
Frames per second of the output video
seed Type: integer
Random seed. Leave blank to randomize the seed
image Type: string
Input image prompt, optional
prompt Type: string, Default: The camera slowly rotates around a massive stack of vintage televisions that are placed within a large New York museum gallery. Each of the televisions is showing a different program. There are 1950s sci-fi movies with their distinctive visuals, horror movies with their creepy scenes, news broadcasts with moving images and words, static on some screens, and a 1970s sitcom with its characteristic look. The televisions are of various sizes and designs, some with rounded edges and others with more angular shapes. The gallery is well-lit, with light falling on the stack of televisions and highlighting the different programs being shown. There are no people visible in the immediate vicinity, only the stack of televisions and the surrounding gallery space.
Input prompt
motion_flow Type: integer, Default: 5, Range: 1 - 10
Motion Flow
guidance_scale Type: number, Default: 7, Range: 1 - 10
Scale for classifier-free guidance
negative_prompt Type: string, Default: low quality, deformed, distorted, disfigured, fused fingers, bad anatomy, weird hand
Specify things to not see in the output
num_diffusion_steps Type: integer, Default: 100, Range: 1 - 100
Number of diffusion steps
num_inference_steps Type: integer, Default: 128, Range: 1 - 128
Number of inference steps
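The defaults and ranges listed above can be checked on the client side before a request is sent. The following is a sketch only: `build_input`, `DEFAULTS`, and `RANGES` are hypothetical names, not part of the model or of any Replicate library, and the server may enforce the limits differently.

```python
from typing import Any, Optional

# Defaults and ranges copied from the Input Parameters list above.
DEFAULTS = {
    "fps": 12,
    "motion_flow": 5,
    "guidance_scale": 7,
    "negative_prompt": (
        "low quality, deformed, distorted, disfigured, "
        "fused fingers, bad anatomy, weird hand"
    ),
    "num_diffusion_steps": 100,
    "num_inference_steps": 128,
}
RANGES = {
    "motion_flow": (1, 10),
    "guidance_scale": (1, 10),
    "num_diffusion_steps": (1, 100),
    "num_inference_steps": (1, 128),
}


def build_input(prompt: str, seed: Optional[int] = None,
                image: Optional[str] = None, **overrides: Any) -> dict:
    """Merge overrides into the documented defaults and check the ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "prompt": prompt}
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # seed and image are optional, so they are only sent when provided.
    if seed is not None:
        payload["seed"] = seed
    if image is not None:
        payload["image"] = image
    return payload
```

For example, `build_input("a red kite over a beach", motion_flow=3)` returns a dictionary with the same keys as the JSON example, while `motion_flow=11` raises a `ValueError`.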
Output Schema

Output

Type: string, Format: uri
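Since the output is a URI string, the generated video has to be fetched before it can be used locally. A minimal sketch using only the Python standard library; `save_output` is a hypothetical helper, and it assumes the URI is directly downloadable without extra authentication:

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> Path:
    """Stream the resource at `uri` into the local file `dest`."""
    target = Path(dest)
    with urllib.request.urlopen(uri) as response, target.open("wb") as fh:
        # Copy in chunks so a large video is never held fully in memory.
        shutil.copyfileobj(response, fh)
    return target
```

Some client libraries may wrap the output in a file-like object instead of returning the bare string; in that case the object's own read or URL accessor would be used in place of this helper.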

Example Execution Logs
Using seed: 22625
  0%|          | 0/9 [00:00<?, ?it/s]
 11%|█         | 1/9 [00:13<01:47, 13.42s/it]
 22%|██▏       | 2/9 [00:26<01:33, 13.42s/it]
 33%|███▎      | 3/9 [00:40<01:20, 13.34s/it]
 44%|████▍     | 4/9 [00:53<01:06, 13.32s/it]
 56%|█████▌    | 5/9 [01:06<00:53, 13.31s/it]
 67%|██████▋   | 6/9 [01:20<00:39, 13.32s/it]
 78%|███████▊  | 7/9 [01:33<00:26, 13.34s/it]
 89%|████████▉ | 8/9 [01:46<00:13, 13.32s/it]
100%|██████████| 9/9 [01:59<00:00, 13.31s/it]
100%|██████████| 9/9 [01:59<00:00, 13.33s/it]
<class 'diffnext.pipelines.pipeline_nova.NOVAPipelineOutput'>
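The progress lines in these logs can be parsed to relate the iteration rate to the run time. A sketch, assuming the tqdm-style `done/total [MM:SS<MM:SS, rate s/it]` layout seen above (longer runs that print hours would need a wider pattern); `parse_progress` is a hypothetical helper:

```python
import re
from typing import Optional

# Matches e.g. "9/9 [01:59<00:00, 13.33s/it]" inside a progress line.
PROGRESS = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<[^,]*,\s*([\d.]+)s/it\]")


def parse_progress(line: str) -> Optional[dict]:
    """Extract counts, elapsed seconds, and seconds per iteration."""
    match = PROGRESS.search(line)
    if match is None:
        return None  # e.g. the first line, which has no rate yet
    done, total, mins, secs, rate = match.groups()
    return {
        "done": int(done),
        "total": int(total),
        "elapsed_s": int(mins) * 60 + int(secs),
        "s_per_it": float(rate),
    }


last = parse_progress("100%|██████████| 9/9 [01:59<00:00, 13.33s/it]")
estimated_s = last["total"] * last["s_per_it"]  # iterations x seconds each
print(last["elapsed_s"], round(estimated_s, 2))
```

Nine iterations at roughly 13.3 s each account for about two minutes, which is in line with the 123.48 s prediction time reported in the metrics.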
Version Details
Version ID
efe91027f017e9b32e1d458c59139dc5ab783955d111a2c72c8e7063e6f38261
Version Created
December 27, 2024
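The version ID pins a request to this exact build of the model. A hedged sketch of how that might look with the third-party `replicate` Python client (`pip install replicate`, with `REPLICATE_API_TOKEN` set in the environment); `model_ref` and `run_prediction` are hypothetical wrappers, and the return type of `replicate.run` depends on the client version:

```python
MODEL = "chenxwh/nova-t2v"
VERSION = "efe91027f017e9b32e1d458c59139dc5ab783955d111a2c72c8e7063e6f38261"


def model_ref() -> str:
    """Build the pinned "owner/name:version" reference string."""
    return f"{MODEL}:{VERSION}"


def run_prediction(prompt: str, **extra):
    """Run the pinned version; needs network access and an API token."""
    # Imported lazily so the helpers above work without the client installed.
    import replicate

    return replicate.run(model_ref(), input={"prompt": prompt, **extra})
```

Calling `run_prediction("a paper boat drifting down a stream", fps=12)` would submit a job with the remaining parameters left at their defaults.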