wcarle/text2video-zero 🔢📝 → 🖼️

▶️ 2.2K runs 📅 Apr 2023 ⚙️ Cog 0.6.1
text-to-video

About

The Picsart Text2Video-Zero model adapts existing text-to-image synthesis methods (e.g., Stable Diffusion) to the video domain.

Example Output

Prompt:

"a beautiful sunset, clouds"

Output

Performance Metrics

60.31s Prediction Time
240.71s Total Time
All Input Parameters
{
  "t0": 44,
  "t1": 47,
  "fps": 8,
  "prompt": "a beautiful sunset, clouds",
  "chunk_size": 10,
  "resolution": 512,
  "video_length": 20,
  "motion_field_strength_x": 12,
  "motion_field_strength_y": 12
}
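As a minimal sketch, the input above could be submitted with the Replicate Python client. This assumes the `replicate` package is installed and the `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_prediction` are hypothetical, and the version hash is the one listed under Version Details.

```python
# Example input copied from this page.
EXAMPLE_INPUT = {
    "t0": 44,
    "t1": 47,
    "fps": 8,
    "prompt": "a beautiful sunset, clouds",
    "chunk_size": 10,
    "resolution": 512,
    "video_length": 20,
    "motion_field_strength_x": 12,
    "motion_field_strength_y": 12,
}

# Model reference in "owner/name:version" form, using the Version ID from this page.
MODEL_REF = (
    "wcarle/text2video-zero:"
    "41f6928e5932de07e2b8c3b8c89feed58c3e3827e8a52567473d477fb36d2f25"
)


def build_input(**overrides):
    """Return a copy of the example input with overrides applied (hypothetical helper)."""
    payload = dict(EXAMPLE_INPUT)
    payload.update(overrides)
    # The parameter descriptions ask for t0 < t1.
    if not payload["t0"] < payload["t1"]:
        raise ValueError("t0 must be smaller than t1")
    return payload


def run_prediction(payload):
    """Submit the payload to Replicate; needs network access and an API token."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_prediction(build_input(prompt="a cat"))` should yield the output described under Output Schema. Depending on the client version, that may be a plain URI string or a file-like object wrapping it.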
Input Parameters
t0 Type: integer, Default: 44, Range: 1 to 50
Timestep t0: Perform DDPM steps from t0 to t1. The larger the gap between t0 and t1, the more variance between the frames. Ensure t0 < t1.
t1 Type: integer, Default: 47, Range: 1 to 50
Timestep t1: Perform DDPM steps from t0 to t1. The larger the gap between t0 and t1, the more variance between the frames. Ensure t0 < t1.
fps Type: integer, Default: 15, Range: 5 to 60
Frame rate for the video.
seed Type: string
Leave blank to randomize the seed.
prompt Type: string, Default: a cat
Input prompt.
chunk_size Type: integer, Default: 8, Range: 1 to 10
Chunk size: Number of frames processed at once. Reduce for lower memory usage.
resolution Type: integer, Default: 512
Resolution of the video (square).
video_length Type: integer, Default: 8
Number of frames in the video.
merging_ratio Type: number, Default: 0, Range: 0 to 0.9
Ratio of how many tokens are merged. The higher the ratio, the more compression (less memory and faster inference).
negative_prompt Type: string, Default: (empty)
Negative prompt.
motion_field_strength_x Type: integer, Default: 12, Range: -20 to 20
Global Translation $\delta_{x}$
motion_field_strength_y Type: integer, Default: 12, Range: -20 to 20
Global Translation $\delta_{y}$
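The `t0` and `t1` inputs are step indices, while the execution log prints them as scheduler timesteps (`t0 = 881 t1 = 941` for the values 44 and 47). The sketch below reproduces that mapping under an assumption inferred from the log: 50 inference steps spread over 1000 training timesteps with an offset of 1, which matches the `t = 981`, `961`, `941` sequence printed there. The function names are hypothetical and the pipeline's actual code may differ.

```python
def ddim_timesteps(num_inference_steps=50, num_train_timesteps=1000, offset=1):
    """Evenly spaced timesteps in descending order (assumed schedule)."""
    step = num_train_timesteps // num_inference_steps
    ascending = [i * step + offset for i in range(num_inference_steps)]
    return ascending[::-1]


def to_scheduler_timestep(t, timesteps):
    """Map a step index such as t0 or t1 to a scheduler timestep.

    Assumption: indices count from the low-noise end of the schedule,
    so larger values land on noisier timesteps.
    """
    if not 0 <= t < len(timesteps):
        raise ValueError("t has no matching timestep under this assumed indexing")
    return timesteps[-t - 1]


timesteps = ddim_timesteps()
print(to_scheduler_timestep(44, timesteps))  # 881, as printed in the log
print(to_scheduler_timestep(47, timesteps))  # 941, as printed in the log
```

Under this reading, a wider gap between `t0` and `t1` covers more scheduler timesteps, which fits the description that a larger gap gives more variance between frames.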
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
Setting Random seed.
Module Text2Video
Model update
Fetching 15 files: 100%|██████████| 15/15 [00:00<00:00, 32939.56it/s]
/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'text_to_video_pipeline.TextToVideoPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Processing chunk 1 / 3
t0 = 881 t1 = 941
Continue DDIM with i = 0, t = 981, latent = torch.Size([1, 4, 64, 64]), device = cuda:0, type = torch.float16
  0%|          | 0/50 [00:00<?, ?it/s]
latent t1 found at i=1, t = 961
  2%|▏         | 1/50 [00:01<01:18,  1.60s/it]
latent t0 found at i = 4, t = 901
100%|██████████| 50/50 [00:05<00:00,  9.59it/s]
/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/torch/nn/functional.py:4227: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn(
Continue DDIM with i = 2, t = 941, latent = torch.Size([10, 4, 64, 64]), device = cuda:0, type = torch.float16
96%|█████████▌| 48/50 [00:17<00:00,  2.80it/s]
Processing chunk 2 / 3
t0 = 881 t1 = 941
Continue DDIM with i = 0, t = 981, latent = torch.Size([1, 4, 64, 64]), device = cuda:0, type = torch.float16
latent t1 found at i=1, t = 961
  0%|          | 0/50 [00:00<?, ?it/s]
  4%|▍         | 2/50 [00:00<00:03, 13.44it/s]
latent t0 found at i = 4, t = 901
100%|██████████| 50/50 [00:03<00:00, 13.64it/s]
100%|██████████| 50/50 [00:03<00:00, 13.67it/s]
Continue DDIM with i = 2, t = 941, latent = torch.Size([10, 4, 64, 64]), device = cuda:0, type = torch.float16
 96%|█████████▌| 48/50 [00:17<00:00,  2.80it/s]
96%|█████████▌| 48/50 [00:17<00:00,  2.73it/s]
Processing chunk 3 / 3
t0 = 881 t1 = 941
Continue DDIM with i = 0, t = 981, latent = torch.Size([1, 4, 64, 64]), device = cuda:0, type = torch.float16
latent t1 found at i=1, t = 961
  0%|          | 0/50 [00:00<?, ?it/s]
  4%|▍         | 2/50 [00:00<00:03, 13.45it/s]
latent t0 found at i = 4, t = 901
100%|██████████| 50/50 [00:03<00:00, 13.68it/s]
100%|██████████| 50/50 [00:03<00:00, 13.61it/s]
Continue DDIM with i = 2, t = 941, latent = torch.Size([3, 4, 64, 64]), device = cuda:0, type = torch.float16
96%|█████████▌| 48/50 [00:06<00:00,  7.69it/s]
96%|█████████▌| 48/50 [00:06<00:00,  7.63it/s]
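The log above reports three chunks with latent batch sizes 10, 10 and 3 for `video_length` 20 and `chunk_size` 10. Those sizes are consistent with each chunk carrying frame 0 as a shared anchor plus up to `chunk_size - 1` new frames. The following sketch reproduces the sizes under that assumption; `plan_chunks` is a hypothetical name, and the model's actual chunking code is not shown on this page.

```python
def plan_chunks(video_length, chunk_size):
    """Split frame indices into chunks that each start with frame 0 (assumed scheme)."""
    if chunk_size < 2:
        # With an anchor frame in every chunk, one slot is needed for a new frame.
        raise ValueError("this sketch needs chunk_size >= 2")
    stride = chunk_size - 1
    starts = list(range(0, video_length, stride))
    chunks = []
    for i, start in enumerate(starts):
        # The last chunk runs to the end of the video; others stop at the next start.
        end = video_length if i == len(starts) - 1 else starts[i + 1]
        chunks.append([0] + list(range(start, end)))
    return chunks


chunks = plan_chunks(video_length=20, chunk_size=10)
print(len(chunks))               # 3, matching "Processing chunk 1 / 3"
print([len(c) for c in chunks])  # [10, 10, 3], matching the latent batch sizes
```

Lowering `chunk_size` in this scheme produces more, smaller chunks, which agrees with the parameter description that a smaller value reduces memory usage.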
Version Details
Version ID
41f6928e5932de07e2b8c3b8c89feed58c3e3827e8a52567473d477fb36d2f25
Version Created
April 11, 2023