wcarle/text2video-zero 🔢📝 → 🖼️
About
The Picsart Text2Video-Zero model builds on existing text-to-image synthesis methods (e.g., Stable Diffusion) and adapts them to the video domain.

Example Output
Prompt:
"a beautiful sunset, clouds"
Performance Metrics
- Prediction Time: 60.31s
- Total Time: 240.71s
All Input Parameters
{ "t0": 44, "t1": 47, "fps": 8, "prompt": "a beautiful sunset, clouds", "chunk_size": 10, "resolution": 512, "video_length": 20, "motion_field_strength_x": 12, "motion_field_strength_y": 12 }
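As a rough illustration of how these parameters could be submitted, the sketch below checks the input and passes it to the Replicate Python client. It assumes the model is hosted on Replicate and that the `replicate` package and an API token are available; `validate_input` and `run_prediction` are hypothetical helper names, not part of the model.

```python
# Example input copied from the "All Input Parameters" listing.
EXAMPLE_INPUT = {
    "t0": 44,
    "t1": 47,
    "fps": 8,
    "prompt": "a beautiful sunset, clouds",
    "chunk_size": 10,
    "resolution": 512,
    "video_length": 20,
    "motion_field_strength_x": 12,
    "motion_field_strength_y": 12,
}

# Owner/name from the page title plus the Version ID listed under Version Details.
MODEL_REF = (
    "wcarle/text2video-zero:"
    "41f6928e5932de07e2b8c3b8c89feed58c3e3827e8a52567473d477fb36d2f25"
)


def validate_input(params):
    """Hypothetical client-side check of the documented constraints."""
    if not params["t0"] < params["t1"]:
        raise ValueError("t0 must be smaller than t1")
    for key in ("fps", "chunk_size", "resolution", "video_length"):
        if params[key] <= 0:
            raise ValueError(f"{key} must be positive")
    if not params["prompt"].strip():
        raise ValueError("prompt must not be empty")
    return params


def run_prediction(params):
    """Submit a prediction (assumes REPLICATE_API_TOKEN is set in the environment)."""
    import replicate  # third-party client, imported lazily so validation works without it

    return replicate.run(MODEL_REF, input=validate_input(params))
```

Calling `run_prediction(EXAMPLE_INPUT)` would start a remote prediction; `validate_input` alone runs offline and only enforces the `t0 < t1` rule stated in the parameter descriptions plus basic sanity checks.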
Input Parameters
- t0
- Timestep t0: Perform DDPM steps from t0 to t1. The larger the gap between t0 and t1, the more the frames vary. Ensure t0 < t1.
- t1
- Timestep t1: Perform DDPM steps from t0 to t1. The larger the gap between t0 and t1, the more the frames vary. Ensure t0 < t1.
- fps
- Frame rate for the video.
- seed
- Leave blank to randomize the seed.
- prompt
- Input prompt.
- chunk_size
- Chunk size: Number of frames processed at once. Reduce for lower memory usage.
- resolution
- Resolution of the video (square)
- video_length
- Number of frames in the video
- merging_ratio
- Ratio of tokens that are merged. Higher values give more compression (less memory and faster inference).
- negative_prompt
- Negative prompt.
- motion_field_strength_x
- Global Translation $\delta_{x}$
- motion_field_strength_y
- Global Translation $\delta_{y}$
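To make the interaction between chunk_size and video_length concrete, here is a minimal sketch of one way the frames could be split into chunks. It assumes each chunk re-uses frame 0 as an anchor and adds up to `chunk_size - 1` new frames; this page does not confirm that scheme, but it is consistent with the latent batch sizes of 10, 10, and 3 in the example logs. The function name `chunk_frame_ids` is hypothetical.

```python
def chunk_frame_ids(video_length, chunk_size):
    """Split frame indices into chunks, each starting with anchor frame 0 (assumed scheme)."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    step = chunk_size - 1  # new frames per chunk; one slot is reserved for frame 0
    starts = list(range(0, video_length, step))
    chunks = []
    for i, start in enumerate(starts):
        # The last chunk runs to the end of the video; others stop at the next start.
        end = video_length if i == len(starts) - 1 else starts[i + 1]
        chunks.append([0] + list(range(start, end)))
    return chunks


chunks = chunk_frame_ids(video_length=20, chunk_size=10)
print(len(chunks))               # 3
print([len(c) for c in chunks])  # [10, 10, 3]
```

Under this assumption, lowering chunk_size reduces the number of frames denoised at once (and so peak memory) at the cost of more chunks.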
Output Schema
Output
Example Execution Logs
Setting Random seed.
Module Text2Video
Model update
Fetching 15 files: 100%|██████████| 15/15 [00:00<00:00, 32939.56it/s]
/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'text_to_video_pipeline.TextToVideoPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Processing chunk 1 / 3
t0 = 881 t1 = 941
Continue DDIM with i = 0, t = 981, latent = torch.Size([1, 4, 64, 64]), device = cuda:0, type = torch.float16
latent t1 found at i=1, t = 961
latent t0 found at i = 4, t = 901
100%|██████████| 50/50 [00:05<00:00, 9.59it/s]
/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/torch/nn/functional.py:4227: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn(
Continue DDIM with i = 2, t = 941, latent = torch.Size([10, 4, 64, 64]), device = cuda:0, type = torch.float16
96%|█████████▌| 48/50 [00:17<00:00, 2.80it/s]
Processing chunk 2 / 3
t0 = 881 t1 = 941
Continue DDIM with i = 0, t = 981, latent = torch.Size([1, 4, 64, 64]), device = cuda:0, type = torch.float16
latent t1 found at i=1, t = 961
latent t0 found at i = 4, t = 901
100%|██████████| 50/50 [00:03<00:00, 13.67it/s]
Continue DDIM with i = 2, t = 941, latent = torch.Size([10, 4, 64, 64]), device = cuda:0, type = torch.float16
96%|█████████▌| 48/50 [00:17<00:00, 2.73it/s]
Processing chunk 3 / 3
t0 = 881 t1 = 941
Continue DDIM with i = 0, t = 981, latent = torch.Size([1, 4, 64, 64]), device = cuda:0, type = torch.float16
latent t1 found at i=1, t = 961
latent t0 found at i = 4, t = 901
100%|██████████| 50/50 [00:03<00:00, 13.61it/s]
Continue DDIM with i = 2, t = 941, latent = torch.Size([3, 4, 64, 64]), device = cuda:0, type = torch.float16
96%|█████████▌| 48/50 [00:06<00:00, 7.63it/s]
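The logs report t0 = 881 and t1 = 941 although the inputs were t0 = 44 and t1 = 47. One mapping that reproduces these numbers is sketched below: 50 evenly spaced DDIM timesteps, indexed from the low-noise end. The step count of 50 and the spacing of 20 are read off the logs (981, 961, 941, ...); the 1000 training timesteps, the offset of 1, and the helper names are assumptions, not something this page states.

```python
def ddim_timesteps(num_steps=50, num_train_timesteps=1000, offset=1):
    """Evenly spaced timesteps in descending order, e.g. 981, 961, ..., 1 (assumed schedule)."""
    stride = num_train_timesteps // num_steps
    ascending = [i * stride + offset for i in range(num_steps)]
    return ascending[::-1]


def to_scheduler_timestep(t, timesteps):
    """Map an input index such as t0 or t1 to a timestep, counting from the low-noise end."""
    return timesteps[-t - 1]


timesteps = ddim_timesteps()
print(timesteps[:3])                         # [981, 961, 941]
print(to_scheduler_timestep(44, timesteps))  # 881
print(to_scheduler_timestep(47, timesteps))  # 941
```

Under this reading, a larger gap between the t0 and t1 inputs corresponds to a wider span of scheduler timesteps between the two latents.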
Version Details
- Version ID
- 41f6928e5932de07e2b8c3b8c89feed58c3e3827e8a52567473d477fb36d2f25
- Version Created
- April 11, 2023