adymaharana/story-dalle 🔢❓📝✓ → 🖼️

▶️ 667 runs 📅 Nov 2022 ⚙️ Cog 0.4.4 🔗 GitHub 📄 Paper ⚖️ License
cartoon story-visualization text-to-image

About

A model trained for story visualization: generating images to pair with the captions in a story.

Example Output

[Image: example output]

Performance Metrics

17.65s Prediction Time
17.69s Total Time
All Input Parameters
{
  "top_k": 32,
  "top_p": 0.2,
  "source": "Pororo",
  "caption_1": "Pororo is in a party.",
  "caption_2": "Pororo is singing a song on the stage in the party",
  "caption_3": "Poby is cheering in the audience",
  "caption_4": "Crong is dancing in the party",
  "n_candidates": 4
}
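A minimal sketch of how an input like the one above could be assembled and checked before it is sent to the model. `build_story_input` is a hypothetical helper written for illustration, not part of this model or of any library. The commented-out call assumes the `replicate` Python client and the version ID listed on this page.

```python
import json


def build_story_input(captions, source="Pororo", top_k=32, top_p=0.2,
                      n_candidates=4, supercondition=False):
    """Assemble an input dict using the parameter names shown on this page.

    Hypothetical helper: the checks mirror the documented fields, not the
    model's own validation logic.
    """
    if len(captions) != 4:
        raise ValueError("this page exposes exactly four caption fields")
    if not 1 <= n_candidates <= 4:
        raise ValueError("n_candidates must be in the documented range 1 - 4")

    payload = {"top_k": top_k, "top_p": top_p, "source": source}
    # caption_1 .. caption_4, in story order
    for index, caption in enumerate(captions, start=1):
        payload[f"caption_{index}"] = caption
    payload["n_candidates"] = n_candidates
    payload["supercondition"] = supercondition
    return payload


if __name__ == "__main__":
    payload = build_story_input([
        "Pororo is in a party.",
        "Pororo is singing a song on the stage in the party",
        "Poby is cheering in the audience",
        "Crong is dancing in the party",
    ])
    print(json.dumps(payload, indent=2))

    # Assumed usage with the `replicate` client (needs network access and an API token):
    # import replicate
    # output = replicate.run(
    #     "adymaharana/story-dalle:f74d40125d71ddd5885020201e638b6b270347bb606a6afbc61edc7b077bfb7b",
    #     input=payload,
    # )
```

Keeping the captions in a list and expanding them into `caption_1` through `caption_4` keeps the story order explicit and makes a missing scene fail early.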
Input Parameters
top_k (Type: integer, Default: 32)
The number of highest-probability vocabulary tokens to keep for top-k filtering.
top_p (Type: number, Default: 0.2)
Only the most probable tokens whose probabilities add up to `top_p` or higher are kept for generation.
source (Default: Pororo)
The main character of your story.
caption_1 (Type: string, Default: Pororo is in a party.)
First scene in your story.
caption_2 (Type: string, Default: Pororo is singing a song on the stage in the party)
Second scene in your story.
caption_3 (Type: string, Default: Poby is cheering in the audience)
Third scene in your story.
caption_4 (Type: string, Default: Crong is dancing in the party)
Final scene in your story.
n_candidates (Type: integer, Default: 4, Range: 1 - 4)
Number of candidates to generate for each story panel.
supercondition (Type: boolean, Default: false)
Set `supercondition` to True to enable generation using a null hypothesis.
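The `top_k` and `top_p` descriptions above can be illustrated with a small, library-free sketch. This is a generic version of top-k followed by top-p (nucleus) filtering over a list of probabilities, not the model's actual sampling code. The function name and the choice to renormalize after the top-k step are assumptions.

```python
def top_k_top_p_filter(probs, top_k=32, top_p=0.2):
    """Return {token_index: renormalized_probability} after top-k, then top-p filtering.

    Illustrative only: real implementations usually work on logits and tensors.
    """
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k highest-probability tokens (0 disables the step).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Renormalize over the top-k survivors before applying top-p (an assumption).
    mass = sum(probs[i] for i in ranked)

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


if __name__ == "__main__":
    toy_distribution = [0.5, 0.3, 0.1, 0.1]
    print(top_k_top_p_filter(toy_distribution, top_k=3, top_p=0.2))
    print(top_k_top_p_filter(toy_distribution, top_k=3, top_p=0.9))
```

Under this reading, a low `top_p` such as the default 0.2 keeps only the single most likely token whenever that token alone carries at least 20% of the probability mass, so sampling becomes close to greedy; raising `top_p` admits more candidates.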
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
['Pororo is in a party.', 'Pororo is singing a song on the stage in the party.', 'Poby is cheering in the audience.', 'Crong is dancing in the party.'] [1, 1, 1, 1] Pororo 4
Pororo is in a party.
Pororo is singing a song on the stage in the party.
Poby is cheering in the audience.
Crong is dancing in the party.
['Pororo is in a party.', 'Pororo is singing a song on the stage in a party.', 'Poby is cheering in the audience.', 'Crong is dancing in a party.']
torch.Size([16, 64]) torch.Size([16, 256]) torch.Size([16, 1, 1536]) torch.Size([16, 1])
  0%|          | 0/256 [00:00<?, ?it/s]
100%|██████████| 256/256 [00:15<00:00, 16.13it/s]
torch.Size([16, 16, 16])
torch.Size([16, 3, 256, 256])
tensor([[[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
...,
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.]],
[[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
...,
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.]],
[[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
...,
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0.,  ..., 0., 0., 0.]]])
torch.Size([3, 1124, 1064])
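The tensor shapes in the logs above are consistent with a simple reading, sketched below as arithmetic: a batch of 16 from 4 captions times 4 candidates, 256 image tokens per panel (one per progress-bar step) arranged as a 16 x 16 grid, and 256 x 256 RGB images. This interpretation is an assumption drawn from the logged numbers, not something the page states, and `expected_shapes` is a hypothetical helper.

```python
import math


def expected_shapes(n_captions=4, n_candidates=4, n_image_tokens=256, image_side=256):
    """Derive the shapes suggested by the logs from the request parameters.

    Assumption: one image per (caption, candidate) pair and a square token grid.
    """
    batch = n_captions * n_candidates
    grid_side = math.isqrt(n_image_tokens)
    if grid_side * grid_side != n_image_tokens:
        raise ValueError("n_image_tokens must be a perfect square")
    return {
        "token_grid": (batch, grid_side, grid_side),        # logged as [16, 16, 16]
        "images": (batch, 3, image_side, image_side),       # logged as [16, 3, 256, 256]
        "pixels_per_token_side": image_side // grid_side,   # pixels covered by one token, per side
    }


if __name__ == "__main__":
    for name, value in expected_shapes().items():
        print(name, value)
```

If this reading holds, lowering `n_candidates` would shrink the batch dimension proportionally while leaving the per-image token count unchanged.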
Version Details
Version ID
f74d40125d71ddd5885020201e638b6b270347bb606a6afbc61edc7b077bfb7b
Version Created
November 23, 2022