pharmapsychotic/clip-interrogator ❓🖼️ → 📝

▶️ 4.5M runs 📅 Oct 2022 ⚙️ Cog 0.8.6 🔗 GitHub ⚖️ License
image-to-text prompt-generation

About

The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion to create cool art!

Example Output

Output

a watercolor painting of a sea turtle, a digital painting, by Kubisi art, featured on dribbble, medibang, warm saturated palette, red and green tones, turquoise horizon, digital art h 9 6 0, detailed scenery —width 672, illustration:.4, spray art, artstatiom

Performance Metrics

37.67s Prediction Time
215.89s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/HrXsgowfhbZi3dImGZoIcvnz7oZfMtFY4UAEU8vBIakTd8JQ/watercolour-4799014_960_720.jpg",
  "clip_model_name": "ViT-L/14"
}
Input Parameters
mode Default: best
Prompt mode (best takes 10-20 seconds, fast takes 1-2 seconds).
image (required) Type: string
Input image
clip_model_name Default: ViT-L-14/openai
Choose ViT-L for Stable Diffusion 1, ViT-H for Stable Diffusion 2, or ViT-bigG for Stable Diffusion XL.
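
The parameters above can be assembled into a request from Python. The following is a minimal sketch, not an official client recipe: it assumes the third-party `replicate` package is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `interrogate` are hypothetical, and the version hash is the one listed under Version Details on this page.

```python
# Model reference: owner/name plus the version ID shown on this page.
MODEL_VERSION = (
    "pharmapsychotic/clip-interrogator:"
    "8151e1c9f47e696fa316146a2e35812ccf79cfc9eba05b11c7f450155102af70"
)


def build_input(image_url, mode="best", clip_model_name="ViT-L-14/openai"):
    """Assemble the input payload (hypothetical helper).

    Parameter names and defaults mirror the Input Parameters list;
    `image` is the only required field.
    """
    if not image_url:
        raise ValueError("image is required")
    return {
        "image": image_url,
        "mode": mode,
        "clip_model_name": clip_model_name,
    }


def interrogate(image_url, **options):
    """Run the model and return the generated prompt (a string)."""
    # Imported lazily so build_input() works without the package installed.
    import replicate  # assumption: installed via `pip install replicate`

    return replicate.run(MODEL_VERSION, input=build_input(image_url, **options))
```

Calling `interrogate("https://example.com/picture.jpg", mode="fast")` with a placeholder URL would send the image with the faster prompt mode. The returned string can then be pasted into a text-to-image model such as Stable Diffusion.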
Output Schema

Output

Type: string

Example Execution Logs
  0%|          | 0/50 [00:00<?, ?it/s]
 24%|██▍       | 12/50 [00:00<00:00, 119.71it/s]
 48%|████▊     | 24/50 [00:00<00:00, 117.53it/s]
 72%|███████▏  | 36/50 [00:00<00:00, 117.99it/s]
 96%|█████████▌| 48/50 [00:00<00:00, 118.17it/s]
100%|██████████| 50/50 [00:00<00:00, 119.05it/s]
  0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [00:00<00:00, 93.98it/s]
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 33.81it/s]
Flavor chain:   0%|          | 0/25 [00:00<?, ?it/s]
Flavor chain:   4%|▍         | 1/25 [00:03<01:12,  3.01s/it]
Flavor chain:   8%|▊         | 2/25 [00:05<01:08,  2.97s/it]
Flavor chain:  12%|█▏        | 3/25 [00:09<01:06,  3.02s/it]
Flavor chain:  16%|█▌        | 4/25 [00:12<01:03,  3.05s/it]
Flavor chain:  20%|██        | 5/25 [00:15<01:00,  3.04s/it]
Flavor chain:  24%|██▍       | 6/25 [00:18<00:57,  3.05s/it]
Flavor chain:  28%|██▊       | 7/25 [00:21<00:55,  3.09s/it]
Flavor chain:  32%|███▏      | 8/25 [00:24<00:55,  3.25s/it]
Flavor chain:  36%|███▌      | 9/25 [00:28<00:53,  3.33s/it]
Flavor chain:  40%|████      | 10/25 [00:32<00:51,  3.45s/it]
Flavor chain:  40%|████      | 10/25 [00:33<00:49,  3.31s/it]
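
To see where the prediction time goes, the progress lines in a log like the one above can be parsed. This is a rough sketch that assumes the default tqdm line layout shown here (`label: pct%|bar| done/total [elapsed<remaining, rate]`). The `parse_progress` helper and its regular expression are illustrative and not part of the model or of tqdm.

```python
import re

# Matches the start of a tqdm line up to the elapsed-time field, e.g.
# "Flavor chain:  40%|...| 10/25 [00:33<00:49,  3.31s/it]".
PROGRESS = re.compile(
    r"^(?:(?P<label>[^:|]+):)?\s*(?P<pct>\d+)%\|.*\|\s*"
    r"(?P<done>\d+)/(?P<total>\d+)\s*\[(?P<mm>\d+):(?P<ss>\d+)<"
)


def parse_progress(line):
    """Return label, step counts and elapsed seconds, or None if no match."""
    match = PROGRESS.match(line.strip())
    if match is None:
        return None
    return {
        "label": (match.group("label") or "").strip(),
        "done": int(match.group("done")),
        "total": int(match.group("total")),
        "elapsed_s": int(match.group("mm")) * 60 + int(match.group("ss")),
    }


last = parse_progress(
    "Flavor chain:  40%|████      | 10/25 [00:33<00:49,  3.31s/it]"
)
print(last)
```

Applied to the final line of the example log, this reports 10 of 25 flavor-chain steps after about 33 seconds. That suggests the flavor chain accounts for most of the 37.67 s prediction time in this run.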
Version Details
Version ID
8151e1c9f47e696fa316146a2e35812ccf79cfc9eba05b11c7f450155102af70
Version Created
September 10, 2023