j-min/clip-caption-reward 🖼️❓ → 📝

▶️ 296.1K runs 📅 May 2022 ⚙️ Cog 0.3.4

Tags: clip-reward, fine-grained-captioning, image-analysis, image-captioning

Performance

Typical run time: 5.8s

Total runs: 296.1K

About

Fine-grained Image Captioning with CLIP Reward

Example Output

a group of people riding their bikes on the busy street with a blue sign

Performance Metrics

Prediction Time: 5.85s

Total Time: 6.00s

All Input Parameters

{
  "image": "https://replicate.delivery/mgxm/d452ef45-ce7e-4f8d-a63f-350d624fc95a/COCO_val2014_000000462565.jpeg",
  "reward": "clips_grammar"
}

Input Parameters

image (required). Type: string. The input image.
reward (optional). Default: clips_grammar. The reward criterion to use.
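
The parameters above can be assembled into a request roughly as follows. This is a minimal sketch, not an official client: `build_input` and `caption` are hypothetical helper names, and it assumes the third-party `replicate` Python package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The model name and version ID are the ones listed on this page.

```python
MODEL = "j-min/clip-caption-reward"
VERSION = "de37751f75135f7ebbe62548e27d6740d5155dfefdf6447db35c9865253d7e06"


def build_input(image, reward="clips_grammar"):
    """Assemble the input payload (hypothetical helper).

    `image` is required and must be a non-empty string (for example a URL);
    `reward` falls back to the documented default, clips_grammar.
    """
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    return {"image": image, "reward": reward}


def caption(image, reward="clips_grammar"):
    """Run the model and return its output (documented type: string).

    Assumption: the `replicate` package is installed and authenticated.
    The import is deferred so build_input works without it.
    """
    import replicate  # third-party: pip install replicate

    return replicate.run(f"{MODEL}:{VERSION}", input=build_input(image, reward))
```

Calling `caption(...)` with the image URL from the example input should return a caption string such as the one under Example Output, although the exact wording may differ between runs.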

Output Schema

Output

Type: string

Example Execution Logs

Loading cfg from configs/phase2/clipRN50_clips_grammar.yml
Warning: key input_clipscore_vis_dir not in args
Warning: key use_multi_rewards not in args
Warning: key use_grammar not in args
Warning: key use_grammar_baseline not in args
Warning: key clip_load_path not in args
Warning: key N_enc not in args
Warning: key N_dec not in args
Warning: key d_model not in args
Warning: key d_ff not in args
Warning: key num_att_heads not in args
Warning: key dropout not in args
Warning: key REFORWARD not in args
Warning: key precision not in args
vocab size: 9487
transformer
Loading checkpoint from save/clipRN50_clips_grammar/clipRN50_clips_grammar-last.ckpt
<All keys matched successfully>
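
When checking which configuration and checkpoint a run actually used, a log like the one above can be reduced to a few fields. The sketch below is illustrative only: `summarize_log` is a hypothetical helper, and the line prefixes it matches are taken from this example log rather than from any documented log format.

```python
import re

CFG_PREFIX = "Loading cfg from "
CKPT_PREFIX = "Loading checkpoint from "
VOCAB_PREFIX = "vocab size: "
WARNING_RE = re.compile(r"Warning: key (\S+) not in args")


def summarize_log(log_text):
    """Extract config path, checkpoint path, vocab size and warned-about keys.

    Assumption: lines follow the shapes seen in the example execution log.
    Lines that match nothing are ignored.
    """
    summary = {"config": None, "checkpoint": None,
               "vocab_size": None, "missing_keys": []}
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith(CFG_PREFIX):
            summary["config"] = line[len(CFG_PREFIX):]
        elif line.startswith(CKPT_PREFIX):
            summary["checkpoint"] = line[len(CKPT_PREFIX):]
        elif line.startswith(VOCAB_PREFIX):
            summary["vocab_size"] = int(line[len(VOCAB_PREFIX):])
        else:
            match = WARNING_RE.fullmatch(line)
            if match:
                summary["missing_keys"].append(match.group(1))
    return summary
```

Applied to the example log, this would report the `configs/phase2/clipRN50_clips_grammar.yml` config, the `clipRN50_clips_grammar-last.ckpt` checkpoint, and the list of keys named in the warnings.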

Version Details

Version ID: de37751f75135f7ebbe62548e27d6740d5155dfefdf6447db35c9865253d7e06
Version Created: May 31, 2022
