lucataco/paligemma-3b-pt-224 🖼️📝 → 📝

▶️ 4.1K runs 📅 May 2024 ⚙️ Cog 0.9.6

Performance

0.6s typical run time

4.1K total runs

PaliGemma 3B is an open vision-language model (VLM) from Google, pre-trained on 224×224 input images and 128-token input/output text sequences.

Prompt:

"caption es"

Output:

persona estacionada en una calle (English: "person parked on a street")

0.58s Prediction Time

0.63s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/Kv6Dn1Mk1tZe7vfVaRuPNBJcoDBYhRGQ33OTkq70l375ULSi/car.jpg",
  "prompt": "caption es"
}
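The input above can be sent to the hosted model programmatically. The following is a minimal sketch, assuming the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input`, `model_ref`, and `run_caption` are hypothetical and exist only for illustration; the model name, version ID, image URL, and prompt are taken from this page.

```python
import os

# Model name and version ID as listed on this page.
MODEL = "lucataco/paligemma-3b-pt-224"
VERSION = "c519755cce71af83c3831c3b3b7fe6c1de4a4dc27eff91f9e79639e14924a078"


def build_input(image_url: str, prompt: str) -> dict:
    """Return a payload with the same two keys as the JSON shown above."""
    return {"image": image_url, "prompt": prompt}


def model_ref() -> str:
    """Return the 'owner/name:version' reference string for the model."""
    return f"{MODEL}:{VERSION}"


def run_caption(image_url: str, prompt: str = "caption es"):
    """Run one prediction. Requires network access and an API token."""
    # Imported lazily so the helpers above work without the package installed.
    import replicate

    return replicate.run(model_ref(), input=build_input(image_url, prompt))


if __name__ == "__main__":
    example_image = (
        "https://replicate.delivery/pbxt/"
        "Kv6Dn1Mk1tZe7vfVaRuPNBJcoDBYhRGQ33OTkq70l375ULSi/car.jpg"
    )
    if os.environ.get("REPLICATE_API_TOKEN"):
        print(run_caption(example_image))
    else:
        print("Set REPLICATE_API_TOKEN to run the prediction.")
```

Pinning the version ID keeps results reproducible if the model owner later publishes a new version. According to the output schema on this page, the result is a string.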

Output Schema

Output

Type: string

Version Details

Version ID: c519755cce71af83c3831c3b3b7fe6c1de4a4dc27eff91f9e79639e14924a078
Version Created: May 14, 2024