lucataco/paligemma-3b-pt-224 🖼️📝 → 📝

▶️ 1.5K runs 📅 May 2024 ⚙️ Cog 0.9.6 🔗 GitHub 📄 Paper ⚖️ License
image-to-text ocr visual-question-answering

About

PaliGemma 3B is an open vision-language model (VLM) from Google, pre-trained on 224×224 input images with 128-token input/output text sequences.

Example Output

Prompt:

"caption es"

Output:

persona estacionada en una calle

Performance Metrics

Prediction Time: 0.58s
Total Time: 0.63s

All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/Kv6Dn1Mk1tZe7vfVaRuPNBJcoDBYhRGQ33OTkq70l375ULSi/car.jpg",
  "prompt": "caption es"
}
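The same input can be sent from code. The following is a minimal sketch, assuming the official `replicate` Python client is installed (`pip install replicate`) and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input`, `model_ref`, and `run_prediction` are hypothetical and exist only for illustration; the model name, version ID, and field names are taken from this page.

```python
MODEL = "lucataco/paligemma-3b-pt-224"
# Version ID as listed under Version Details on this page.
VERSION = "c519755cce71af83c3831c3b3b7fe6c1de4a4dc27eff91f9e79639e14924a078"


def build_input(image_url: str, prompt: str = "caption es") -> dict:
    """Build the payload with the two documented fields (hypothetical helper)."""
    if not image_url:
        raise ValueError("'image' is required")
    return {"image": image_url, "prompt": prompt}


def model_ref() -> str:
    """Return the 'owner/name:version' reference string used by replicate.run."""
    return f"{MODEL}:{VERSION}"


def run_prediction(image_url: str, prompt: str = "caption es"):
    """Send one prediction request; needs network access and an API token."""
    import replicate  # imported lazily so the helpers above work without it

    # Per the Output Schema on this page, the result is expected to be a string.
    return replicate.run(model_ref(), input=build_input(image_url, prompt))
```

Calling `run_prediction` with the `car.jpg` URL shown above and the default prompt should reproduce the example request; the exact caption returned may differ from the one on this page.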
Input Parameters
image (required)
Type: string
Grayscale input image

prompt
Type: string
Default: caption es
Input prompt
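The default prompt `caption es` appears to follow PaliGemma's task-prefix convention (a task keyword followed by a language code). The sketch below builds such prompts. Only `caption es` is confirmed by this page; the `ocr` and `answer {lang} {question}` forms are assumptions based on the general PaliGemma prompt convention and on this page's `ocr` and `visual-question-answering` tags, and a pre-trained (not fine-tuned) checkpoint may not respond well to all of them. All helper names are hypothetical.

```python
# Assumed OCR task prefix (not shown on this page).
OCR_PROMPT = "ocr"


def caption_prompt(lang: str = "es") -> str:
    """Build a captioning prompt such as 'caption es'."""
    lang = lang.strip().lower()
    if not lang:
        raise ValueError("language code must not be empty")
    return f"caption {lang}"


def vqa_prompt(question: str, lang: str = "en") -> str:
    """Build a question-answering prompt (assumed form: 'answer <lang> <question>')."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return f"answer {lang.strip().lower()} {question}"
```

For example, `caption_prompt()` yields the page's default prompt, and `caption_prompt("en")` would request an English caption instead.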
Output Schema

Output

Type: string

Version Details
Version ID
c519755cce71af83c3831c3b3b7fe6c1de4a4dc27eff91f9e79639e14924a078
Version Created
May 14, 2024