bytonylee/imp 🖼️🔢📝 → 📝

▶️ 79 runs 📅 Jan 2024 ⚙️ Cog 0.9.4

image-analysis image-to-text ocr visual-question-answering

Performance

Typical run time: 3.0s

Cold start (first call): ~128s

Total runs: 79

About

A family of multimodal small language models.

Example Output

Prompt:

"What is the title of this book?"

Output

The Little Book of Deep Learning

Performance Metrics

2.96s Prediction Time

128.32s Total Time
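
The gap between the two figures above can be worked out directly. The short sketch below assumes that everything outside the prediction itself (queueing, container boot, model loading) accounts for the difference; the page does not break this down, so the variable names and the interpretation are illustrative only.

```python
# Figures copied from the Performance Metrics above.
prediction_time_s = 2.96
total_time_s = 128.32

# Assumption: time not spent predicting is startup/queue overhead.
overhead_s = round(total_time_s - prediction_time_s, 2)
overhead_share = overhead_s / total_time_s

print(f"Overhead outside prediction: {overhead_s}s")  # 125.36s
print(f"Share of total time: {overhead_share:.1%}")
```

Under that assumption, nearly all of the first call is spent before the model starts answering, which is consistent with the ~128s cold start listed under Performance.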

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KJMIMNBJwHYQX1A4tfSmacSSxccDH3sVQtgzpwMuv88CbuJz/demo-1.jpg",
  "top_p": 0.95,
  "prompt": "What is the title of this book?",
  "temperature": 0.7,
  "max_new_tokens": 100
}
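
A minimal sketch of sending the same input through the Replicate Python client, assuming the `replicate` package is installed and a `REPLICATE_API_TOKEN` is set in the environment. The model reference is assembled from the owner/name and Version ID shown on this page; `run_imp` and `EXAMPLE_INPUT` are hypothetical names used only for illustration.

```python
# Model reference built from this page: "<owner>/<name>:<version id>".
MODEL_REF = (
    "bytonylee/imp:"
    "61cf72710422d9a6b8debff1a1a8fd7dc683fe31d25a837fe7df5e6b0a5f4a54"
)

# The same input shown under "All Input Parameters".
EXAMPLE_INPUT = {
    "image": "https://replicate.delivery/pbxt/KJMIMNBJwHYQX1A4tfSmacSSxccDH3sVQtgzpwMuv88CbuJz/demo-1.jpg",
    "top_p": 0.95,
    "prompt": "What is the title of this book?",
    "temperature": 0.7,
    "max_new_tokens": 100,
}


def run_imp(model_input):
    """Send one prediction request and return the model's output."""
    # Imported lazily so this module loads without the third-party client.
    import replicate

    # replicate.run blocks until the prediction finishes.
    return replicate.run(MODEL_REF, input=model_input)


# Usage (needs network access and an API token):
#   print(run_imp(EXAMPLE_INPUT))
```

Expect the first call to be slow if the model is cold; later calls should be closer to the typical run time.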

Input Parameters

image (required). Type: string. Input image.
top_p. Type: number. Default: 0.95. Top-p value for sampling.
prompt (required). Type: string. Input prompt.
temperature. Type: number. Default: 0.7. Temperature for sampling.
max_new_tokens. Type: integer. Default: 100. Maximum number of tokens to generate.
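
The parameter list above can be mirrored in a small typed container that fills in the documented defaults and checks types before a request is built. This is a sketch, not part of the model's own code: `ImpInput` and `to_payload` are hypothetical names, and only the types and defaults come from this page (no value ranges are documented, so none are enforced).

```python
from dataclasses import asdict, dataclass


@dataclass
class ImpInput:
    """Hypothetical container mirroring the documented input parameters."""

    image: str                 # required: input image (e.g. a URL)
    prompt: str                # required: input prompt
    top_p: float = 0.95        # default from the parameter list
    temperature: float = 0.7   # default from the parameter list
    max_new_tokens: int = 100  # default from the parameter list

    def __post_init__(self):
        # Required string fields must be non-empty strings.
        for name in ("image", "prompt"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} is required and must be a string")
        # "number" fields accept ints or floats, but not booleans.
        for name in ("top_p", "temperature"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number")
        # "integer" field: bool is a subclass of int, so exclude it explicitly.
        if isinstance(self.max_new_tokens, bool) or not isinstance(self.max_new_tokens, int):
            raise TypeError("max_new_tokens must be an integer")

    def to_payload(self):
        """Return a plain dict suitable for use as the request input."""
        return asdict(self)
```

For example, `ImpInput(image="https://example.com/book.jpg", prompt="What is the title of this book?").to_payload()` yields a dict with the three defaults filled in; the example URL is a placeholder.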

Output Schema

Output

Type: string

Version Details

Version ID: 61cf72710422d9a6b8debff1a1a8fd7dc683fe31d25a837fe7df5e6b0a5f4a54
Version Created: January 29, 2024
