jyoung105/imp 🖼️🔢📝 → 📝

▶️ 71 runs 📅 Jan 2024 ⚙️ Cog 0.9.4
image-analysis image-to-text ocr visual-question-answering

About

A family of multimodal small language models.

Example Output

Prompt:

"What is the title of this book?"

Output:

The Little Book of Deep Learning

Performance Metrics

Prediction Time: 2.96s
Total Time: 128.32s

All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KJMIMNBJwHYQX1A4tfSmacSSxccDH3sVQtgzpwMuv88CbuJz/demo-1.jpg",
  "top_p": 0.95,
  "prompt": "What is the title of this book?",
  "temperature": 0.7,
  "max_new_tokens": 100
}
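
A request like the one above could be sent with the Replicate Python client. The following is a minimal sketch, not the author's own code: `build_input` and `run_imp` are hypothetical helper names, the sketch assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set, and the model reference is assembled from the model name and version ID shown on this page.

```python
# Model reference in "owner/name:version" form, assembled from this page.
MODEL_REF = (
    "jyoung105/imp:"
    "61cf72710422d9a6b8debff1a1a8fd7dc683fe31d25a837fe7df5e6b0a5f4a54"
)


def build_input(image, prompt, top_p=0.95, temperature=0.7, max_new_tokens=100):
    """Assemble the input dict, using the defaults listed on this page."""
    if not image or not prompt:
        raise ValueError("image and prompt are both required")
    return {
        "image": image,
        "prompt": prompt,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }


def run_imp(image, prompt, **sampling):
    """Hypothetical wrapper: run one prediction and return the text output."""
    import replicate  # imported lazily so build_input works without the client

    output = replicate.run(MODEL_REF, input=build_input(image, prompt, **sampling))
    # The page lists the output type as string. Joining is an assumption made
    # here in case the client returns the text as a sequence of chunks.
    return output if isinstance(output, str) else "".join(output)
```

Calling `run_imp("https://.../demo-1.jpg", "What is the title of this book?")` with a reachable image URL would then correspond to the example prediction shown earlier. Because sampling is enabled by default, the returned text may vary between runs.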
Input Parameters

image (required)
Type: string
Input image

top_p
Type: number, Default: 0.95
Top p for sampling

prompt (required)
Type: string
Input prompt

temperature
Type: number, Default: 0.7
Temperature for sampling

max_new_tokens
Type: integer, Default: 100
Maximum number of tokens to generate
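
To illustrate what `temperature` and `top_p` usually control, here is a minimal sketch of temperature scaling followed by top-p (nucleus) sampling over a list of logits. It shows the general technique only. The page does not say how this model implements sampling, and `sample_token` is a hypothetical name used for illustration.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_p=0.95, rng=random):
    """Pick one token index from raw logits using temperature + top-p sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature rescales the logits: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample inside the nucleus; random.choices normalizes the weights itself.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Lower values of either parameter make the output more predictable: a small `top_p` restricts sampling to the few most likely tokens, and a low `temperature` concentrates probability on the top token.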
Output Schema

Output

Type: string

Version Details
Version ID: 61cf72710422d9a6b8debff1a1a8fd7dc683fe31d25a837fe7df5e6b0a5f4a54
Version Created: January 29, 2024