jyoung105/honeybee 🖼️🔢📝✓ → 📝

▶️ 29 runs 📅 Jan 2024 ⚙️ Cog 0.9.4
image-to-text visual-question-answering

About

Locality-enhanced Projector for Multimodal LLM

Example Output

Prompt:

"What is the title of this book?"

Output:

The Little Book of Deep Learning

Performance Metrics

2.75s Prediction Time
542.73s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KJcspdKRzoJNPWO6PsQcOTTNFjc2RmgCyPJdWen5pC12L7OM/demo-1.jpg",
  "top_k": 5,
  "prompt": "What is the title of this book?",
  "do_sample": true,
  "max_length": 200,
  "agree_to_research_only": true
}
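
The request above could be reproduced from Python roughly as follows. This is a minimal sketch, not the model author's own client code: it assumes the official `replicate` Python package is installed and that a `REPLICATE_API_TOKEN` environment variable is set. The names `MODEL`, `VERSION`, `EXAMPLE_INPUT`, `model_ref`, and `run_example` are illustrative; the version string is the one listed under Version Details.

```python
# Sketch: reproduce the example prediction with the `replicate` client.
# Assumptions: `pip install replicate` has been run and REPLICATE_API_TOKEN
# is set in the environment. Helper names here are illustrative only.

MODEL = "jyoung105/honeybee"
VERSION = "813e9d681d9936b2a184c2e3aefbb138688aa3100a708b46a05ad1be8b6fad0e"

# The same values shown under "All Input Parameters".
EXAMPLE_INPUT = {
    "image": "https://replicate.delivery/pbxt/KJcspdKRzoJNPWO6PsQcOTTNFjc2RmgCyPJdWen5pC12L7OM/demo-1.jpg",
    "top_k": 5,
    "prompt": "What is the title of this book?",
    "do_sample": True,
    "max_length": 200,
    "agree_to_research_only": True,
}


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the 'owner/name:version' reference that pins a model version."""
    return f"{model}:{version}"


def run_example():
    """Send the example request; performs a network call when invoked."""
    import replicate  # imported lazily so the module loads without the package

    return replicate.run(model_ref(), input=EXAMPLE_INPUT)
```

Calling `run_example()` should return the generated text, since the output schema is a string. Because `do_sample` is true, the wording may differ from the example output between runs.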
Input Parameters
image (required) Type: string
Input image, passed as a string such as a URL
top_k Type: integer Default: 5
Number of highest-probability tokens to sample from (top-k sampling)
prompt (required) Type: string
Input prompt
do_sample Type: boolean Default: true
Whether to use sampling during generation
max_length Type: integer Default: 512
Maximum number of tokens to generate
agree_to_research_only Type: boolean Default: true
You must agree to use this model only for research. It is not for commercial use.
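
The parameter list above can be turned into a small client-side check before a request is sent. The sketch below fills in the documented defaults and verifies required fields and types. `build_input`, `DEFAULTS`, `REQUIRED`, and `TYPES` are hypothetical helper names, and rejecting `agree_to_research_only=False` locally is an assumption about desired behavior, not something the model is known to enforce.

```python
# Sketch: validate an input payload against the documented parameters.
# All names below are illustrative helpers, not part of any library.

DEFAULTS = {
    "top_k": 5,
    "do_sample": True,
    "max_length": 512,
    "agree_to_research_only": True,
}
REQUIRED = ("image", "prompt")
TYPES = {
    "image": str,
    "prompt": str,
    "top_k": int,
    "do_sample": bool,
    "max_length": int,
    "agree_to_research_only": bool,
}


def build_input(**values):
    """Merge caller values over the documented defaults and check them."""
    unknown = set(values) - set(TYPES)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    missing = [name for name in REQUIRED if not values.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    merged = {**DEFAULTS, **values}
    for name, value in merged.items():
        expected = TYPES[name]
        # bool is a subclass of int in Python, so reject True/False
        # explicitly where an integer is expected.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")

    # Assumption: refuse locally rather than send a request that does
    # not accept the research-only terms.
    if not merged["agree_to_research_only"]:
        raise ValueError("agree_to_research_only must be true")
    return merged
```

For example, `build_input(image="https://example.com/book.jpg", prompt="What is the title of this book?", max_length=200)` yields a payload with `top_k` and `do_sample` filled in from the defaults; the URL here is a placeholder.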
Output Schema

Output

Type: string

Version Details
Version ID
813e9d681d9936b2a184c2e3aefbb138688aa3100a708b46a05ad1be8b6fad0e
Version Created
January 30, 2024