by tonylee/honeybee 🖼️🔢📝✓ → 📝

▶️ 30 runs 📅 Jan 2024 ⚙️ Cog 0.9.4 🔗 GitHub 📄 Paper ⚖️ License

image-analysis image-to-text ocr text-generation visual-question-answering visual-understanding vqa

Performance

2.7s Typical run time

~543s Cold start (first call)

30 Total runs

About

Locality-enhanced Projector for Multimodal LLM

Example Output

Prompt:

"What is the title of this book?"

Output

The Little Book of Deep Learning

Performance Metrics

2.75s Prediction Time

542.73s Total Time
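
The two timings above suggest that almost all of the first call was spent outside the prediction itself. A minimal sketch of that arithmetic, assuming Total Time covers queueing and model setup on top of Prediction Time (the page does not break the figure down, so this split is an assumption):

```python
# Timings copied from the example run above.
prediction_time_s = 2.75
total_time_s = 542.73

# Assumption: everything outside the prediction is startup/queue overhead.
overhead_s = total_time_s - prediction_time_s
overhead_share = overhead_s / total_time_s

print(f"overhead: {overhead_s:.2f}s ({overhead_share:.1%} of total)")
```

If that reading is right, a caller should budget for the cold start on a first request and expect later calls to sit much closer to the typical run time.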

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KJcspdKRzoJNPWO6PsQcOTTNFjc2RmgCyPJdWen5pC12L7OM/demo-1.jpg",
  "top_k": 5,
  "prompt": "What is the title of this book?",
  "do_sample": true,
  "max_length": 200,
  "agree_to_research_only": true
}

Input Parameters

image (required) Type: string: Input image
top_k Type: integer, Default: 5: Top-k value for sampling
prompt (required) Type: string: Input prompt
do_sample Type: boolean, Default: true: Whether to use sampling
max_length Type: integer, Default: 512: Maximum number of tokens to generate
agree_to_research_only Type: boolean, Default: true: You must agree to use this model only for research. It is not for commercial use.
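
One way to assemble a request from the parameters above is sketched below. The helper names `build_input` and `run_honeybee` are hypothetical, and the model reference is left to the caller because the exact owner/name string is not confirmed here. The sketch assumes the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set; only `replicate.run` is used.

```python
from typing import Any

# Defaults copied from the parameter list above.
DEFAULTS: dict[str, Any] = {
    "top_k": 5,
    "do_sample": True,
    "max_length": 512,
    "agree_to_research_only": True,
}
REQUIRED = ("image", "prompt")


def build_input(**overrides: Any) -> dict[str, Any]:
    """Merge caller values over the listed defaults and check required fields."""
    unknown = set(overrides) - set(DEFAULTS) - set(REQUIRED)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    missing = [name for name in REQUIRED if not overrides.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return {**DEFAULTS, **overrides}


def run_honeybee(model_ref: str, **overrides: Any) -> Any:
    """Hypothetical wrapper: model_ref has the form '<owner>/<name>:<version id>'."""
    import replicate  # third-party client, imported lazily so build_input works offline

    return replicate.run(model_ref, input=build_input(**overrides))


# Payload matching the example run above (max_length overridden to 200).
payload = build_input(
    image="https://replicate.delivery/pbxt/KJcspdKRzoJNPWO6PsQcOTTNFjc2RmgCyPJdWen5pC12L7OM/demo-1.jpg",
    prompt="What is the title of this book?",
    max_length=200,
)
```

Validating locally before the call avoids paying for a cold start only to have the request rejected for a missing `image` or `prompt`.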

Output Schema

Output

Type: string

Version Details

Version ID: 813e9d681d9936b2a184c2e3aefbb138688aa3100a708b46a05ad1be8b6fad0e
Version Created: January 30, 2024
