deepseek-ai/janus-pro-1b 🔢🖼️📝 → 📝

▶️ 8.7K runs 📅 Feb 2025 ⚙️ Cog 0.13.7

image-to-text multimodal ocr visual-question-answering visual-understanding vqa

Performance

2.5s Typical run time

~91s Cold start (first call)

8.7K Total runs

About

Janus-Pro is a novel autoregressive framework for multimodal understanding. This model takes an image and a question about it and returns a text answer.

Example Output

Here is the formula in LaTeX code:

\[
A_n = a_0 \left[ 1 + \frac{3}{4} \sum_{k=1}^{n} \left( \frac{4}{9} \right)^k \right]
\]

Performance Metrics

2.55s Prediction Time

91.12s Total Time

All Input Parameters

{
  "seed": 42,
  "image": "https://replicate.delivery/pbxt/MUJhLC1lVS5HVeLXvmbOL1O2ESVNYCGNVoxqJumiRUn0Hl99/equation.png",
  "top_p": 0.95,
  "question": "Convert the formula into latex code.",
  "temperature": 0.1
}

Input Parameters

seed (Type: integer, Default: 42): Random seed for reproducibility
image (required, Type: string): Input image for multimodal understanding
top_p (Type: number, Default: 0.95, Range: 0 - 1): Top-p sampling value
question (required, Type: string): Question about the image
temperature (Type: number, Default: 0.1, Range: 0 - 1): Temperature for text generation
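
The parameter list above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the model itself: the helper name `build_input` is hypothetical, and the only rules it enforces are the ones stated above (two required fields, the documented defaults, and the 0 - 1 range for `top_p` and `temperature`).

```python
from typing import Any, Dict


def build_input(image: str, question: str, seed: int = 42,
                top_p: float = 0.95, temperature: float = 0.1) -> Dict[str, Any]:
    """Return an input payload matching the documented parameters.

    Hypothetical helper: defaults and ranges are copied from the
    parameter list, nothing else is assumed about the model.
    """
    # image and question are the two required fields.
    if not image:
        raise ValueError("image is required")
    if not question:
        raise ValueError("question is required")
    # top_p and temperature are both documented with a 0 - 1 range.
    for name, value in (("top_p", top_p), ("temperature", temperature)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "seed": seed,
        "image": image,
        "top_p": top_p,
        "question": question,
        "temperature": temperature,
    }
```

Calling `build_input(image_url, "Convert the formula into latex code.")` with no other arguments yields the same parameter values as the example request shown earlier.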

Output Schema

Output

Type: string
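
Because the output is a single string, any structured content has to be pulled out of the surrounding prose by the caller. The sketch below assumes the raw answer wraps the formula in LaTeX display-math delimiters (`\[` and `\]`), as in the example output; the helper name `extract_display_math` is hypothetical, and other prompts may produce differently shaped answers.

```python
import re
from typing import Optional

# Matches the body of the first \[ ... \] display-math block.
# Assumption: the model wraps formulas in these delimiters.
_DISPLAY_MATH = re.compile(r"\\\[(.*?)\\\]", re.DOTALL)


def extract_display_math(output: str) -> Optional[str]:
    """Return the LaTeX inside the first display-math block, or None."""
    match = _DISPLAY_MATH.search(output)
    return match.group(1).strip() if match else None
```

The pattern only treats a backslash immediately followed by a bracket as a delimiter, so `\left[` and `\right]` inside the formula do not end the match early.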

Version Details

Version ID: eb4c5dffb46fb23d03a3b74e87ea36bfa830a0c3d5875ef2bfea310646ac8fd2
Version Created: February 12, 2025
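
To reproduce results against this exact version, the version ID can be pinned in the model reference. The following is a sketch assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment; the helper names `model_ref` and `ask` are hypothetical.

```python
MODEL = "deepseek-ai/janus-pro-1b"
VERSION = "eb4c5dffb46fb23d03a3b74e87ea36bfa830a0c3d5875ef2bfea310646ac8fd2"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the "owner/name:version" reference that pins one version."""
    return f"{model}:{version}"


def ask(image_url: str, question: str) -> str:
    """Send one image and question to the pinned version (needs network)."""
    # Imported lazily so model_ref() works without the client installed.
    import replicate  # assumption: installed via `pip install replicate`

    output = replicate.run(
        model_ref(),
        input={"image": image_url, "question": question},
    )
    # The output schema is a single string; join defensively in case
    # the client hands the text back in chunks.
    return output if isinstance(output, str) else "".join(str(c) for c in output)
```

A first call may take around 91 seconds because of the cold start, while a typical warm run takes about 2.5 seconds, so callers may want a generous timeout on the first request.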
