hayooucom/vision-model 🔢📝🖼️ → 📝

▶️ 16.3K runs 📅 May 2024 ⚙️ Cog 0.9.8

image-analysis image-captioning text-generation visual-understanding

About

This is the Phi-3-vision model. Cost is billed by run time. Have fun!

Example Output

Prompt:

"please describe this image."

Output

The chart is a spreadsheet table with data organized into columns and rows. It includes headers labeled 'Product', 'Qtr 1', 'Qtr 2', and 'Grand Total'. There are eight products listed: Chocolade, Gummibarchen, Scottish Longbreads, Sir Rodney's Scones, Tarte au sucre, Chocolate Biscuits, and a total row at the bottom. Each product has corresponding values in the three quarters and grand total columns. For example, Chocolade has $744.60 in Qtr 1, $162.56 in Qtr 2, and a grand total of $907.16. The totals for each column are also provided at the end of the table, with Qtr 1 totaling $14,181.59, Qtr 2 totaling $8,127.78, and the Grand Total totaling $22,309.37. The table uses a green header for the first column and alternates shading between white and light grey for the rows to enhance readability.

Performance Metrics

12.32s Prediction Time

207.12s Total Time

All Input Parameters

{
  "top_k": 1,
  "top_p": 1,
  "prompt": "please describe this image.",
  "image_url": [
    "https://support.content.office.net/en-us/media/3dd2b79b-9160-403d-9967-af893d17b580.png"
  ],
  "max_tokens": 45000,
  "temperature": 0.1,
  "image_base64": [],
  "system_prompt": "You are a helpful AI assistant.",
  "max_new_tokens": 458,
  "repetition_penalty": 1.1
}
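A request like the one above could be assembled and sent with the Replicate Python client. The following is a minimal sketch under stated assumptions: `build_input` and `run_model` are hypothetical helper names, the `replicate` package is assumed to be installed with `REPLICATE_API_TOKEN` set in the environment, and the version ID is the one listed under Version Details.

```python
# Model reference in "owner/name:version" form; the version ID is taken
# from the Version Details section of this page.
MODEL_REF = (
    "hayooucom/vision-model:"
    "6afc892d5aa00e0e0883dec30f7a766fcf515c64090def9d173093ac343c2438"
)


def build_input(prompt, image_urls, max_new_tokens=458):
    """Return a payload mirroring the example parameters (hypothetical helper)."""
    return {
        "top_k": 1,
        "top_p": 1,
        "prompt": prompt,
        "image_url": list(image_urls),
        "max_tokens": 45000,
        "temperature": 0.1,
        "image_base64": [],
        "system_prompt": "You are a helpful AI assistant.",
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": 1.1,
    }


def run_model(payload):
    """Send the payload to Replicate; needs network access and an API token."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=payload)


# Usage (not executed here because it performs a billed network call):
#   payload = build_input("please describe this image.", ["<public image url>"])
#   output = run_model(payload)
```

Keeping payload construction separate from the network call makes it possible to inspect or log the exact parameters before a billed run starts.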

Input Parameters

seed (integer): The seed for the random number generator.
top_k (integer, default: 1): When decoding text, samples from the top k most likely tokens; lower to ignore less likely tokens.
top_p (number, default: 1, range: 0.01 - 1): When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens.
prompt (string, default: "hello ,who are you?"): Text prompt to send to the model.
image_url (array): Public image URLs.
max_tokens (integer, default: 45000, range: 1 - 128000): Maximum number of tokens overall. A word is generally 2-3 tokens.
temperature (number, default: 0.1, range: 0 - 5): Adjusts randomness of outputs; greater than 1 is random and 0 is deterministic.
image_base64 (array): Base64-encoded image data; use this if no image URL is given.
system_prompt (string, default: "You are a helpful AI assistant."): System prompt.
max_new_tokens (integer, default: 200, range: 1 - 128000): Maximum number of tokens to generate. A word is generally 2-3 tokens.
repetition_penalty (number, default: 1.1, range: 0.01 - 10): Penalty for repeated words in generated text; 1 is no penalty, values greater than 1 discourage repetition, values less than 1 encourage it.
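The documented ranges can be checked on the client side before a run is submitted. The sketch below is illustrative only: `RANGES`, `check_ranges`, and `encode_image` are hypothetical names, and it assumes `image_base64` accepts plain base64 strings without a data-URI prefix, which this page does not confirm.

```python
import base64

# (min, max) ranges copied from the parameter list on this page.
RANGES = {
    "top_p": (0.01, 1),
    "max_tokens": (1, 128000),
    "temperature": (0, 5),
    "max_new_tokens": (1, 128000),
    "repetition_penalty": (0.01, 10),
}


def check_ranges(payload):
    """Return a list of problems; the list is empty if every value is in range."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in payload:
            continue  # omitted parameters fall back to the model defaults
        value = payload[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} - {high}")
    return problems


def encode_image(data):
    """Base64-encode raw image bytes as ASCII text (assumed input format)."""
    return base64.b64encode(data).decode("ascii")
```

For example, `check_ranges({"temperature": 7})` reports one problem, while the example parameters shown earlier on this page pass without any.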

Output Schema

Output

Type: array • Items Type: string
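Because the output is an array of strings, a caller typically has to join the items to get the full text. This is a minimal sketch, assuming the items are consecutive text fragments in order; `join_output` is a hypothetical helper and the sample fragments are made up for illustration.

```python
def join_output(chunks):
    """Concatenate the string items of the output array into one text."""
    return "".join(chunks)


# Made-up fragments standing in for a real model response.
sample = ["The chart is ", "a spreadsheet ", "table."]
text = join_output(sample)
```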

Example Execution Logs

max_tokens 45000
token_count0 6
stop_ids [1, 1]
/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:492: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:509: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
warnings.warn(
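The two warnings say that `do_sample` was `False` for this run, so the `temperature` and `top_k` values were not used. The toy sketch below illustrates why: greedy decoding always takes the highest-scoring token, whatever the temperature. It is not the transformers implementation; `softmax` and `pick_token` are hypothetical names and the logits are made up.

```python
import math
import random


def softmax(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, do_sample, temperature=1.0, rng=None):
    """Pick a token index greedily, or by sampling when do_sample is True."""
    if not do_sample:
        # Greedy decoding: plain argmax, the temperature is never consulted.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    rng = rng or random.Random()
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [0.2, 1.5, 0.3]  # made-up scores for three candidate tokens
greedy_low = pick_token(logits, do_sample=False, temperature=0.1)
greedy_high = pick_token(logits, do_sample=False, temperature=5.0)
```

Both greedy calls return the same index, which is consistent with the warnings: without sampling enabled, changing `temperature` or `top_k` has no effect on the result.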

Version Details

Version ID: 6afc892d5aa00e0e0883dec30f7a766fcf515c64090def9d173093ac343c2438
Version Created: May 23, 2024
