hayooucom/vision-model2 🔢📝🖼️ → 📝

▶️ 346 runs 📅 Feb 2025 ⚙️ Cog 0.13.7
image-analysis image-captioning text-generation vision-language visual-understanding

About

Feel free to contact us: youkpan@gmail.com

Example Output

Prompt:

"Describe this."

Output

The image captures a moment of tranquility, featuring a black smartwatch resting on a wooden surface. The watch, which is the central focus of the image, is connected to a white charging cable. The cable, coiled and ready for use, extends from the watch, suggesting it's in the process of being charged.

The screen of the watch is alive with text, providing information about the "AI Friend" necklace. This suggests that the watch might be connected to a virtual assistant or a piece of wearable technology. The text on the screen reads: "The AI Friend necklace is a new wearable tech designed to act as a virtual friend. It's a virtual friend that you wear like a necklace."

The background of the image is blurred, drawing the viewer's attention to the watch and its charging cable. The wooden surface on which the watch rests adds a touch of warmth to the scene. The overall composition of the image suggests a blend of technology and simplicity.

Performance Metrics

8.78s Prediction Time
460.48s Total Time
All Input Parameters
{
  "seed": 123123,
  "top_k": 5,
  "top_p": 0.95,
  "prompt": "Describe this.",
  "image_url": [
    "https://v.stylee.top/files/test.jpg"
  ],
  "image_url0": "",
  "full_prompt": "",
  "temperature": 0.15,
  "image_base64": [],
  "system_prompt": "You are agent that can see, talk and act.",
  "max_new_tokens": 912,
  "repetition_penalty": 1.1
}
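The request body above can be rebuilt as a plain dict and passed to the Replicate Python client. This is a sketch based on this page's metadata; the exact version pin is listed under "Version Details", and the `replicate.run` call requires a `REPLICATE_API_TOKEN`.

```python
# The same request body as the "All Input Parameters" example above.
payload = {
    "seed": 123123,
    "top_k": 5,
    "top_p": 0.95,
    "prompt": "Describe this.",
    "image_url": ["https://v.stylee.top/files/test.jpg"],
    "image_url0": "",
    "full_prompt": "",
    "temperature": 0.15,
    "image_base64": [],
    "system_prompt": "You are agent that can see, talk and act.",
    "max_new_tokens": 912,
    "repetition_penalty": 1.1,
}

# With the Replicate Python client (requires REPLICATE_API_TOKEN):
#   import replicate
#   output = replicate.run("hayooucom/vision-model2", input=payload)
```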
Input Parameters
seed Type: integer
The seed for the random number generator.
top_k Type: integer Default: 5
When decoding text, samples from the top k most likely tokens; lower to ignore less likely tokens.
top_p Type: number Default: 0.95 Range: 0.01 - 1
When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens.
prompt Type: string Default: Describe this.
Text prompt to send to the model.
image_url Type: array Default:
Public image URL(s).
image_url0 Type: string Default:
Public image URL.
full_prompt Type: string Default:
Full prompt to send to the model. If you use this, set prompt to empty.
temperature Type: number Default: 0.1 Range: 0 - 5
Adjusts the randomness of outputs; greater than 1 is random and 0 is deterministic.
image_base64 Type: array Default:
Base64-encoded image data; use this when no image URL is provided.
system_prompt Type: string Default: You are agent that can see, talk and act.
System prompt.
max_new_tokens Type: integer Default: 200 Range: 1 - 8000
Maximum number of tokens to generate. A word is generally 2-3 tokens.
output_id_as_str Type: integer Default: 0
Output ids as strings.
repetition_penalty Type: number Default: 1.1 Range: 0.01 - 10
Penalty for repeated words in generated text; 1 is no penalty, values greater than 1 discourage repetition, less than 1 encourage it.
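The documented ranges above can be checked client-side before sending a request. The validator below is an illustration only (the parameter names and bounds come from this page; the function itself is not part of the model's API):

```python
# Documented bounds for the bounded parameters on this page.
RANGES = {
    "top_p": (0.01, 1.0),
    "temperature": (0.0, 5.0),
    "max_new_tokens": (1, 8000),
    "repetition_penalty": (0.01, 10.0),
}

def validate(params: dict) -> list[str]:
    """Return a message for each parameter outside its documented range."""
    errors = []
    for name, (lo, hi) in RANGES.items():
        if name in params and not (lo <= params[name] <= hi):
            errors.append(f"{name}={params[name]} outside [{lo}, {hi}]")
    return errors

print(validate({"top_p": 0.95, "temperature": 0.15, "max_new_tokens": 912}))  # []
print(validate({"temperature": 7.0}))  # one out-of-range message
```

The example request on this page passes cleanly; an out-of-range value such as `temperature=7.0` is flagged.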
Output Schema

Output

Type: array Items Type: string
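Since the output schema is an array of strings, the full generated text is obtained by joining the items. A minimal sketch (the chunk values here are illustrative, not real model output):

```python
# The model returns an array of strings; concatenating the items
# reconstructs the complete generated text.
chunks = ["The image captures a moment ", "of tranquility."]
text = "".join(chunks)
print(text)
```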

Example Execution Logs
/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.15` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:633: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.95` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:745: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
return fn(*args, **kwargs)
/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/torch/utils/checkpoint.py:87: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
From v4.47 onwards, when a model cache is to be returned, `generate` will return a `Cache` instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set `return_legacy_cache=True`.
Version Details
Version ID
29e97c56ac1a0344a96dcbcf1d7f1e363279910aa61fdb91d9c333c9e223ab41
Version Created
February 28, 2025