samim23/internlm-xcomposer2 📝🖼️ → 📝
About
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Example Output
The image captures the majestic Potala Palace in Lhasa, Tibet. The palace, a symbol of Tibetan culture and history, stands tall on a snowy mountain, its white walls contrasting sharply with the surrounding landscape. The palace is surrounded by a moat, adding to its grandeur. The sky above is overcast, casting a soft light over the scene. The palace is adorned with red and gold decorations, adding a touch of color to the otherwise monochromatic landscape. The people in the image are dressed in traditional Tibetan attire, their red robes standing out against the white backdrop. The environment is cold and snowy, with the snow covering the ground and the mountains in the background. The lighting is soft and diffused, creating a serene and peaceful atmosphere. The colors in the image are predominantly white, red, and gold, with the red and gold accents adding a touch of warmth to the otherwise cold and snowy scene.
Performance Metrics
- Prediction Time: 10.05s
- Total Time: 115.48s
All Input Parameters
{ "text": "Caption this image. describe every single thing in the image in detail. Do not include any unnecessary words in your description for the sake of good grammar. I want many short statements that serve the single purpose of giving the most thorough description if items as possible in the smallest, comma separated way possible. Be sure to describe people's moods, clothing, the environment, lighting, colors, and everything.", "image": "https://replicate.delivery/pbxt/LVcv7tbU1l1rYK9xLxnuujJbgI1R15R8MhXp8wJu0sXtLdDq/dac2c2cdfc32e462a9d869ce1f00454c.jpg" }
Input Parameters
- text (required): Input text.
- image: Input image.
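As a rough usage sketch (not taken from this page), these parameters could be passed through the Replicate Python client; the version ID is the one listed under Version Details below, while the prompt and image URL are placeholders.

```python
# Hypothetical sketch using the Replicate Python client (pip install replicate);
# assumes REPLICATE_API_TOKEN is set in the environment.
import replicate

output = replicate.run(
    "samim23/internlm-xcomposer2:b55d1a2cc9fd7faf0c2b9d8cb0892f7548f22ad9694a07fcae3929602251947f",
    input={
        "text": "Caption this image. Describe everything in detail.",  # required
        "image": "https://example.com/photo.jpg",  # optional input image URL (placeholder)
    },
)
print(output)
```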
Output Schema
Output
Example Execution Logs
/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/transformers/generation/utils.py:1417: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
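The second warning is raised because a sampling flag (`top_p=0.8`) is set while `do_sample=False`. A minimal sketch of how this could be addressed in the underlying model code, assuming standard Hugging Face `transformers` generation and that sampling is actually intended; the `model.generate` call shown in the comment is a placeholder, not code from this repository:

```python
# Hedged sketch: pass an explicit GenerationConfig instead of modifying the
# pretrained model configuration (the deprecated strategy flagged by the first warning).
from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,   # enable sampling so top_p actually takes effect
    top_p=0.8,
    max_new_tokens=512,
)

# output_ids = model.generate(input_ids, generation_config=generation_config)
```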
Version Details
- Version ID: b55d1a2cc9fd7faf0c2b9d8cb0892f7548f22ad9694a07fcae3929602251947f
- Version Created: August 25, 2024
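For reference, a hedged sketch of pinning this exact version through Replicate's HTTP predictions API using Python's `requests`; the token handling and input values are assumptions and are not part of this page:

```python
# Hypothetical example: create a prediction pinned to the version ID listed above.
# Assumes REPLICATE_API_TOKEN is exported and `requests` is installed.
import os
import requests

response = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers={
        "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={
        "version": "b55d1a2cc9fd7faf0c2b9d8cb0892f7548f22ad9694a07fcae3929602251947f",
        "input": {
            "text": "Caption this image.",              # required input
            "image": "https://example.com/photo.jpg",   # placeholder image URL
        },
    },
)
print(response.json())  # returns a prediction object to poll for output
```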