samim23/internlm-xcomposer2 📝🖼️ → 📝
About
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Example Output
The image captures the majestic Potala Palace in Lhasa, Tibet. The palace, a symbol of Tibetan culture and history, stands tall on a snowy mountain, its white walls contrasting sharply with the surrounding landscape. The palace is surrounded by a moat, adding to its grandeur. The sky above is overcast, casting a soft light over the scene. The palace is adorned with red and gold decorations, adding a touch of color to the otherwise monochromatic landscape. The people in the image are dressed in traditional Tibetan attire, their red robes standing out against the white backdrop. The environment is cold and snowy, with the snow covering the ground and the mountains in the background. The lighting is soft and diffused, creating a serene and peaceful atmosphere. The colors in the image are predominantly white, red, and gold, with the red and gold accents adding a touch of warmth to the otherwise cold and snowy scene.
Performance Metrics
- Prediction Time: 10.05s
- Total Time: 115.48s
All Input Parameters
{ "text": "Caption this image. describe every single thing in the image in detail. Do not include any unnecessary words in your description for the sake of good grammar. I want many short statements that serve the single purpose of giving the most thorough description if items as possible in the smallest, comma separated way possible. Be sure to describe people's moods, clothing, the environment, lighting, colors, and everything.", "image": "https://replicate.delivery/pbxt/LVcv7tbU1l1rYK9xLxnuujJbgI1R15R8MhXp8wJu0sXtLdDq/dac2c2cdfc32e462a9d869ce1f00454c.jpg" }
Input Parameters
- text (required): Input text.
- image: Input image.
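As a rough usage sketch (not taken from this page), these parameters could be passed through the Replicate Python client; the version ID is the one listed under Version Details below, while the prompt and image URL are placeholders.

```python
# Hypothetical sketch using the Replicate Python client (pip install replicate);
# assumes REPLICATE_API_TOKEN is set in the environment.
import replicate

output = replicate.run(
    "samim23/internlm-xcomposer2:b55d1a2cc9fd7faf0c2b9d8cb0892f7548f22ad9694a07fcae3929602251947f",
    input={
        "text": "Caption this image. Describe everything in detail.",  # required
        "image": "https://example.com/photo.jpg",  # optional input image URL (placeholder)
    },
)
print(output)
```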
Output Schema
Output
Example Execution Logs
/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/transformers/generation/utils.py:1417: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
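The second warning is raised because a sampling flag (`top_p=0.8`) is set while `do_sample=False`. A minimal sketch of how this could be addressed in the underlying model code, assuming standard Hugging Face `transformers` generation and that sampling is actually intended; the `model.generate` call shown in the comment is a placeholder, not code from this repository:

```python
# Hedged sketch: pass an explicit GenerationConfig instead of modifying the
# pretrained model configuration (the deprecated strategy flagged by the first warning).
from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,   # enable sampling so top_p actually takes effect
    top_p=0.8,
    max_new_tokens=512,
)

# output_ids = model.generate(input_ids, generation_config=generation_config)
```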
Version Details
- Version ID: b55d1a2cc9fd7faf0c2b9d8cb0892f7548f22ad9694a07fcae3929602251947f
- Version Created: August 25, 2024
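For reference, a hedged sketch of pinning this exact version through Replicate's HTTP predictions API using Python's `requests`; the token handling and input values are assumptions and are not part of this page:

```python
# Hypothetical example: create a prediction pinned to the version ID listed above.
# Assumes REPLICATE_API_TOKEN is exported and `requests` is installed.
import os
import requests

response = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers={
        "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={
        "version": "b55d1a2cc9fd7faf0c2b9d8cb0892f7548f22ad9694a07fcae3929602251947f",
        "input": {
            "text": "Caption this image.",              # required input
            "image": "https://example.com/photo.jpg",   # placeholder image URL
        },
    },
)
print(response.json())  # returns a prediction object to poll for output
```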