google-deepmind/gemma-3-12b-it 🖼️🔢📝 → 📝

▶️ 13.9K runs 📅 Mar 2025 ⚙️ Cog 0.13.7
Tags: image-analysis, image-captioning, image-to-text, text-generation, text-translation, visual-understanding

About

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Example Output

Prompt:

"Describe this image in detail."

Output:

Here's a detailed description of the image:

Overall Impression:

The image is a close-up photograph of a cluster of pink cosmos flowers, with a bumblebee actively pollinating one of the blooms. The shot is taken at eye level, giving a natural and intimate perspective.

Main Elements:

  • Cosmos Flowers: The focal point of the image is a group of pink cosmos flowers. They have broad, delicate petals in varying shades of pink, with a bright yellow center. The petals are slightly ruffled, giving them a soft and organic texture.
  • Bumblebee: A bumblebee is prominently featured, positioned on the central flower. It's actively feeding on the flower's pollen, its fuzzy body covered in yellow and black stripes.
  • Background: The background consists of a mix of green foliage and faded/dried cosmos flowers. This creates a slightly blurred backdrop, which helps to highlight the main subject. Some red flowers are also visible in the lower right corner.

Color Palette:

  • Dominant Colors: Pink (various shades), green, yellow, and black.
  • Additional Colors: Red (in the background flowers).

Lighting & Composition:

  • Lighting: The image appears to be taken in natural daylight, with soft and even lighting.
  • Composition: The composition is well-balanced, with the flowers and bee arranged in a pleasing manner. The depth of field is shallow, blurring the background and emphasizing the flowers and bee.

Overall Mood:

The image evokes a sense of natural beauty, pollination, and the vibrancy of a garden in bloom. It's a peaceful and visually appealing scene.

Performance Metrics

Prediction Time: 18.34s
Total Time: 140.75s

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/MeBv1PWmcTf7voSh3U4fxefjKrtfNXaqmfX3UY4Iq6ZYDSlh/bee.jpg",
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "Describe this image in detail.",
  "temperature": 0.7,
  "system_prompt": "You are a helpful assistant.",
  "max_new_tokens": 512
}
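
The input above can also be sent to the model from code. The following is a minimal sketch, not an official recipe: it assumes the third-party `replicate` Python client is installed and that a `REPLICATE_API_TOKEN` environment variable is set, neither of which is stated on this page. `MODEL_REF` pins the version ID listed under Version Details, and `run_example` is a hypothetical helper name.

```python
# Sketch only: assumes `pip install replicate` and REPLICATE_API_TOKEN in the environment.
MODEL = "google-deepmind/gemma-3-12b-it"
VERSION = "5a0df3fa58c87fbd925469a673fdb16f3dd08e6f4e2f1a010970f07b7067a81c"
MODEL_REF = f"{MODEL}:{VERSION}"  # "owner/name:version" pins one specific version

# The exact payload shown under "All Input Parameters".
EXAMPLE_INPUT = {
    "image": "https://replicate.delivery/pbxt/MeBv1PWmcTf7voSh3U4fxefjKrtfNXaqmfX3UY4Iq6ZYDSlh/bee.jpg",
    "top_k": 50,
    "top_p": 0.9,
    "prompt": "Describe this image in detail.",
    "temperature": 0.7,
    "system_prompt": "You are a helpful assistant.",
    "max_new_tokens": 512,
}


def run_example(payload=EXAMPLE_INPUT):
    """Send the payload to Replicate and return whatever the client returns."""
    import replicate  # imported lazily so this module loads without the package

    return replicate.run(MODEL_REF, input=payload)
```

Calling `print(run_example())` should produce a description similar to the example output, though sampling with a temperature of 0.7 means the wording will likely differ between runs.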
Input Parameters

  • image (string): Optional image input for multimodal tasks
  • top_k (integer, default: 50, range: 0 - 100): Top-k sampling
  • top_p (number, default: 0.9, range: 0 - 1): Top-p sampling
  • prompt (string, required): Text prompt for the model
  • temperature (number, default: 0.7, range: 0 - 2): Sampling temperature
  • system_prompt (string, default: "You are a helpful assistant."): System prompt to guide the model's behavior
  • max_new_tokens (integer, default: 512, range: 1 - 8192): Maximum number of tokens to generate
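
The defaults and ranges listed for the input parameters can be checked on the client side before a request is sent. The sketch below is one possible way to do that. `build_input` and `RANGES` are hypothetical names, the bounds are assumed to be inclusive, and rejecting an empty prompt is a choice made here rather than a documented rule.

```python
# Documented ranges from the parameter list; bounds are assumed to be inclusive.
RANGES = {
    "top_k": (0, 100),
    "top_p": (0.0, 1.0),
    "temperature": (0.0, 2.0),
    "max_new_tokens": (1, 8192),
}
INTEGER_PARAMS = ("top_k", "max_new_tokens")


def build_input(prompt, image=None, top_k=50, top_p=0.9, temperature=0.7,
                system_prompt="You are a helpful assistant.", max_new_tokens=512):
    """Return an input payload using the documented defaults, or raise ValueError."""
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt is required and must be a non-empty string")

    payload = {
        "prompt": prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "system_prompt": system_prompt,
        "max_new_tokens": max_new_tokens,
    }

    # Check integer-typed parameters, then every documented range.
    for name in INTEGER_PARAMS:
        if not isinstance(payload[name], int):
            raise ValueError(f"{name} must be an integer")
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name}={payload[name]} is outside {low} - {high}")

    # The image is optional, so it is only included when provided.
    if image is not None:
        payload["image"] = image
    return payload
```

For a text-only request, `build_input("Translate 'hello' to French.")` returns a payload with every default filled in and no `image` key.
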
Output Schema

Output

Type: string
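
The schema declares the output as a single string. Some client libraries may still deliver text models' output as a sequence of chunks, so a caller might want to normalize both shapes. The sketch below shows one way to do so; `as_text` is a hypothetical helper, and the chunked case is an assumption rather than documented behavior for this model.

```python
from collections.abc import Iterable


def as_text(output):
    """Return the model output as one string, whatever shape it arrived in."""
    if isinstance(output, str):
        return output
    if isinstance(output, Iterable):
        # Assumed case: the client yielded the text piece by piece.
        return "".join(str(chunk) for chunk in output)
    return str(output)
```
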

Version Details

Version ID: 5a0df3fa58c87fbd925469a673fdb16f3dd08e6f4e2f1a010970f07b7067a81c
Version Created: March 12, 2025