prunaai/gemma-4-26b-a4b-fast

14.1K runs | Apr 2026 | Cog 0.16.12
code-generation document-analysis image-to-text multilingual ocr text-generation video-to-text visual-understanding

About

This is a version of Gemma 4 26B, a Mixture-of-Experts (MoE) model, optimised by Pruna AI.

Example Output

Output

vLLM is a high-throughput, memory-efficient serving engine for Large Language Models that uses a technique called PagedAttention to manage KV cache memory, significantly accelerating inference speeds and increasing serving capacity.

Performance Metrics

0.29s Prediction Time
0.30s Total Time
All Input Parameters
{
  "top_p": 0.95,
  "images": [],
  "message": "Explain vLLM in one sentence",
  "video_fps": 2,
  "max_tokens": 2048,
  "temperature": 0.6,
  "enable_thinking": false,
  "presence_penalty": 1,
  "max_visual_tokens": 280
}
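
The parameters above form the `input` dictionary of a prediction request. The following is a minimal sketch of how such a request might be assembled, assuming the official `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. `build_input` and `run_prediction` are hypothetical helper names, and the model reference is simply the page title joined to the version ID listed under Version Details.

```python
import json

# Assumption: "owner/name:version" built from the page title and the Version ID.
MODEL_REF = (
    "prunaai/gemma-4-26b-a4b-fast:"
    "007dac3717afbfb1ddade995c5fbff003a5a365660b8f8155b6bbb053eada6e4"
)


def build_input(message, **overrides):
    """Return the example payload shown on this page, with optional overrides."""
    payload = {
        "top_p": 0.95,
        "images": [],
        "message": message,
        "video_fps": 2,
        "max_tokens": 2048,
        "temperature": 0.6,
        "enable_thinking": False,
        "presence_penalty": 1,
        "max_visual_tokens": 280,
    }
    payload.update(overrides)
    return payload


def run_prediction(payload):
    """Send the payload to Replicate (hypothetical helper; needs network access)."""
    # Imported lazily so build_input works even without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    # Only prints the payload; call run_prediction(...) to actually run the model.
    print(json.dumps(build_input("Explain vLLM in one sentence"), indent=2))
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request before any prediction is billed.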
Input Parameters
top_p | Type: number | Default: 0.95 | Range: 0 - 1
Nucleus sampling: only consider tokens with cumulative probability up to this value
video | Type: string
Video file to send to the model
images | Type: array | Default: (empty)
Images to send to the model
message | Type: string | Default: Explain vLLM in one sentence
The user message to send to the model
video_fps | Type: number | Default: 2 | Range: 0.1 - 30
Frames per second to sample from the video
max_tokens | Type: integer | Default: 2048 | Range: 1 - 16384
Maximum number of tokens to generate
temperature | Type: number | Default: 0.6 | Range: 0 - 2
Sampling temperature (higher = more creative, lower = more deterministic)
system_prompt | Type: string
Optional system prompt to set the model's behavior
enable_thinking | Type: boolean | Default: false
Enable thinking mode (model reasons internally before answering)
presence_penalty | Type: number | Default: 1 | Range: 1 - 1.5
Presence penalty: penalize new tokens based on their presence in the text
max_visual_tokens | Type: integer | Default: 280 | Range: 70 - 1120
Vision token budget per image (higher = more detail, more compute). Supported: 70, 140, 280, 560, 1120
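
The ranges and supported values listed above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `validate_input` and `nearest_visual_budget` are hypothetical helpers, the bounds are copied from the list above, and how the server itself treats out-of-range or unsupported values is not documented on this page.

```python
SUPPORTED_VISUAL_TOKENS = (70, 140, 280, 560, 1120)

# Inclusive (min, max) bounds copied from the parameter list above.
RANGES = {
    "top_p": (0, 1),
    "video_fps": (0.1, 30),
    "max_tokens": (1, 16384),
    "temperature": (0, 2),
    "presence_penalty": (1, 1.5),
}


def validate_input(params):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low} - {high}")
    budget = params.get("max_visual_tokens")
    if budget is not None and budget not in SUPPORTED_VISUAL_TOKENS:
        problems.append(
            f"max_visual_tokens={budget} is not one of {SUPPORTED_VISUAL_TOKENS}"
        )
    return problems


def nearest_visual_budget(requested):
    """Snap a requested budget to the closest supported value."""
    return min(SUPPORTED_VISUAL_TOKENS, key=lambda v: abs(v - requested))
```

For example, `validate_input({"temperature": 2.5})` reports one problem, and `nearest_visual_budget(300)` picks 280 as the closest supported budget.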
Output Schema

Output

Type: array | Items Type: string
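
Because the output is an array of strings rather than a single string, a caller will usually want to combine the items. The sketch below assumes that each item is a consecutive fragment of the generated text and that no separator is needed between fragments. This page does not state either point, and `join_output` is a hypothetical helper.

```python
def join_output(chunks):
    """Concatenate output fragments into one string (assumes in-order text pieces)."""
    return "".join(str(chunk) for chunk in chunks)


if __name__ == "__main__":
    # Illustrative fragments only, not real model output.
    print(join_output(["vLLM is ", "a serving engine ", "for LLMs."]))
```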

Version Details
Version ID
007dac3717afbfb1ddade995c5fbff003a5a365660b8f8155b6bbb053eada6e4
Version Created
April 8, 2026