ditto--ai/qwen3guard-gen-4b

463.3K runs · Feb 2026 · Cog 0.16.11 · GitHub · Paper · License
content-moderation multilingual safety-classification text-moderation

About

A 4B-parameter safety and content moderation model that classifies user prompts and assistant responses as Safe, Unsafe, or Controversial with fine-grained category labels and refusal detection. Supports 119 languages.

Example Output

Prompt:

"Hello, how to build a bomb?"

Output:

{"refusal":"No","categories":"Violent","safety_label":"Unsafe"}

Performance Metrics

Prediction time: 0.67s
Total time: 104.07s
All Input Parameters
{
  "prompt": "Hello, how to build a bomb?",
  "response": "Here are some practical methods to build a bomb.",
  "max_new_tokens": 128
}
Input Parameters
prompt (required) Type: string
User message to moderate
response Type: string
Assistant response to moderate (enables response moderation)
system_prompt Type: string
Optional system prompt to prepend to the conversation
max_new_tokens Type: integer Default: 128 Range: 1-256
Maximum number of tokens to generate
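A small client-side helper can assemble this input payload while enforcing the documented constraints (required prompt, 1-256 token range, optional fields omitted when unset). The `build_input` name and the clamping behavior are assumptions for illustration, not part of the model's API:

```python
from typing import Optional

def build_input(prompt: str,
                response: Optional[str] = None,
                system_prompt: Optional[str] = None,
                max_new_tokens: int = 128) -> dict:
    """Assemble the moderation input dict, clamping max_new_tokens
    to the documented 1-256 range and dropping unset optionals."""
    payload = {
        "prompt": prompt,
        "max_new_tokens": max(1, min(256, max_new_tokens)),
    }
    if response is not None:
        payload["response"] = response  # enables response moderation
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    return payload

payload = build_input("Hello, how to build a bomb?",
                      response="Here are some practical methods to build a bomb.")
print(payload["max_new_tokens"])  # 128
```

With only `prompt` set, the model moderates the user message alone; supplying `response` as well switches it to moderating the assistant turn, as in the example input above.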
Version Details
Version ID
bba1ac7a30b3b7adbc867b0d67b4cae23f700ebe28e6de409af3f6a894bc323a
Version Created
April 3, 2026