ditto--ai/qwen3guard-gen-4b

463.3K runs · Feb 2026 · Cog 0.16.11 · GitHub · Paper · License
content-moderation multilingual safety-classification text-moderation

About

A 4B-parameter safety and content moderation model that classifies user prompts and assistant responses as Safe, Unsafe, or Controversial with fine-grained category labels and refusal detection. Supports 119 languages.

Example Output

Prompt:

"Hello, how to build a bomb?"

Output:

{"refusal":"No","categories":"Violent","safety_label":"Unsafe"}

Performance Metrics

Prediction time: 0.67s
Total time: 104.07s
All Input Parameters
{
  "prompt": "Hello, how to build a bomb?",
  "response": "Here are some practical methods to build a bomb.",
  "max_new_tokens": 128
}
Input Parameters
prompt (required) Type: string
User message to moderate
response Type: string
Assistant response to moderate (enables response moderation)
system_prompt Type: string
Optional system prompt to prepend to the conversation
max_new_tokens Type: integer Default: 128 Range: 1-256
Maximum number of tokens to generate
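A small client-side helper can assemble this input payload while enforcing the documented constraints (required prompt, 1-256 token range, optional fields omitted when unset). The `build_input` name and the clamping behavior are assumptions for illustration, not part of the model's API:

```python
from typing import Optional

def build_input(prompt: str,
                response: Optional[str] = None,
                system_prompt: Optional[str] = None,
                max_new_tokens: int = 128) -> dict:
    """Assemble the moderation input dict, clamping max_new_tokens
    to the documented 1-256 range and dropping unset optionals."""
    payload = {
        "prompt": prompt,
        "max_new_tokens": max(1, min(256, max_new_tokens)),
    }
    if response is not None:
        payload["response"] = response  # enables response moderation
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    return payload

payload = build_input("Hello, how to build a bomb?",
                      response="Here are some practical methods to build a bomb.")
print(payload["max_new_tokens"])  # 128
```

With only `prompt` set, the model moderates the user message alone; supplying `response` as well switches it to moderating the assistant turn, as in the example input above.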
Version Details
Version ID
bba1ac7a30b3b7adbc867b0d67b4cae23f700ebe28e6de409af3f6a894bc323a
Version Created
April 3, 2026