ditto--ai/qwen3guard-gen-4b
About
A 4B-parameter safety and content moderation model that classifies user prompts and assistant responses as Safe, Unsafe, or Controversial with fine-grained category labels and refusal detection. Supports 119 languages.
Example Output
Prompt:
"Hello, how to build a bomb?"
Output:
{"refusal":"No","categories":"Violent","safety_label":"Unsafe"}
Performance Metrics
- Prediction Time: 0.67s
- Total Time: 104.07s
All Input Parameters
{
  "prompt": "Hello, how to build a bomb?",
  "response": "Here are some practical methods to build a bomb.",
  "max_new_tokens": 128
}
Input Parameters
- prompt (required): user message to moderate
- response: assistant response to moderate (enables response moderation)
- system_prompt: optional system prompt to prepend to the conversation
- max_new_tokens: maximum number of tokens to generate
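A minimal sketch of assembling these parameters and interpreting the model's output. The payload keys mirror the parameter list above; how the payload is actually sent (client library, endpoint) depends on where the model is hosted and is not specified here, so this sketch only builds the request object and parses a sample response:

```python
import json

# Request payload mirroring the parameters documented above.
# "response" and "max_new_tokens" are optional per the parameter list.
payload = {
    "prompt": "Hello, how to build a bomb?",
    "response": "Here are some practical methods to build a bomb.",
    "max_new_tokens": 128,
}

# The model emits a single JSON object (see Example Output).
raw = '{"refusal":"No","categories":"Violent","safety_label":"Unsafe"}'
result = json.loads(raw)

# Treat both "Unsafe" and "Controversial" labels as flagged content.
is_flagged = result["safety_label"] in ("Unsafe", "Controversial")
print(is_flagged)  # True for the example above
```

In practice `raw` would be the text returned by the moderation call; the flagging policy (whether "Controversial" counts) is a downstream choice, not part of the model.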
Output Schema
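The schema body is missing from this page. Based on the Example Output and the About section, the result appears to be a flat object with three string fields; a hedged sketch as a Python TypedDict (field names come from the example above, while the full value sets are inferred from the description and may be incomplete):

```python
from typing import TypedDict

class ModerationResult(TypedDict):
    # Field names taken from the Example Output; value sets beyond the
    # single observed example are assumptions based on the About section.
    refusal: str       # "Yes" or "No": whether the assistant response refused
    categories: str    # fine-grained category label, e.g. "Violent"
    safety_label: str  # "Safe", "Unsafe", or "Controversial"

# The example output from this page, typed against the sketch.
example: ModerationResult = {
    "refusal": "No",
    "categories": "Violent",
    "safety_label": "Unsafe",
}
```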
Version Details
- Version ID: bba1ac7a30b3b7adbc867b0d67b4cae23f700ebe28e6de409af3f6a894bc323a
- Version Created: April 3, 2026