Why are Playground outputs misclassified as unsafe?

Learn why Playground outputs may sometimes be incorrectly flagged as unsafe and how you can report misclassifications.

When you use Playground to generate text, the content is automatically checked by OpenAI’s moderation system to help detect potentially sensitive or unsafe outputs.

The moderation system is tuned to err on the side of caution. While this helps reduce the risk of missing genuinely harmful content, it can produce false positives, meaning some safe outputs may occasionally be misclassified as unsafe.
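If you want to see how a moderation check behaves on a given piece of text, you can run it through the public Moderation endpoint yourself. The sketch below uses the OpenAI Python SDK; the model name and input string are placeholders, and this is only an illustration of the endpoint, not the exact check Playground applies internally.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Re-check a piece of generated text against the Moderation endpoint.
response = client.moderations.create(
    model="omni-moderation-latest",   # placeholder; use the moderation model you prefer
    input="Example Playground output to re-check.",
)

result = response.results[0]
print("Flagged:", result.flagged)           # overall safe/unsafe decision
print("Category scores:", result.category_scores)  # per-category confidence scores
```

A low score in every category alongside a flag is often a sign of the kind of borderline false positive described above.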


What to Do if an Output Is Misclassified

If you believe Playground incorrectly flagged a safe output:

  • Click the thumbs-down icon next to the moderation warning.

  • Optionally, add feedback explaining why you believe the classification was incorrect.


Why We Prioritize Caution

  • Moderation systems are tuned to minimize the chance of missing harmful content.

  • It's better to catch too much than miss a genuinely unsafe response.
