How do Content Moderation severity scores work?
AssemblyAI’s Content Moderation API implements a context-aware severity scoring system that helps developers build more intelligent content filtering. This guide examines how severity scores work through practical examples.
Severity Score Structure
The API returns severity scores across three levels (low, medium, and high) for each detected label. Please refer to our API Reference for Content Moderation for the full response format.
For each label, the three scores form a probability distribution across the severity levels, always summing to 1.0.
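As a minimal sketch, here is how you might request these scores with the AssemblyAI Python SDK and read the severity distribution for each label (the audio URL is a placeholder; attribute names follow the SDK, so check the API Reference above for the authoritative schema):

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

# Enable the Content Moderation model for this transcription.
config = aai.TranscriptionConfig(content_safety=True)
transcript = aai.Transcriber().transcribe(
    "https://example.com/audio.mp3",  # placeholder audio URL
    config,
)

# severity_score_summary maps each detected label to a probability
# distribution over the low, medium, and high severity levels,
# which always sums to 1.0.
for label, scores in transcript.content_safety.severity_score_summary.items():
    print(f"{label}: low={scores.low} medium={scores.medium} high={scores.high}")
```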
Case Study: Contextual Analysis of Alcohol References
To demonstrate how the system evaluates context, let’s analyze three different transcripts discussing alcohol:
Educational Context
Source: What Are Alcohols? | Organic Chemistry (full transcript + audio)
The system identifies the educational context and assigns nearly all of the probability mass to the low severity level, despite the explicit alcohol-related terminology.
Social Context
Source: How I Learned To Go To A Bar Alone And Meet Women (full transcript + audio)
References to bars and casual drinking split the probability mass between the low and medium severity levels.
Substance Abuse Context
Source: Steve-O: I’m grateful my alcoholism was severe (full transcript + audio)
Discussion of addiction and abuse triggers high severity scores, reflecting potentially sensitive content.
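Taken together, the three case studies produce distributions of roughly the following shape. These values are hypothetical, chosen only to mirror the qualitative descriptions above; see the linked transcripts for the actual scores:

```python
# Hypothetical severity distributions shaped like the three contexts
# above. Each distribution sums to 1.0, as the API guarantees.
educational = {"low": 0.97, "medium": 0.03, "high": 0.00}  # chemistry lecture
social      = {"low": 0.48, "medium": 0.52, "high": 0.00}  # bar conversation
abuse       = {"low": 0.02, "medium": 0.18, "high": 0.80}  # addiction interview
```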
Conclusion
The severity scoring system provides a sophisticated way to understand not just what topics are mentioned, but how they’re discussed. Each severity level (low, medium, high) represents a probability score between 0 and 1, allowing developers to implement precise content filtering based on their specific needs. For example, developers can set custom thresholds for content classification (e.g., flagging content with high severity > 0.7) or create tiered content ratings based on severity distributions. This granular approach to content moderation enables more accurate and context-aware content filtering systems.
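As a hedged sketch of the thresholding idea above (the 0.7 cutoff, the tier names, and the sample distributions are illustrative choices, not API requirements):

```python
def moderate(severity_scores: dict, high_threshold: float = 0.7) -> str:
    """Map one label's severity distribution to a moderation tier."""
    # Block outright when high severity dominates.
    if severity_scores["high"] > high_threshold:
        return "block"
    # Send borderline content (mostly medium/high) to human review.
    if severity_scores["medium"] + severity_scores["high"] > 0.5:
        return "review"
    return "allow"

print(moderate({"low": 0.97, "medium": 0.03, "high": 0.00}))  # allow
print(moderate({"low": 0.48, "medium": 0.52, "high": 0.00}))  # review
print(moderate({"low": 0.02, "medium": 0.18, "high": 0.80}))  # block
```

Because the scores sum to 1.0 per label, thresholds like these compose cleanly: tightening the high-severity cutoff or weighting the medium bucket differently is a one-line change tuned to each application's risk tolerance.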