
Feature Announcement: Content Safety Detection is now GA!


The internet is saturated with audio and video content. There are over 720,000 hours of new videos uploaded to YouTube alone every day! 

In many situations, this content may contain abusive language like hate speech or excessive profanity, which can seriously undermine brand safety and user trust, or even put a platform in violation of regulations and laws.

To moderate audio and video content on the internet today, platforms need large teams of people to manually review content and flag anything that might be abusive or in violation of their policies.

For example, Facebook employs tens of thousands of people to manually review posts to their platform, and to flag those posts that include hate speech.

Today, AssemblyAI is excited to help solve this problem with a brand new feature, Content Safety Detection, which is now generally available (GA).

With Content Safety Detection, transcriptions can be automatically classified with over 15 labels such as "hate speech," "profanity," "NSFW," and "pornography."

The full list of labels can be found in the API documentation. This feature is backed by the same state-of-the-art deep learning research our team applies to our top-rated Speech-to-Text API.

Any developer can now easily, and automatically, find sensitive content in audio/video files, without the need for any human in the loop.

At a glance

  • Transcribe and classify audio/video that includes Hate Speech, NSFW content, etc. with a single API call
  • See exactly where in the transcription text potentially unsafe content was found, along with the timestamp for where the flagged content occurred in the source audio or video file
  • Powered by the latest Deep Learning research, not traditional blacklists of words

Example use cases

The Content Safety Detection feature can be utilized in a variety of ways, for example:

  • Video and Podcasting platforms - adding content warnings to a show's description
  • Call Centers - flagging calls that contain hate speech or profanity
  • Policy Compliance - censoring content based on country-specific laws
  • Brand Safety - providing safer options for brands to advertise on a platform by flagging user-generated content that might be deemed "high risk" for a brand

How does Content Safety Detection work?

When the Content Safety Detection feature is enabled, the API will automatically classify your transcription text with one or more labels, such as "NSFW," "profanity," or "hate speech," and will include this information in the API's JSON response.

In the API response, the specific sections of the transcription flagged by our Content Safety model are shown, along with the exact timestamp at which the flagged text was spoken in the audio file and the confidence score for each Content Safety label detected.

The JSON below is an example response from the API for a TED talk on global warming:

"content_safety_labels": {
    "results": [
        {
            "text": "...has led to a dramatic increase in fires and the disasters around the world have been increasing at an absolutely extraordinary. An unprecedented rate four times as many in the last 30 years...", 
            "labels": [
                {
                    "confidence": 0.9986903071403503, 
                    "label": "disasters"
                }
            ], 
            "timestamp": {
                "start": 171710, 
                "end": 200770
            }
        }
    ],
    "summary": {
        "disasters": 0.89,
    }
}

As you can see above, the API has flagged a section of the transcription as “disasters” (the label for Natural Disasters) where wildfires caused by global warming are discussed.

You'll also see that the API offers an overall “summary” score for the entire transcription text. This helps show how relevant each predicted label is in reference to the entire transcription text.

For example, let’s say a 45-minute audio file contains just one profanity. While the confidence of the “profanity” label for that word might be 99%, that word may be just one in 50,000 words. Therefore, the confidence score for “profanity” in the summary section of the JSON response would be quite low. The summary scores look at a combination of the frequency and confidence of each predicted label across the entire transcription.
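
In practice, a downstream application might only act on labels whose summary score clears a chosen threshold. The Python snippet below is a minimal sketch of that idea; the SUMMARY_THRESHOLD value and the needs_content_warning helper are illustrative choices, not part of the API.

# Minimal sketch: decide which labels warrant a content warning based on
# the "summary" scores returned by the API. SUMMARY_THRESHOLD and
# needs_content_warning are illustrative, not part of the API.
SUMMARY_THRESHOLD = 0.5

def needs_content_warning(content_safety_labels: dict) -> list:
    """Return the labels whose summary score meets or exceeds the threshold."""
    summary = content_safety_labels.get("summary", {})
    return [label for label, score in summary.items() if score >= SUMMARY_THRESHOLD]

# Using the summary from the example response above
print(needs_content_warning({"summary": {"disasters": 0.89}}))  # ['disasters']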

Getting started with Content Safety Detection

Content Safety Detection is now available for all developers simply by adding a flag to your usual transcription request to the `/v2/transcript` API endpoint.

curl --request POST \
  --url https://api.assemblyai.com/v2/transcript \
  --header 'authorization: YOUR-API-TOKEN' \
  --header 'content-type: application/json' \
  --data '{"audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav", "content_safety": true}'

In return, the API will send back a JSON response structured similarly to the response below.

{
    # some keys have been hidden for readability
    ...
    "text": "Last year I showed these two slides that demonstrate that the Arctic ice cap, which for most of the last 3,000,000 years, has been the size of the lower 48 States, has shrunk by 40%...",    
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",
    "status": "completed",
    "content_safety_labels": {
        "status": "success", 
        "results": [
            {
                "text": "This increase in temperatures has led to a dramatic increase in fires, and the disasters around the world have been increasing at an absolutely extraordinary. An unprecedented rate. Four times as many in the last 30 years as the previous 75.", 
                "labels": [
                    {
                        "confidence": 0.9986903071403503, 
                        "label": "disasters"
                    }
                ], 
                "timestamp": {
                    "start": 171710, 
                    "end": 200770
                }
            }
        ],
        "summary": {
            "disasters": 0.89,
            ...
        }
    },
    ...
}
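
Since transcription runs asynchronously, you submit the audio file and then poll the API until the transcript reaches the "completed" status. Below is a minimal Python sketch of that flow using the requests library; the 5-second polling interval and the final print loop are illustrative choices, not requirements of the API.

import time
import requests

API_TOKEN = "YOUR-API-TOKEN"  # replace with your AssemblyAI API token
BASE_URL = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": API_TOKEN, "content-type": "application/json"}

# Submit the audio file with Content Safety Detection enabled
response = requests.post(BASE_URL, headers=HEADERS, json={
    "audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    "content_safety": True,
})
transcript_id = response.json()["id"]

# Poll until the transcription is complete (5-second interval is illustrative)
while True:
    result = requests.get(f"{BASE_URL}/{transcript_id}", headers=HEADERS).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(5)

# Print each flagged section with its labels, confidence scores, and timestamps
for item in result.get("content_safety_labels", {}).get("results", []):
    for label in item["labels"]:
        print(label["label"], label["confidence"], item["timestamp"])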

AssemblyAI is very excited to release Content Safety Detection as generally available! We would love to hear your thoughts and ideas on the awesome applications you are building with the AssemblyAI API!
