Audio Intelligence
With PII Redaction, the API can automatically remove Personally Identifiable Information (PII), such as phone numbers and social security numbers, from the transcription text before it is returned to you.
Heads up
PII Redaction only redacts words under the `text` key. When PII Redaction is enabled together with other features such as Entity Detection and Summarization, PII can still show up under the `entities` or `summary` keys.
All redacted text will be replaced with `#` characters. For example, if the phone number `111-2222` was spoken in the audio, it would be transcribed as `###-####` in the text.
Note
Given that PII Redaction involves sensitive information, we expect to see 100% redaction rates where the content is transcribed correctly. If something is incorrectly transcribed, it may not be redacted. For example, if a U.S. Social Security number is transcribed and formatted as a telephone number, and you've only chosen to redact U.S. Social Security numbers, the Social Security number won't be redacted.
To best fit PII Redaction to your use case and data, you can select from a set of redaction policies when using PII Redaction. Include any or all of the policy names below in the `redact_pii_policies` array when making your POST request, as shown in the example after the table below.
Note
The `redact_pii_policies` parameter is required and must contain at least one policy name from the list below.
| Policy Name | Description |
| --- | --- |
| `medical_process` | Medical process, including treatments, procedures, and tests (e.g., heart surgery, CT scan) |
| `medical_condition` | Name of a medical condition, disease, syndrome, deficit, or disorder (e.g., chronic fatigue syndrome, arrhythmia, depression) |
| `blood_type` | Blood type (e.g., O-, AB positive) |
| `drug` | Medications, vitamins, or supplements (e.g., Advil, Acetaminophen, Panadol) |
| `injury` | Bodily injury (e.g., I broke my arm, I have a sprained wrist) |
| `number_sequence` | A "lazy" rule that will redact any sequence of two or more numbers |
| `email_address` | Email address (e.g., support@assemblyai.com) |
| `date_of_birth` | Date of birth (e.g., Date of Birth: March 7, 1961) |
| `phone_number` | Telephone or fax number |
| `us_social_security_number` | Social Security number or equivalent |
| `credit_card_number` | Credit card number |
| `credit_card_expiration` | Expiration date of a credit card |
| `credit_card_cvv` | Credit card verification code (e.g., CVV: 080) |
| `date` | Specific calendar date (e.g., December 18) |
| `nationality` | Terms indicating nationality, ethnicity, or race (e.g., American, Asian, Caucasian) |
| `event` | Name of an event or holiday (e.g., Olympics, Yom Kippur) |
| `language` | Name of a natural language (e.g., Spanish, French) |
| `location` | Any location reference, including mailing address, postal code, city, state, province, or country |
| `money_amount` | Name and/or amount of currency (e.g., 15 pesos, $94.50) |
| `person_name` | Name of a person (e.g., Bob, Doug Jones) |
| `person_age` | Number associated with an age (e.g., 27, 75) |
| `organization` | Name of an organization (e.g., CNN, McDonalds, University of Alaska) |
| `political_affiliation` | Terms referring to a political party, movement, or ideology (e.g., Republican, Liberal) |
| `occupation` | Job title or profession (e.g., professor, actor, engineer, CPA) |
| `religion` | Terms indicating religious affiliation (e.g., Hindu, Catholic) |
| `drivers_license` | Driver's license number (e.g., DL# 356933-540) |
| `banking_information` | Banking information, including account and routing numbers |
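As a concrete illustration, here is a minimal sketch of a transcription request that enables PII Redaction, using Python and the `requests` library. The API key and audio URL are placeholders, and the `redact_pii` enable flag is an assumption about the request shape; confirm parameter names against the current API reference.

```python
import requests

API_KEY = "YOUR_API_KEY"                       # placeholder
AUDIO_URL = "https://example.com/audio.mp3"    # placeholder

payload = {
    "audio_url": AUDIO_URL,
    "redact_pii": True,  # assumed enable flag for the PII Redaction feature
    # redact_pii_policies is required and must contain at least one policy
    "redact_pii_policies": [
        "us_social_security_number",
        "credit_card_number",
        "phone_number",
    ],
}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    json=payload,
    headers={"authorization": API_KEY},
)
transcript_id = response.json()["id"]  # poll GET /v2/transcript/<id> for the result
```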
By default, any PII that is spoken will be transcribed with hash `#` characters. For example, the credit card number `1111-2222-3333-4444` will be transcribed as `####-####-####-####`.
By including the `redact_pii_sub` parameter in your POST request, you can customize how the PII is replaced. Here are the options for the `redact_pii_sub` parameter:
| Value | Description |
| --- | --- |
| `hash` | PII that is detected is replaced with a hash, `#`. For example, in "I'm calling for John", the name is replaced with `####`. (Applied by default) |
| `entity_name` | PII that is detected is replaced with the associated policy name. For example, "John" is replaced with `[PERSON_NAME]`. This is recommended for readability. |
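For example, a payload fragment (same request shape as the sketch above) that swaps hashes for policy names might look like this:

```python
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder
    "redact_pii": True,  # assumed enable flag, as in the sketch above
    "redact_pii_policies": ["person_name", "phone_number"],
    # "I'm calling for John" would come back as "I'm calling for [PERSON_NAME]"
    "redact_pii_sub": "entity_name",
}
```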
When you transcribe a file with PII Redaction, the API can optionally generate a version of your original audio file with the PII "beeped" out as it is spoken. To do so, include the `redact_pii_audio` parameter in your POST request when submitting files for transcription, as in the payload sketch below.
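A minimal payload fragment (the `redact_pii` enable flag is an assumption, as noted earlier):

```python
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder
    "redact_pii": True,  # assumed enable flag
    "redact_pii_policies": ["phone_number", "banking_information"],
    "redact_pii_audio": True,  # also generate a "beeped" version of the audio
}
```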
You can make a GET request to the following API endpoint to retrieve a URL that points to your redacted audio file:

`https://api.assemblyai.com/v2/transcript/<your transcript id>/redacted-audio`
In the JSON response, you'll see a `redacted_audio_url` key. This key contains a URL that points to your redacted audio file.
Please note that the `redacted_audio_url` link is only accessible for 24 hours. You'll want to download the redacted audio file from the URL and save a copy on your end before the link expires.
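A sketch of retrieving and saving the redacted audio, assuming the same `requests` setup as above:

```python
import requests

API_KEY = "YOUR_API_KEY"               # placeholder
transcript_id = "YOUR_TRANSCRIPT_ID"   # placeholder

response = requests.get(
    f"https://api.assemblyai.com/v2/transcript/{transcript_id}/redacted-audio",
    headers={"authorization": API_KEY},
)
redacted_url = response.json()["redacted_audio_url"]

# The URL expires after 24 hours, so download a copy right away
audio = requests.get(redacted_url)
with open("redacted_audio.mp3", "wb") as f:
    f.write(audio.content)
```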
Alternative Status Codes and Responses
Sometimes, you may not receive a `200` status code from the following API endpoint. In the table below, we outline the non-`200` status codes you may receive and what they mean.

`https://api.assemblyai.com/v2/transcript/<your transcript id>/redacted-audio`
| Response Code | Description |
| --- | --- |
| `202` | A `202` status code will be returned if audio redaction is still in progress. Depending on the length of the file, it can take several minutes after the audio file finishes transcribing for the redacted audio file to be created. |
| `400` | A `400` will be returned if something is wrong with your request or if the redacted audio file no longer exists on our servers. |
Receive a Webhook
If a `webhook_url` was provided in your POST request when submitting your audio file for transcription, we will send a POST request to your `webhook_url` when the redacted audio is ready. The POST request headers and JSON body will look like this:
headers:

```
content-length: 79
accept-encoding: gzip, deflate
accept: */*
user-agent: python-requests/2.21.0
content-type: application/json
```

params:

```
status: 'redacted_audio_ready'
redacted_audio_url: 'https://link-to-redacted-audio'
```
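For illustration, a webhook receiver for this notification could look roughly like the sketch below. Flask is used purely as an example, and the route name is arbitrary:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/assemblyai-webhook", methods=["POST"])  # arbitrary example route
def handle_webhook():
    body = request.get_json()
    if body.get("status") == "redacted_audio_ready":
        # Fetch and store the redacted audio before the URL expires
        print("Redacted audio ready at:", body["redacted_audio_url"])
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)
```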
With Automatic Transcript Highlights, the AssemblyAI API can automatically detect important phrases and words in your transcription text.
For example, consider the following text:
We smirk because we believe that synthetic happiness is not of the same quality as what we might call natural happiness. What are these terms? Natural happiness is what we get when we get what we wanted. And synthetic happiness is what we make when we don't get what we wanted. And in our society...
Automatic Transcript Highlights will automatically detect the following key phrases/words in the text:
"synthetic happiness"
"natural happiness"
...
To enable this feature, include the `auto_highlights` parameter in your POST request when submitting files for transcription, and set this parameter to `true`, as in the sketch below.
Heads up
Your files can take up to 60 seconds longer to process when Automatic Transcript Highlights is enabled.
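A minimal request sketch, with placeholders as in the earlier examples:

```python
import requests

payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder
    "auto_highlights": True,
}
response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    json=payload,
    headers={"authorization": "YOUR_API_KEY"},  # placeholder
)
```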
Once your transcription is complete and you GET the result, you'll see an `auto_highlights_result` key in the JSON response. This key contains the key phrases/words the API found in your transcription text. Here is a close-up of only that key's response, and what each value means:
| Response Key | Description |
| --- | --- |
| `status` | Will be either `"success"`, or `"unavailable"` in the rare case that the Automatic Transcript Highlights model failed |
| `results` | A list of all the highlights found in your transcription text |
| `results.text` | The phrase/word itself that was detected |
| `results.count` | How many times this phrase occurred in the text |
| `results.rank` | The relevancy of this phrase; the higher the score, the better |
| `results.timestamps` | A list of all the timestamps, in milliseconds, in the audio where each phrase/word is spoken |
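Putting the table above to work, a sketch of reading the highlights from a completed transcript might look like this (it assumes each timestamp object carries `start` and `end` values in milliseconds):

```python
result = response.json()  # a completed transcript fetched via GET
highlights = result["auto_highlights_result"]

if highlights["status"] == "success":
    # Sort by rank so the most relevant phrases come first
    for item in sorted(highlights["results"], key=lambda r: r["rank"], reverse=True):
        spans = [(t["start"], t["end"]) for t in item["timestamps"]]
        print(f"{item['text']!r} x{item['count']} at {spans}")
```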
With Content Safety Detection, AssemblyAI can detect if any of the following sensitive content is spoken in your audio/video files, and pinpoint exactly when and what was spoken:
| Label | Description | Model Output | Supported by Severity Scores |
| --- | --- | --- | --- |
| Accidents | Any man-made incident that happens unexpectedly and results in damage, injury, or death. | `accidents` | Yes |
| Alcohol | Content that discusses any alcoholic beverage or its consumption. | `alcohol` | Yes |
| Company Financials | Content that discusses any sensitive company financial information. | `financials` | No |
| Crime Violence | Content that discusses any type of criminal activity or extreme violence that is criminal in nature. | `crime_violence` | Yes |
| Drugs | Content that discusses illegal drugs or their usage. | `drugs` | Yes |
| Gambling | Includes gambling on casino-based games such as poker, slots, etc., as well as sports betting. | `gambling` | Yes |
| Hate Speech | Content that is a direct attack against people or groups based on their sexual orientation, gender identity, race, religion, ethnicity, national origin, disability, etc. | `hate_speech` | Yes |
| Health Issues | Content that discusses any medical or health-related problems. | `health_issues` | Yes |
| Manga | Mangas are comics or graphic novels originating from Japan, with some of the more popular series being "Pokemon", "Naruto", "Dragon Ball Z", "One Punch Man", and "Sailor Moon". | `manga` | No |
| Marijuana | This category includes content that discusses marijuana or its usage. | `marijuana` | Yes |
| Natural Disasters | Phenomena that happen infrequently and result in damage, injury, or death, such as hurricanes, tornadoes, earthquakes, volcano eruptions, and firestorms. | `disasters` | Yes |
| Negative News | News content with a negative sentiment, which typically occurs in the third person as an unbiased recapping of events. | `negative_news` | No |
| NSFW (Adult Content) | Content considered "Not Safe for Work" that a viewer would not want to be heard/seen in a public environment. | `nsfw` | No |
| Pornography | Content that discusses any sexual content or material. | `pornography` | Yes |
| Profanity | Any profanity or cursing. | `profanity` | Yes |
| Sensitive Social Issues | This category includes content that may be considered insensitive, irresponsible, or harmful to certain groups based on their beliefs, political affiliation, sexual orientation, or gender identity. | `sensitive_social_issues` | No |
| Terrorism | Includes terrorist acts as well as terrorist groups. Examples include bombings, mass shootings, and ISIS. Note that many texts corresponding to this topic may also be classified into the Crime Violence topic. | `terrorism` | Yes |
| Tobacco | Text that discusses tobacco and tobacco usage, including e-cigarettes, nicotine, vaping, and general discussions about smoking. | `tobacco` | Yes |
| Weapons | Text that discusses any type of weapon, including guns, ammunition, shooting, knives, missiles, torpedoes, etc. | `weapons` | Yes |
Include the `content_safety` parameter in your POST request when submitting audio files for transcription, and set this parameter to `true`, as in the sketch below.
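A minimal payload fragment, following the same request shape as the earlier examples:

```python
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder
    "content_safety": True,
}
```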
Once the transcription is complete and you get the result, there will be an additional `content_safety_labels` key in the JSON response. Below, we'll drill into the data that is returned in the `content_safety_labels` key.
| Response Key | Description |
| --- | --- |
| `status` | Will be either `"success"`, or `"unavailable"` in the rare case that the Content Safety Detection model failed |
| `results` | A list of all the spoken audio the Content Safety Detection model flagged |
| `results.text` | The text transcription of what was spoken that triggered the Content Safety Detection model |
| `results.labels` | A list of labels the Content Safety Detection model predicted for the flagged content, as well as the confidence and severity of each label. The confidence score is a range between `0` and `1`, and is how confident the model was in the label it predicted. The severity score also ranges between `0` and `1`, and indicates how severe the flagged content is, with `1` being the most severe. |
| `results.timestamp` | The start and end time, in milliseconds, for where the flagged content was spoken in the audio |
| `summary` | The `summary` key provides the confidence of the most common labels in relation to the entire audio file. |
| `severity_score_summary` | The `severity_score_summary` key provides the overall severity of the most common labels in relation to the entire audio file. |
Each label will be returned with a `confidence` score and a `severity` score. It is important to note that these two keys measure two very different things. The `severity` key produces a score that shows how severe the flagged content is on a scale of `0` to `1`. For example, a natural disaster with mass casualties would be a `1`, whereas a wind storm that broke a lamppost would be a `0.1`. In comparison, `confidence` displays how confident the model was in predicting the label, also on a scale of `0` to `1`.
We can break this down further by reviewing the following label:
"labels": [
{
"label": "health_issues",
"confidence": 0.8225132822990417,
"severity": 0.15090347826480865
}
],
In the above example, the Content Safety model indicates it is `82.25%` confident that the spoken content is about Health Issues; however, the severity is measured at a low `0.1509`. This means the model is very confident the content is about health issues, but the content was not severe in nature (i.e., it was likely about a minor health issue).
The `severity_score_summary` key lists each label that was detected along with `low`, `medium`, and `high` keys.
"severity_score_summary": {
"health_issues": {
"low": 0.7210625030587972,
"medium": 0.2789374969412028,
"high": 0.0
}
}
The values of the `low`, `medium`, and `high` keys reflect the API's confidence that the label is "low," "medium," or "high" in severity throughout the entire audio file. This score is based on the intersection of the length of the audio file, the frequency of `low`/`medium`/`high` severity tags throughout the file, and the `confidence` score for each of those occurrences.
By default, the Content Safety model will return any label with a confidence of 50% or greater. If you wish to set a higher or lower threshold, add the `content_safety_confidence: {number}` parameter to your POST request. This parameter accepts an integer value between `25` and `100`, inclusive.
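A payload sketch with the threshold raised to 75:

```python
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder
    "content_safety": True,
    "content_safety_confidence": 75,  # only return labels with >= 75% confidence
}
```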
With Topic Detection, AssemblyAI can label the topics that are spoken in your audio/video files. The predicted topic labels follow the standardized IAB Taxonomy, which covers 698 potential topics and makes the labels suitable for Contextual Targeting use cases.
To use our Topic Detection feature, include the `iab_categories` parameter in your POST request when submitting audio files for transcription, and set this parameter to `true`, as in the sketch below.
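A minimal payload fragment, with placeholders as before:

```python
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder
    "iab_categories": True,
}
```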
Once the transcription is complete and you get the transcription result, there will be an additional `iab_categories_result` key in the JSON response. Below, we drill into that key and what data it includes.
| Response Key | Description |
| --- | --- |
| `status` | Will be either `"success"`, or `"unavailable"` in the rare case that the Topic Detection model failed |
| `results` | The list of topics that were predicted for the audio file, including the text that influenced each topic label prediction, and other metadata about relevancy and timestamps |
| `results.text` | The transcription text for the portion of audio that was classified with topic labels |
| `results.timestamp` | The start and end time, in milliseconds, for where the portion of text in `results.text` was spoken in the audio file |
| `results.labels` | The list of labels that were predicted for this portion of text. The `relevance` key gives a score between `0` and `1.0` for how relevant each label is to the portion of text. |
| `summary` | The twenty topic labels from the `results` array with the highest relevancy score across the entire audio file. For example, if the `Science>Environment` label is detected only 1 time in a 60 minute audio file, the `summary` key will show a low relevancy score for that label, since the entire transcription was not found to consistently be about `Science>Environment`. |
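For example, assuming `summary` is a mapping from topic label to relevancy score (an assumption based on the description above), you could surface the dominant topics like this:

```python
result = response.json()  # a completed transcript fetched via GET
categories = result["iab_categories_result"]

if categories["status"] == "success":
    # Print the detected topics, most relevant first
    ranked = sorted(categories["summary"].items(), key=lambda kv: kv[1], reverse=True)
    for label, relevance in ranked:
        print(f"{label}: {relevance:.3f}")
```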
With Sentiment Analysis, AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files. Sentiment Analysis returns a result of `POSITIVE`, `NEGATIVE`, or `NEUTRAL` for each sentence in the transcript.
To include Sentiment Analysis in your transcript results, add the `sentiment_analysis` parameter to your POST request when submitting files for transcription, and set this parameter to `true`.
Once the transcription is complete and you get the result, there will be an additional `sentiment_analysis_results` key in the JSON response.
For each sentence in the transcription text, the API will return the sentiment, confidence score, the start and end time for when that sentence was spoken, and, if applicable, the speaker label for that sentence. A detailed explanation of each key in the list of objects returned in the `sentiment_analysis_results` array can be found in the table below.
| Key | Value |
| --- | --- |
| `text` | The transcription text of the sentence being analyzed |
| `start` | Starting timestamp (in milliseconds) of the text in the transcript |
| `end` | Ending timestamp (in milliseconds) of the text in the transcript |
| `sentiment` | The detected sentiment: `POSITIVE`, `NEGATIVE`, or `NEUTRAL` |
| `confidence` | Confidence score for the detected sentiment |
| `speaker` | If using `dual_channel` or `speaker_labels`, the speaker that spoke this sentence |
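As a sketch, iterating over the per-sentence results of a completed transcript could look like this:

```python
result = response.json()  # a completed transcript fetched via GET

for sentence in result["sentiment_analysis_results"]:
    speaker = sentence.get("speaker")  # only set with dual_channel/speaker_labels
    prefix = f"[{speaker}] " if speaker else ""
    print(f"{prefix}{sentence['sentiment']} ({sentence['confidence']:.2f}): {sentence['text']}")
```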
With Summarization, AssemblyAI can generate a single abstractive summary of entire audio files submitted for transcription.
When submitting a file for transcription, include the `summarization` parameter in your POST request, and set this to `true`. By default, Summarization produces a bullet-point style summary using our Informative model. Optionally, you can customize the summarization to best fit your use case using the `summary_model` and `summary_type` parameters.
The `summary_model` parameter specifies which Summarization model will be used. The `summary_type` parameter determines the type of summarization results you will receive.
The table below shows more information on the options available for the `summary_model` parameter:
| Summary Model | Recommended Use Cases | Supported Summary Types | Required Parameters |
| --- | --- | --- | --- |
| `informative` (default) | Best for files with a single speaker, such as presentations or lectures | `bullets`, `bullets_verbose`, `headline`, or `paragraph` | `punctuate` and `format_text` set to `true` |
| `conversational` | Best for any two-person conversation, such as customer/agent or interviewer/interviewee calls | `bullets`, `bullets_verbose`, `headline`, or `paragraph` | `punctuate`, `format_text`, and `speaker_labels` or `dual_channel` set to `true` |
| `catchy` | Best for creating video, podcast, or media titles | `headline` or `gist` | `punctuate` and `format_text` set to `true` |
The table below shows more information on the options available for the `summary_type` parameter:
| Summary Type | Description | Example |
| --- | --- | --- |
| `bullets` (default) | A bulleted summary with the most important points. | - The human brain has nearly tripled in mass in two million years.\n- One of the main reasons that our brain got so big is because it got a new part, called the frontal lobe.\n |
| `bullets_verbose` | A longer bullet point list summarizing the entire transcription text. | - Dan Gilbert is a psychologist and a happiness expert. His talk is recorded live at Ted conference. He explains why the human brain has nearly tripled in size in 2 million years. He also explains the difference between winning the lottery and becoming a paraplegic.\n- In 1994, Pete Best said he's happier than he would have been with the Beatles. In the free choice paradigm, monet prints are ranked from the one they like the most to the one that they don't. People prefer the third one over the fourth one because it's a little better.\n- People synthesize happiness when they change their affective. Hedonic aesthetic reactions to a poster. The ability to make up your mind and change your mind is the friend of natural happiness. But it's the enemy of synthetic happiness. The psychological immune system works best when we are stuck. This is the difference between dating and marriage. People don't know this about themselves and it can work to their disadvantage.\n- In a photography course at Harvard, 66% of students choose not to take the course where they have the opportunity to change their mind. Adam Smith said that some things are better than others. Dan Gilbert recorded at Ted, 2004 in Monterey, California, 2004. |
| `gist` | A few words summarizing the entire transcription text. | A big brain |
| `headline` | A single sentence summarizing the entire transcription text. | The human brain has nearly tripled in mass in two million years. |
| `paragraph` | A single paragraph summarizing the entire transcription text. | The human brain has nearly tripled in mass in two million years. It went from the one-and-a-quarter-pound brain of our ancestor, habilis, to the almost three-pound meatloaf everybody here has between their ears. |
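Pulling the two tables together, a request for a conversational bullet-point summary might look like the sketch below; per the Required Parameters column, the `conversational` model also needs `speaker_labels` or `dual_channel` enabled:

```python
payload = {
    "audio_url": "https://example.com/call_recording.mp3",  # placeholder
    "summarization": True,
    "summary_model": "conversational",
    "summary_type": "bullets",
    # Required parameters for the conversational model
    "punctuate": True,
    "format_text": True,
    "speaker_labels": True,
}
```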
Note
You cannot have the Auto Chapters model and the Summarization model active in the same request. If you enable both Auto Chapters and Summarization in a single request, you will receive the following error message: `"error": "Only one of the following models can be enabled at a time: auto_chapters, summarization."`
When the transcription finishes, you will see the generated `summary` key in the JSON response, containing the summary of your submitted audio/video file in the format (`summary_type`) you requested.
Learn more
For more information on how to use our Summarization models, check out the announcement on our blog.
Auto Chapters provides a "summary over time" for audio files transcribed with AssemblyAI. It works by first segmenting your audio files into logical "chapters" as the topic of conversation changes, and then provides an automatically generated summary for each "chapter" of content. For more information on Auto Chapters, check out this announcement from our blog.
When submitting a file for transcription, include the `auto_chapters` parameter in your POST request, and set this to `true`.
Note
Punctuation must be enabled in your request to use the Auto Chapters feature. If you attempt to use Auto Chapters with `punctuate` set to `false`, you will receive the following error: `"error": "punctuate must be enabled when auto_chapters is enabled."`
Additionally, you cannot have the Auto Chapters model and the Summarization model active in the same request. If you enable both Auto Chapters and Summarization in a single request, you will receive the following error message: `"error": "Only one of the following models can be enabled at a time: auto_chapters, summarization."`
When your transcription is completed, you'll see a `chapters` key in the JSON response. For each chapter that was detected, the API will include the `start` and `end` timestamps (in milliseconds), a `summary` (a few-sentence summary of the content spoken during that timeframe), a short `headline` (which can be thought of as a "summary of the summary"), and a `gist` (an ultra-short, few-word summary of the chapter of content).
| Key | Value |
| --- | --- |
| `start` | Starting timestamp (in milliseconds) of the portion of audio being summarized |
| `end` | Ending timestamp (in milliseconds) of the portion of audio being summarized |
| `summary` | A one-paragraph summary of the content spoken during this timeframe |
| `headline` | A single-sentence summary of the content spoken during this timeframe |
| `gist` | An ultra-short summary, just a few words, of the content spoken during this timeframe |
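A sketch of walking through the detected chapters of a completed transcript:

```python
result = response.json()  # a completed transcript fetched via GET

for chapter in result["chapters"]:
    print(f"[{chapter['start']}-{chapter['end']} ms] {chapter['gist']}")
    print("  headline:", chapter["headline"])
    print("  summary:", chapter["summary"])
```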
With Entity Detection, you can identify a wide range of entities that are spoken in your audio files, such as person and company names, email addresses, dates, and locations.
To include Entity Detection in your transcript response, add the `entity_detection` parameter to your POST request when submitting audio files for transcription, and set this parameter to `true`.
When your transcription is complete, you will see an `entities` key in the JSON response. Below, we drill into the data that is returned within the list of results in the `entities` key.
| Key | Value |
| --- | --- |
| `entity_type` | The entity type detected |
| `text` | The text containing the entity |
| `start` | Starting timestamp, in milliseconds, of the entity in the transcript |
| `end` | Ending timestamp, in milliseconds, of the entity in the transcript |
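For example, a sketch that groups the detected entities by type:

```python
from collections import defaultdict

result = response.json()  # a completed transcript fetched via GET

by_type = defaultdict(list)
for entity in result["entities"]:
    by_type[entity["entity_type"]].append(entity["text"])

for entity_type, texts in by_type.items():
    print(entity_type, texts)
```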
When Entity Detection is enabled, the entity types listed below are automatically detected and, if found in the transcription text, will be included in the `entities` key as shown above. They will be listed individually in the order that they appear in the transcript.
| Entity Name | Description |
| --- | --- |
| `blood_type` | Blood type (e.g., O-, AB positive) |
| `credit_card_cvv` | Credit card verification code (e.g., CVV: 080) |
| `credit_card_expiration` | Expiration date of a credit card |
| `credit_card_number` | Credit card number |
| `date` | Specific calendar date (e.g., December 18) |
| `date_of_birth` | Date of birth (e.g., Date of Birth: March 7, 1961) |
| `drug` | Medications, vitamins, or supplements (e.g., Advil, Acetaminophen, Panadol) |
| `event` | Name of an event or holiday (e.g., Olympics, Yom Kippur) |
| `email_address` | Email address (e.g., support@assemblyai.com) |
| `injury` | Bodily injury (e.g., I broke my arm, I have a sprained wrist) |
| `language` | Name of a natural language (e.g., Spanish, French) |
| `location` | Any location reference, including mailing address, postal code, city, state, province, or country |
| `medical_condition` | Name of a medical condition, disease, syndrome, deficit, or disorder (e.g., chronic fatigue syndrome, arrhythmia, depression) |
| `medical_process` | Medical process, including treatments, procedures, and tests (e.g., heart surgery, CT scan) |
| `money_amount` | Name and/or amount of currency (e.g., 15 pesos, $94.50) |
| `nationality` | Terms indicating nationality, ethnicity, or race (e.g., American, Asian, Caucasian) |
| `occupation` | Job title or profession (e.g., professor, actor, engineer, CPA) |
| `organization` | Name of an organization (e.g., CNN, McDonalds, University of Alaska) |
| `person_age` | Number associated with an age (e.g., 27, 75) |
| `person_name` | Name of a person (e.g., Bob, Doug Jones) |
| `phone_number` | Telephone or fax number |
| `political_affiliation` | Terms referring to a political party, movement, or ideology (e.g., Republican, Liberal) |
| `religion` | Terms indicating religious affiliation (e.g., Hindu, Catholic) |
| `us_social_security_number` | Social Security number or equivalent |
| `drivers_license` | Driver's license number (e.g., DL# 356933-540) |
| `banking_information` | Banking information, including account and routing numbers |