Entity Detection
The Entity Detection model automatically identifies and categorizes key information in transcribed audio content, such as the names of people, organizations, addresses, phone numbers, medical data, social security numbers, and more.
Quickstart
When submitting files for transcription, include the entity_detection parameter in your request body and set it to true.
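As a minimal sketch of such a request, the snippet below builds the JSON body with entity_detection enabled. The API key and audio URL are placeholders, and the endpoint shown in the comment assumes AssemblyAI's public v2 transcript API:

```python
import json

# Placeholder values -- replace with your own API key and audio file URL.
API_KEY = "YOUR_API_KEY"
AUDIO_URL = "https://example.com/audio.mp3"

# The request body: setting entity_detection to True enables the model.
payload = {
    "audio_url": AUDIO_URL,
    "entity_detection": True,
}

headers = {"authorization": API_KEY}

print(json.dumps(payload, indent=2))

# With an HTTP client such as requests, you would then POST this body:
#   requests.post("https://api.assemblyai.com/v2/transcript",
#                 json=payload, headers=headers)
```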
You can explore the full JSON response here:
You can run this code snippet in Colab here, or you can view the full source code here.
Understanding the response
The JSON object above contains all information about the transcription. Depending on which Models are used to analyze the audio, the attributes of this object will vary. For example, in the quickstart above we did not enable Summarization, which is reflected by the summarization: false key-value pair in the JSON above. Had we enabled Summarization, then the summary, summary_type, and summary_model key values would contain the file summary (and additional details) rather than the current null values.
To access the Entity Detection information, we use the entity_detection and entities keys:
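As an illustration, assuming the response has been parsed into a dictionary called results, the detected entities can be read like this (the sample data below is made up for demonstration, not taken from a real transcript):

```python
# Illustrative sample of the relevant part of a transcription response.
results = {
    "entity_detection": True,
    "entities": [
        {"entity_type": "person_name", "text": "Doug Jones", "start": 1440, "end": 2010},
        {"entity_type": "organization", "text": "CNN", "start": 5300, "end": 5720},
    ],
}

# Only iterate if Entity Detection was enabled for this transcript.
if results["entity_detection"]:
    for entity in results["entities"]:
        print(f'{entity["entity_type"]}: {entity["text"]}')
```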
The reference table below lists all relevant attributes along with their descriptions, where we've called the JSON response object results. Object attributes are accessed via dot notation, and arbitrary array elements are denoted with [i]. For example, results.words[i].text refers to the text attribute of the i-th element of the words array in the JSON results object.
| Attribute | Type | Description |
| --- | --- | --- |
| results.entity_detection | boolean | Whether Entity Detection was enabled in the transcription request |
| results.entities | array | An array of detected entities |
| results.entities[i].entity_type | string | The type of entity for the i-th detected entity |
| results.entities[i].text | string | The text for the i-th detected entity |
| results.entities[i].start | number | The starting time, in milliseconds, at which the i-th detected entity appears in the audio file |
| results.entities[i].end | number | The ending time, in milliseconds, for the i-th detected entity in the audio file |
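Since start and end are given in milliseconds, a small helper (hypothetical, for illustration only) can report where each entity occurs in seconds:

```python
def entity_window_seconds(entity):
    """Convert an entity's start/end timestamps from milliseconds to seconds."""
    return entity["start"] / 1000, entity["end"] / 1000

# Illustrative entity record following the attribute table above.
entity = {"entity_type": "phone_number", "text": "555-0100", "start": 12500, "end": 14250}
start_s, end_s = entity_window_seconds(entity)
print(f"{entity['entity_type']} spoken from {start_s}s to {end_s}s")
# phone_number spoken from 12.5s to 14.25s
```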
All entities supported by the model
The model is designed to automatically detect and classify various types of entities within the transcription text. The detected entities and their corresponding types are listed individually in the entities key of the response object, ordered by when they first appear in the transcript.
| Entity type | Description |
| --- | --- |
| banking_information | Banking information, including account and routing numbers |
| blood_type | Blood type (e.g., O-, AB positive) |
| credit_card_cvv | Credit card verification code (e.g., CVV: 080) |
| credit_card_expiration | Expiration date of a credit card |
| credit_card_number | Credit card number |
| date | Specific calendar date (e.g., December 18) |
| date_of_birth | Date of birth (e.g., Date of Birth: March 7, 1961) |
| drivers_license | Driver's license number (e.g., DL #356933-540) |
| drug | Medications, vitamins, or supplements (e.g., Advil, Acetaminophen, Panadol) |
| email_address | Email address (e.g., support@assemblyai.com) |
| event | Name of an event or holiday (e.g., Olympics, Yom Kippur) |
| injury | Bodily injury (e.g., I broke my arm, I have a sprained wrist) |
| language | Name of a natural language (e.g., Spanish, French) |
| location | Any location reference including mailing address, postal code, city, state, province, or country |
| medical_condition | Name of a medical condition, disease, syndrome, deficit, or disorder (e.g., chronic fatigue syndrome, arrhythmia, depression) |
| medical_process | Medical process, including treatments, procedures, and tests (e.g., heart surgery, CT scan) |
| money_amount | Name and/or amount of currency (e.g., 15 pesos, $94.50) |
| nationality | Terms indicating nationality, ethnicity, or race (e.g., American, Asian, Caucasian) |
| occupation | Job title or profession (e.g., professor, actors, engineer, CPA) |
| organization | Name of an organization (e.g., CNN, McDonalds, University of Alaska) |
| password | Account passwords, PINs, access keys, or verification answers (e.g., 27%alfalfa, temp1234, My mother's maiden name is Smith) |
| person_age | Number associated with an age (e.g., 27, 75) |
| person_name | Name of a person (e.g., Bob, Doug Jones) |
| phone_number | Telephone or fax number |
| political_affiliation | Terms referring to a political party, movement, or ideology (e.g., Republican, Liberal) |
| religion | Terms indicating religious affiliation (e.g., Hindu, Catholic) |
| time | Expressions indicating clock times (e.g., 19:37:28, 10pm EST) |
| url | Internet addresses (e.g., www.assemblyai.com) |
| us_social_security_number | Social Security Number or equivalent |
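Because the entities array is ordered by first appearance, summarizing which entity types occur in a transcript is straightforward. For example, using the standard-library Counter (the entity list below is illustrative sample data):

```python
from collections import Counter

# Illustrative list of detected entities, ordered by appearance in the transcript.
entities = [
    {"entity_type": "person_name", "text": "Bob"},
    {"entity_type": "location", "text": "Alaska"},
    {"entity_type": "person_name", "text": "Doug Jones"},
]

# Tally how many times each entity type was detected.
counts = Counter(e["entity_type"] for e in entities)
print(counts)
# Counter({'person_name': 2, 'location': 1})
```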
Troubleshooting
How does the model handle misspelled or unusually formatted entities?
The model can identify entities despite variations in spelling or formatting. However, detection accuracy may decrease as the variation or misspelling becomes more severe.
Can I define custom entity types for the model to detect?
No, the Entity Detection model currently doesn't support custom entity types. However, it can detect a wide range of predefined entity types, including people, organizations, locations, dates, times, addresses, phone numbers, medical data, and banking information, among others.
How can I improve the accuracy of the Entity Detection model?
To improve accuracy, provide high-quality audio files with clear and distinct speech. In addition, ensure that the audio content is relevant to your use case and that the entities being detected are relevant to the intended analysis. Finally, it may help to review and adjust the model's configuration parameters, such as the confidence threshold for entity detection, to optimize the results.