
Entity Detection

The Entity Detection model automatically identifies and categorizes key information in transcribed audio content, such as names of people, organizations, addresses, phone numbers, medical data, Social Security numbers, and more.

Quickstart

When submitting files for transcription, include the entity_detection parameter in your request body and set it to true.
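The following is a minimal sketch in Python using the requests library against the v2 REST transcript endpoint; the API key and audio URL are placeholders, and error handling is omitted.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: your AssemblyAI API key
BASE_URL = "https://api.assemblyai.com/v2"
HEADERS = {"authorization": API_KEY}

# Submit an audio file for transcription with Entity Detection enabled.
response = requests.post(
    f"{BASE_URL}/transcript",
    headers=HEADERS,
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio URL
        "entity_detection": True,
    },
)
transcript_id = response.json()["id"]

# Poll until the transcription completes (or errors out).
while True:
    result = requests.get(f"{BASE_URL}/transcript/{transcript_id}", headers=HEADERS).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)
```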


All entities supported by the model

The model is designed to automatically detect and classify various types of entities within the transcription text. The detected entities and their corresponding types will be listed individually in the entities key of the response object, ordered by when they first appear in the transcript.

blood_type: Blood type (e.g., O-, AB positive)
credit_card_cvv: Credit card verification code (e.g., CVV: 080)
credit_card_expiration: Expiration date of a credit card
credit_card_number: Credit card number
date: Specific calendar date (e.g., December 18)
date_of_birth: Date of birth (e.g., Date of Birth: March 7, 1961)
drug: Medications, vitamins, or supplements (e.g., Advil, Acetaminophen, Panadol)
event: Name of an event or holiday (e.g., Olympics, Yom Kippur)
email_address: Email address (e.g., support@assemblyai.com)
injury: Bodily injury (e.g., I broke my arm, I have a sprained wrist)
language: Name of a natural language (e.g., Spanish, French)
location: Any location reference, including mailing address, postal code, city, state, province, or country
medical_condition: Name of a medical condition, disease, syndrome, deficit, or disorder (e.g., chronic fatigue syndrome, arrhythmia, depression)
medical_process: Medical process, including treatments, procedures, and tests (e.g., heart surgery, CT scan)
money_amount: Name and/or amount of currency (e.g., 15 pesos, $94.50)
nationality: Terms indicating nationality, ethnicity, or race (e.g., American, Asian, Caucasian)
occupation: Job title or profession (e.g., professor, actors, engineer, CPA)
organization: Name of an organization (e.g., CNN, McDonalds, University of Alaska)
person_age: Number associated with an age (e.g., 27, 75)
person_name: Name of a person (e.g., Bob, Doug Jones)
phone_number: Telephone or fax number
political_affiliation: Terms referring to a political party, movement, or ideology (e.g., Republican, Liberal)
religion: Terms indicating religious affiliation (e.g., Hindu, Catholic)
us_social_security_number: Social Security Number or equivalent
drivers_license: Driver's license number (e.g., DL #356933-540)
banking_information: Banking information, including account and routing numbers

Understanding the response

The identified entities are stored within the entities key, which contains an array of objects, one for each entity detected in the transcribed text. Each object includes the type of entity, start and end timestamps, and the corresponding text. The text key holds the exact text of the detected entity.
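As a sketch, continuing from the `result` JSON returned by the polling loop in the quickstart above (and assuming the entity type is exposed under an entity_type key alongside text, start, and end), you could list each detected entity like this, producing output similar to the example below:

```python
# Iterate over the detected entities in the completed transcript response.
for entity in result.get("entities", []):
    print(f"Entity: {entity['text']}")
    print(f"Type: {entity['entity_type']}")  # assumed key name for the entity type
    print()
```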

# Entity: Ted Talks
# Type: event
#
# Entity: Ted Conference
# Type: event
#
# Entity: Dan Gilbert
# Type: person
#
# Entity: psychologist
# Type: occupation

Troubleshooting

How does the Entity Detection model handle misspellings or variations of entities?

The model is capable of identifying entities with variations in spelling or formatting. However, the accuracy of the detection may depend on the severity of the variation or misspelling.

Can the Entity Detection model identify custom entity types?

No, the Entity Detection model currently does not support the detection of custom entity types. However, the model is capable of detecting a wide range of predefined entity types, including people, organizations, locations, dates, times, addresses, phone numbers, medical data, and banking information, among others.

How can I improve the accuracy of the Entity Detection model?

To improve the accuracy of the Entity Detection model, provide high-quality audio files with clear, distinct speech. In addition, make sure the audio content is relevant to your use case and that the entities being detected are relevant to the intended analysis. Finally, reviewing and adjusting the model's configuration parameters, such as the confidence threshold for entity detection, may help optimize results.