Transcript API reference
Create, retrieve, and delete transcripts.
Transcript object
The transcript resource is a JSON object that may contain several of the following properties, depending on the parameters used when the transcript was created.
Below you can find all available properties on the transcript object.
id | string | The unique identifier of your transcription |
status | string | The status of your transcription: queued, processing, completed, or error |
language_code | string | The language of your audio file. Possible values are found in Supported languages. The default value is en_us. |
audio_url | string | The URL of the media that was transcribed |
text | string | The textual transcript of your media file |
words | array | An array of temporally-sequential word objects, one for each word in the transcript. See Speech Recognition for more information. |
utterances | array | When dual_channel or speaker_labels is enabled, a list of turn-by-turn utterance objects. See Speaker Diarization for more information. |
confidence | float | The confidence score for the transcript, between 0.0 (low confidence) and 1.0 (high confidence) |
audio_duration | float | The duration of this transcript object's media file, in seconds |
punctuate | boolean | Whether Automatic Punctuation was enabled in the transcription request, either true or false |
format_text | boolean | Whether Text Formatting was enabled in the transcription request, either true or false |
dual_channel | boolean | Whether Dual channel transcription was enabled in the transcription request, either true or false |
speech_model | string | The speech model used for the transcription, if a speech model was specified. |
webhook_url | string | The URL to which we send webhooks upon transcription completion, if provided in the transcription request |
webhook_status_code | number | The HTTP status code we received from your server when delivering your webhook, if a webhook URL was provided in the transcription request |
webhook_auth | boolean | Whether webhook authentication details were provided in the transcription request |
webhook_auth_header_name | string | The header name which should be sent back with webhook calls, if provided in the transcription request |
auto_highlights | boolean | Whether Key Phrases was enabled in the transcription request, either true or false |
auto_highlights_result | object | The result of the Key Phrases model, if it was enabled during the transcription request. See Key Phrases for more information. |
audio_start_from | integer | The point in time, in milliseconds, in the file at which the transcription was started, if provided in the transcription request |
audio_end_at | integer | The point in time, in milliseconds, in the file at which the transcription was terminated, if provided in the transcription request |
word_boost | array | The list of custom vocabulary to boost transcription probability for, if provided in the transcription request |
boost_param | string | The word boost parameter value, if provided in the transcription request |
filter_profanity | boolean | Whether Profanity Filtering was enabled in the transcription request, either true or false |
redact_pii | boolean | Whether PII Redaction was enabled in the transcription request, either true or false |
redact_pii_audio | boolean | Whether a redacted version of the audio file was requested in the transcription request, either true or false. See PII Redaction for more information. |
redact_pii_audio_quality | string | The audio quality of the PII-redacted audio file, if enabled in the transcription request. See PII Redaction for more information. |
redact_pii_policies | array | The list of PII redaction policies that were enabled, if PII Redaction is enabled. See PII Redaction for more information. |
redact_pii_sub | string | Which replacement type was used to redact PII. See PII Redaction for more information. |
speaker_labels | boolean | Whether Speaker Diarization was enabled in the transcription request, either true or false |
speakers_expected | integer | The value for the speakers_expected parameter in the transcription request, if provided. See Speaker Diarization for more information. |
content_safety | boolean | Whether Content Moderation was enabled in the transcription request, either true or false |
iab_categories | boolean | Whether Topic Detection was enabled in the transcription request, either true or false |
content_safety_labels | object | The results of the Content Moderation model, if it was enabled during the transcription request. See Content Moderation for more information. |
iab_categories_result | object | The result of the Topic Detection model, if it was enabled during the transcription request. See Topic Detection for more information. |
language_detection | boolean | Whether Automatic language detection was enabled in the transcription request, either true or false |
custom_spelling | array | The custom spelling value passed in to the transcription request, if provided |
auto_chapters | boolean | Whether Auto Chapters was enabled in the transcription request, either true or false |
summarization | boolean | Whether Summarization was enabled in the transcription request, either true or false |
summary_type | string | The type of summary generated, if Summarization was enabled in the transcription request |
summary_model | string | The Summarization model used to generate the summary, if Summarization was enabled in the transcription request |
custom_topics | boolean | Whether custom topics was enabled in the transcription request, either true or false |
topics | array | The list of custom topics provided if custom topics was enabled in the transcription request |
speech_threshold | number | The value submitted for speech_threshold in the transcription request, if used. Otherwise, null. |
disfluencies | boolean | Whether the transcription of disfluencies was enabled in the transcription request, either true or false |
sentiment_analysis | boolean | Whether Sentiment Analysis was enabled in the transcription request, either true or false |
sentiment_analysis_results | array | An array of results for the Sentiment Analysis model, if it was enabled during the transcription request. See Sentiment Analysis for more information. |
entity_detection | boolean | Whether Entity Detection was enabled in the transcription request, either true or false |
entities | array | An array of results for the Entity Detection model, if it was enabled during the transcription request. See Entity Detection for more information. |
summary | string | The generated summary of the media file, if Summarization was enabled in the transcription request |
throttled | boolean | True while a request is throttled and false when a request is no longer throttled |
Create a transcript
Create a transcript from an audio or video file that is accessible via a URL.
curl --request POST \
--url "https://api.assemblyai.com/v2/transcript" \
--header "Authorization: YOUR_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"audio_url": "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3",
"speaker_labels": true,
"auto_chapters": true
}'
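The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name build_create_request is hypothetical, and the sketch only builds the request object so that it can be inspected before anything is sent.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"


def build_create_request(api_key, audio_url, **options):
    """Build (but do not send) a POST request that creates a transcript.

    Hypothetical helper: `options` are passed through as body parameters.
    """
    payload = {"audio_url": audio_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_request(
    "YOUR_API_KEY",
    "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3",
    speaker_labels=True,
    auto_chapters=True,
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Building the request separately from sending it keeps the body easy to log or test before any network call is made.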
Body parameters
Parameter | Type | Description | Required |
audio_url | string | The URL of your media file to transcribe. Learn how to upload local files to obtain a URL in the Transcribing an audio file guide. | Yes |
language_code | string | The language of your audio file. Possible values are found in Supported languages. The default value is en_us. | |
punctuate | boolean | Enable Automatic Punctuation, can be true or false | |
format_text | boolean | Enable Text Formatting, can be true or false | |
dual_channel | boolean | Enable Dual Channel transcription, can be true or false | |
speech_model | string | The speech model to use for the transcription. See Select the speech model. | |
webhook_url | string | The URL we should send webhooks to when your transcript is complete | |
webhook_auth_header_name | string | The name of the header to send back with webhook calls for added security. Use together with webhook_auth_header_value. Defaults to null. | |
webhook_auth_header_value | string | The value of the header to send back with webhook calls for added security. Use together with webhook_auth_header_name. Defaults to null. | |
auto_highlights | boolean | Enable Key Phrases, either true or false | |
audio_start_from | integer | The point in time, in milliseconds, to begin transcription from in your media file | |
audio_end_at | integer | The point in time, in milliseconds, to stop transcribing in your media file | |
word_boost | array | A list of custom vocabulary to boost transcription probability for. See Custom vocabulary for more details. | |
boost_param | string | The weight to apply to words/phrases in the word_boost array; can be "low", "default", or "high" | |
filter_profanity | boolean | Filter profanity from the transcribed text, can be true or false | |
redact_pii | boolean | Redact PII from the transcribed text using the Redact PII model, can be true or false | |
redact_pii_audio | boolean | Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII Redaction for more details. | |
redact_pii_audio_quality | string | Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII Redaction for more details. | |
redact_pii_policies | array | The list of PII redaction policies to enable. See PII Redaction for more details. | |
redact_pii_sub | string | The replacement logic for detected PII, can be "entity_type" or "hash". See PII Redaction for more details. | |
speaker_labels | boolean | Enable Speaker Diarization, can be true or false | |
speakers_expected | integer | Defaults to null. Tells the speaker label model how many speakers it should attempt to identify, up to 10. See Speaker Diarization for more details. | |
content_safety | boolean | Enable Content Moderation, can be true or false | |
content_safety_confidence | integer | The confidence threshold for content moderation. Values must be between 25 and 100. See more details at the Content Moderation model. | |
iab_categories | boolean | Enable Topic Detection, can be true or false | |
language_detection | boolean | Enable Automatic language detection, can be true or false | |
custom_spelling | array | Customize how words are spelled and formatted using to and from values. See Custom spelling for more details. | |
disfluencies | boolean | Transcribe Filler Words, like "umm", in your media file; can be true or false | |
sentiment_analysis | boolean | Enable Sentiment Analysis, can be true or false | |
auto_chapters | boolean | Enable Auto Chapters, can be true or false | |
summarization | boolean | Enable Summarization, can be true or false. If you specify one of summary_model and summary_type, you also need to specify the other. | |
summary_model | string | The model to summarize the transcript. See more details at the Summarization model. | |
summary_type | string | The type of summary. See more details at the Summarization model. | |
entity_detection | boolean | Enable Entity Detection, can be true or false | |
speech_threshold | float | Defaults to null. Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive. | |
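Several of the body parameters above carry constraints that can be checked before a request is sent. The following is a rough sketch of such a check, covering only the limits stated in the table. The function name validate_transcript_params is hypothetical, and the API itself remains the authority on what it accepts.

```python
def validate_transcript_params(params):
    """Return a list of problems found in a create-transcript request body.

    Hypothetical client-side check; it mirrors only the constraints
    documented in the body parameters table.
    """
    problems = []
    if not params.get("audio_url"):
        problems.append("audio_url is required")

    confidence = params.get("content_safety_confidence")
    if confidence is not None and not 25 <= confidence <= 100:
        problems.append("content_safety_confidence must be between 25 and 100")

    threshold = params.get("speech_threshold")
    if threshold is not None and not 0 <= threshold <= 1:
        problems.append("speech_threshold must be in the range [0, 1]")

    speakers = params.get("speakers_expected")
    if speakers is not None and speakers > 10:
        problems.append("speakers_expected can be at most 10")

    # Parameters restricted to a fixed set of string values.
    allowed = {
        "boost_param": ("low", "default", "high"),
        "redact_pii_sub": ("entity_type", "hash"),
        "redact_pii_audio_quality": ("mp3", "wav"),
    }
    for name, values in allowed.items():
        if name in params and params[name] not in values:
            problems.append(f"{name} must be one of {', '.join(values)}")

    # summary_model and summary_type must be given together.
    if ("summary_model" in params) != ("summary_type" in params):
        problems.append("summary_model and summary_type must be specified together")
    return problems


issues = validate_transcript_params(
    {"audio_url": "https://example.com/audio.mp3", "speech_threshold": 1.5}
)
```

Returning a list instead of raising on the first problem lets a caller report every issue in one pass.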
Response
A successful response has a 200 status code and an application/json Content-Type.
The response contains a Transcript object.
Get a transcript
Get the transcript resource. The transcript is ready when the status is set to "completed".
curl --url "https://api.assemblyai.com/v2/transcript/:id" \
--header "Authorization: YOUR_API_KEY"
Path parameters
id | ID of the transcript. |
Response
A successful response has a 200 status code and an application/json Content-Type.
The response contains a Transcript object.
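Because a transcript is ready only once its status is "completed", a client typically polls this endpoint. The sketch below shows one way to structure that loop. It assumes a caller-supplied fetch callable (hypothetical) that performs the GET request and returns the parsed JSON as a dict; the polling interval and retry limit are arbitrary illustrative defaults.

```python
import time


def wait_for_transcript(fetch, transcript_id, interval=3.0, max_polls=200, sleep=time.sleep):
    """Poll until the transcript status is "completed" or "error".

    `fetch` is a hypothetical callable that performs
    GET /v2/transcript/:id and returns the parsed JSON as a dict.
    """
    for _ in range(max_polls):
        transcript = fetch(transcript_id)
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"transcript {transcript_id} ended with status 'error'")
        # Still "queued" or "processing": wait before asking again.
        sleep(interval)
    raise TimeoutError(f"transcript {transcript_id} not ready after {max_polls} polls")


# Usage with a stand-in fetch function that simulates three API responses.
_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hello"},
])
result = wait_for_transcript(lambda _id: next(_responses), "abc123", sleep=lambda _s: None)
```

Passing fetch and sleep as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.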
Get sentences in a transcript
Get the transcript split by sentences. The API will attempt to semantically segment the transcript into sentences to create more reader-friendly transcripts.
curl --url "https://api.assemblyai.com/v2/transcript/:id/sentences" \
--header "Authorization: YOUR_API_KEY"
Path parameters
id | ID of the transcript. |
Response
A successful response has a 200 status code and an application/json Content-Type.
id | string |
confidence | number |
audio_duration | number |
sentences | array |
Get paragraphs in a transcript
Get the transcript split by paragraphs. The API will attempt to semantically segment your transcript into paragraphs to create more reader-friendly transcripts.
curl --url "https://api.assemblyai.com/v2/transcript/:id/paragraphs" \
--header "Authorization: YOUR_API_KEY"
Path parameters
id | ID of the transcript. |
Response
A successful response has a 200 status code and an application/json Content-Type.
id | string |
confidence | number |
audio_duration | number |
paragraphs | array |
Get redacted audio for a transcript
Retrieve the redacted audio object containing the status and URL to the redacted audio.
curl --url "https://api.assemblyai.com/v2/transcript/:id/redacted-audio" \
--header "Authorization: YOUR_API_KEY"
Path parameters
id | ID of the transcript. |
Response
A successful response has a 200 status code and an application/json Content-Type.
status | string | The status of the redacted audio |
redacted_audio_url | string | The URL of the redacted audio file |
Get subtitles for a transcript
Export your transcript in SRT or VTT format, to be plugged into a video player for subtitles and closed captions.
curl --url "https://api.assemblyai.com/v2/transcript/:id/:subtitle_format" \
--header "Authorization: YOUR_API_KEY"
Path parameters
id | ID of the transcript. |
subtitle_format | The format of the captions. Can be srt or vtt |
Query parameters
chars_per_caption | The maximum number of characters per caption |
Response
A successful response has a 200 status code and a text/html Content-Type.
The response contains an SRT or VTT file.
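As an illustration of how the path and query parameters for subtitles combine, here is a small sketch that builds the request URL. The helper name subtitles_url is hypothetical, and "abc123" is a placeholder transcript ID.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.assemblyai.com/v2/transcript"


def subtitles_url(transcript_id, subtitle_format, chars_per_caption=None):
    """Build the subtitles URL for a transcript (hypothetical helper)."""
    if subtitle_format not in ("srt", "vtt"):
        raise ValueError("subtitle_format must be 'srt' or 'vtt'")
    url = f"{BASE_URL}/{transcript_id}/{subtitle_format}"
    if chars_per_caption is not None:
        # Optional query parameter: maximum number of characters per caption.
        url += "?" + urlencode({"chars_per_caption": chars_per_caption})
    return url


url = subtitles_url("abc123", "srt", chars_per_caption=32)
```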
Search for words in a transcript
Search through the transcript for a specific set of keywords. You can search for individual words, numbers, or phrases containing up to five words or numbers.
curl --url "https://api.assemblyai.com/v2/transcript/:id/word-search?words=hopkins,wildfires" \
--header "Authorization: YOUR_API_KEY"
Path parameters
id | ID of the transcript. |
Query parameters
words | string | Comma-separated list of keywords to search for. |
Response
A successful response has a 200 status code and an application/json Content-Type.
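The words query parameter is a comma-separated list, and each entry may be a phrase of up to five words or numbers. A minimal sketch of assembling that URL follows. The helper name word_search_url is hypothetical, "abc123" is a placeholder ID, and percent-encoding spaces inside phrases is an assumption about how multi-word phrases should be sent.

```python
from urllib.parse import quote

BASE_URL = "https://api.assemblyai.com/v2/transcript"


def word_search_url(transcript_id, terms):
    """Build the word-search URL for a list of words or phrases (hypothetical helper)."""
    for term in terms:
        # Each search term may contain between one and five words or numbers.
        if not 1 <= len(term.split()) <= 5:
            raise ValueError(f"search term must contain 1 to 5 words: {term!r}")
    # Encode each term on its own so the separating commas stay literal.
    words = ",".join(quote(term, safe="") for term in terms)
    return f"{BASE_URL}/{transcript_id}/word-search?words={words}"


url = word_search_url("abc123", ["hopkins", "wildfires"])
```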
List transcripts
Retrieves a paginated list of transcripts you've created.
curl --request GET \
--url "https://api.assemblyai.com/v2/transcript?limit=10&status=completed" \
--header 'Authorization: YOUR_API_KEY'
Query parameters
limit | Limits the number of results to return in a single response | Must be between 1 and 200, inclusive, with a default value of 10 |
status | Filters transcripts by their status | Must be one of the following values: queued, processing, completed, or error |
created_on | Returns only transcripts that were created on a specific date | Must be in the format YYYY-MM-DD |
before_id | Returns transcripts that were created before a specific transcript ID | Must be a valid transcript ID |
after_id | Returns transcripts that were created after a specific transcript ID | Must be a valid transcript ID |
throttled_only | Returns only throttled transcripts, regardless of their status | Must be either true or false |
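The query parameters above can be combined and checked on the client side before the request is made. This is a rough sketch under that assumption: the helper name list_transcripts_url is hypothetical, and sending booleans as the strings "true" and "false" is an assumption about the expected encoding.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://api.assemblyai.com/v2/transcript"
VALID_STATUSES = ("queued", "processing", "completed", "error")


def list_transcripts_url(limit=10, status=None, created_on=None,
                         before_id=None, after_id=None, throttled_only=None):
    """Build the list-transcripts URL from the documented filters (hypothetical helper)."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200, inclusive")
    query = {"limit": limit}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {VALID_STATUSES}")
        query["status"] = status
    if created_on is not None:
        # date.isoformat() produces the required YYYY-MM-DD format.
        query["created_on"] = created_on.isoformat()
    if before_id is not None:
        query["before_id"] = before_id
    if after_id is not None:
        query["after_id"] = after_id
    if throttled_only is not None:
        query["throttled_only"] = "true" if throttled_only else "false"
    return f"{BASE_URL}?{urlencode(query)}"


url = list_transcripts_url(limit=10, status="completed")
```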
Response
A successful response has a 200 status code and an application/json Content-Type.
A successful response is paginated, with a maximum of 200 transcripts per page.
Since transcripts are sorted from newest to oldest, prev_url always points to a page with older transcripts.
transcripts[i].id | The ID of the transcript. |
transcripts[i].resource_url | The URL to fetch the complete information for this transcript. |
transcripts[i].status | The current status of the transcript. |
transcripts[i].created | The date and time the transcript was created. |
transcripts[i].completed | The date and time the transcript was completed, if applicable. |
transcripts[i].audio_url | The audio URL that was submitted in the initial POST request when creating the transcript. |
If you delete a transcript, its audio URL is no longer available from this endpoint, and the audio_url key shows "deleted by user".
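Since the list is paginated from newest to oldest and prev_url points to older transcripts, a client can walk the full history by following prev_url until it runs out. The sketch below illustrates that loop. It assumes a caller-supplied fetch_page callable (hypothetical) that returns the parsed JSON for a URL, and it assumes prev_url sits under a page_details key in the response, which this section does not confirm; adjust the lookup to match the actual response.

```python
def iter_transcripts(fetch_page, first_url):
    """Yield transcript entries from every page, newest to oldest.

    `fetch_page` is a hypothetical callable: URL in, parsed JSON dict out.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page["transcripts"]
        # Assumption: prev_url is nested under "page_details".
        # It is expected to be missing or null on the oldest page.
        url = (page.get("page_details") or {}).get("prev_url")


# Usage with two simulated pages in place of real API responses.
fake_pages = {
    "page-1": {"transcripts": [{"id": "c"}, {"id": "b"}],
               "page_details": {"prev_url": "page-2"}},
    "page-2": {"transcripts": [{"id": "a"}],
               "page_details": {"prev_url": None}},
}
ids = [t["id"] for t in iter_transcripts(fake_pages.__getitem__, "page-1")]
```

Writing the walk as a generator lets a caller stop early, for example after finding a particular transcript, without fetching the remaining pages.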
Delete a transcript
By default, AssemblyAI doesn't store a copy of the files you submit to the API for transcription. However, the transcription itself is stored in our encrypted database so that we can serve it to you and your application.
Once a transcript is deleted, all sensitive information associated with it is permanently deleted from our system. However, certain metadata, such as the transcript ID and audio duration, remains stored for billing purposes.
Exercise caution when deleting transcripts, as this action can't be undone. If you have any questions, please contact our support team for assistance.
curl https://api.assemblyai.com/v2/transcript/:id \
--request DELETE \
--header 'Authorization: YOUR_API_KEY'
Path parameters
id | ID of the transcript. |
Response
A successful response has a 200 status code and an application/json Content-Type.
The response contains a Transcript object.