Transcript
Method | Endpoint | Description |
POST | /v2/transcript | Create a transcript using the parameters described below in the Create a transcript section. |
GET | /v2/transcript/:id | Fetch the transcript object you created, using its id. |
GET | /v2/transcript/:id/sentences | Optionally, after your transcript is complete, get the sentences using its id. |
GET | /v2/transcript/:id/paragraphs | Optionally, after your transcript is complete, get the paragraphs using its id. |
GET | /v2/transcript/:id/redacted-audio | If you enable PII Redaction, get the redacted audio, as described in the Create a redacted audio file section. |
GET | /v2/transcript/:id/:subtitle_format | Export your completed transcripts in SRT (srt) or VTT (vtt) format, which can be used for subtitles and closed captions in videos. |
GET | /v2/transcript/:id/word-search?words= | Search through a completed transcript for a specific set of keywords. See Word Search for more info. |
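The endpoint paths in the table above can be assembled with a small helper. The following is a minimal sketch, not an official client: the `BASE_URL` value and the `transcript_url` function are assumptions introduced for illustration, and sending the keywords as one comma-separated `words` value is also an assumption, since this section only shows the `?words=` query parameter.

```python
from urllib.parse import urlencode

# Assumption: the base URL is not given in this section.
BASE_URL = "https://api.assemblyai.com"


def transcript_url(transcript_id=None, resource=None, words=None):
    """Build a URL for one of the transcript endpoints listed in the table."""
    url = f"{BASE_URL}/v2/transcript"
    if transcript_id is None:
        if resource is not None or words is not None:
            raise ValueError("resource and words require a transcript id")
        return url  # POST target for creating a transcript
    url = f"{url}/{transcript_id}"
    if words is not None:
        # Assumption: keywords are joined into one comma-separated value.
        return f"{url}/word-search?" + urlencode({"words": ",".join(words)})
    if resource is not None:
        # resource is e.g. "sentences", "paragraphs", "redacted-audio", "srt", "vtt"
        url = f"{url}/{resource}"
    return url
```

For example, `transcript_url("abc123", "srt")` builds the subtitle export path for a hypothetical transcript id `abc123`.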
Create a transcript
To submit a file for transcription, send a POST request to the /v2/transcript endpoint listed in the table above, using the JSON parameters below to configure the transcript.
You can learn how to submit such a request in the Transcribing an audio file guide.
Parameter | Type | Description | Required |
audio_url | string | The URL of your media file to transcribe. Learn how to upload local files to obtain a URL in the Transcribing an audio file guide. | Yes |
language_code | string | The language of your audio file. Possible values are found in Supported Languages. The default value is en_us. | |
punctuate | boolean | Enable Automatic Punctuation, can be true or false | |
format_text | boolean | Enable Text Formatting, can be true or false | |
dual_channel | boolean | Enable Dual Channel transcription, can be true or false | |
webhook_url | string | The URL we should send webhooks to when your transcript is complete | |
webhook_auth_header_name | string | Defaults to null. Optionally, the name of a header to send back with a webhook call for added security. Use together with webhook_auth_header_value. | |
webhook_auth_header_value | string | Defaults to null. Optionally, the value of the header to send back with a webhook call for added security. Use together with webhook_auth_header_name. | |
auto_highlights | boolean | Enable Key Phrases, either true or false | |
audio_start_from | integer | The point in time, in milliseconds, to begin transcription from in your media file | |
audio_end_at | integer | The point in time, in milliseconds, to stop transcribing in your media file | |
word_boost | array | A list of custom vocabulary to boost transcription probability for. See Custom vocabulary for more details. | |
boost_param | string | The weight to apply to words/phrases in the word_boost array; can be "low", "default", or "high" | |
filter_profanity | boolean | Filter profanity from the transcribed text, can be true or false | |
redact_pii | boolean | Redact PII from the transcribed text using the Redact PII model, can be true or false | |
redact_pii_audio | boolean | Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII Redaction for more details. | |
redact_pii_audio_quality | string | Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII Redaction for more details. | |
redact_pii_policies | array | The list of PII redaction policies to enable. See PII Redaction for more details. | |
redact_pii_sub | string | The replacement logic for detected PII, can be "entity_type" or "hash". See PII Redaction for more details. | |
speaker_labels | boolean | Enable Speaker Diarization, can be true or false | |
speakers_expected | integer | Defaults to null. Tells the speaker label model how many speakers it should attempt to identify, up to 10. See Speaker Diarization for more details. | |
content_safety | boolean | Enable Content Moderation, can be true or false | |
content_safety_confidence | integer | The confidence threshold for content moderation. Values must be between 25 and 100. See more details at the Content Moderation model. | |
iab_categories | boolean | Enable Topic Detection, can be true or false | |
language_detection | boolean | Enable Automatic language detection, can be true or false | |
custom_spelling | array | Customize how words are spelled and formatted using to and from values. See Custom spelling for more details. | |
disfluencies | boolean | Transcribe Filler Words, like "umm", in your media file; can be true or false | |
sentiment_analysis | boolean | Enable Sentiment Analysis, can be true or false | |
auto_chapters | boolean | Enable Auto Chapters, can be true or false | |
summarization | boolean | Enable Summarization, can be true or false. If you specify one of summary_model and summary_type, you also need to specify the other. | |
summary_model | string | The model to summarize the transcript. See more details at the Summarization model. | |
summary_type | string | The type of summary. See more details at the Summarization model. | |
entity_detection | boolean | Enable Entity Detection, can be true or false | |
speech_threshold | float | Defaults to null. Reject audio files that contain less than this fraction of speech. Valid values are in the range [0,1] inclusive. | |
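The constraints in the parameter table can be checked on the client before the request is sent. The sketch below builds the JSON body and a POST request object without sending it. The helper names `build_transcript_payload` and `build_request`, the base URL, and the `authorization` header carrying the API key are assumptions for illustration and are not specified in this section.

```python
import json
import urllib.request

# Assumption: the base URL is not given in this section.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_transcript_payload(audio_url, **options):
    """Return the JSON body for a transcript request, checking documented limits."""
    if not audio_url:
        raise ValueError("audio_url is required")
    payload = {"audio_url": audio_url, **options}

    boost = payload.get("boost_param")
    if boost is not None and boost not in {"low", "default", "high"}:
        raise ValueError('boost_param must be "low", "default", or "high"')
    confidence = payload.get("content_safety_confidence")
    if confidence is not None and not 25 <= confidence <= 100:
        raise ValueError("content_safety_confidence must be between 25 and 100")
    threshold = payload.get("speech_threshold")
    if threshold is not None and not 0 <= threshold <= 1:
        raise ValueError("speech_threshold must be in the range [0, 1]")
    speakers = payload.get("speakers_expected")
    if speakers is not None and speakers > 10:
        raise ValueError("speakers_expected supports up to 10 speakers")
    # summary_model and summary_type must be given together.
    if ("summary_model" in payload) != ("summary_type" in payload):
        raise ValueError("summary_model and summary_type must be set together")
    return payload


def build_request(payload, api_key):
    """Build (but do not send) the POST request for the payload."""
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API key is passed in an "authorization" header.
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )
```

A request for a two-speaker recording could then be prepared with `build_request(build_transcript_payload("https://example.com/audio.mp3", speaker_labels=True, speakers_expected=2), api_key)` and sent with `urllib.request.urlopen`. The audio URL here is a placeholder.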
The Transcript Object
After you submit a file for transcription, the response contains the corresponding transcript ID. Use this ID in the GET endpoints listed above to fetch transcript information.
When you perform a GET request on the /v2/transcript/:id endpoint, the transcript object is returned in the response JSON for that corresponding ID. The object's parameters/attributes are listed below.
Attribute | Type | Description |
id | string | The unique identifier of your transcription |
status | string | The status of your transcription: queued, processing, completed, or error |
language_code | string | The language of your audio file. Possible values are found in Supported Languages. The default value is en_us. |
audio_url | string | The URL of the media that was transcribed |
text | string | The textual transcript of your media file |
words | array | An array of temporally-sequential word objects, one for each word in the transcript. See Speech Recognition for more information. |
utterances | array | When dual_channel or speaker_labels is enabled, a list of turn-by-turn utterance objects. See Speaker Diarization for more information. |
confidence | float | The confidence score for the transcript, between 0.0 (low confidence) and 1.0 (high confidence) |
audio_duration | float | The duration of this transcript object's media file, in seconds |
punctuate | boolean | Whether Automatic Punctuation was enabled in the transcription request, either true or false |
format_text | boolean | Whether Text Formatting was enabled in the transcription request, either true or false |
dual_channel | boolean | Whether Dual channel transcription was enabled in the transcription request, either true or false |
webhook_url | string | The URL to which we send webhooks upon transcription completion, if provided in the transcription request |
webhook_status_code | number | The HTTP status code we received from your server when delivering your webhook, if a webhook URL was provided in the transcription request |
webhook_auth | boolean | Whether webhook authentication details were provided in the transcription request |
webhook_auth_header_name | string | The header name which should be sent back with webhook calls, if provided in the transcription request |
auto_highlights | boolean | Whether Key Phrases was enabled in the transcription request, either true or false |
auto_highlights_result | object | The result of the Key Phrases model, if it was enabled during the transcription request. See Key Phrases for more information. |
audio_start_from | integer | The point in time, in milliseconds, in the file at which the transcription was started, if provided in the transcription request |
audio_end_at | integer | The point in time, in milliseconds, in the file at which the transcription was terminated, if provided in the transcription request |
word_boost | array | The list of custom vocabulary to boost transcription probability for, if provided in the transcription request |
boost_param | string | The word boost parameter value, if provided in the transcription request |
filter_profanity | boolean | Whether Profanity Filtering was enabled in the transcription request, either true or false |
redact_pii | boolean | Whether PII Redaction was enabled in the transcription request, either true or false |
redact_pii_audio | boolean | Whether a redacted version of the audio file was generated (enabled or disabled in the transcription request), either true or false. See PII Redaction for more information. |
redact_pii_audio_quality | string | The audio quality of the PII-redacted audio file, if enabled in the transcription request. See PII Redaction for more information. |
redact_pii_policies | array | The list of PII redaction policies that were enabled, if PII Redaction is enabled. See PII Redaction for more information. |
redact_pii_sub | string | Which replacement type was used to redact PII. See PII Redaction for more information. |
speaker_labels | boolean | Whether Speaker Diarization was enabled in the transcription request, either true or false |
speakers_expected | integer | The value for the speakers_expected parameter in the transcription request, if provided. See Speaker Diarization for more information. |
content_safety | boolean | Whether Content Moderation was enabled in the transcription request, either true or false |
iab_categories | boolean | Whether Topic Detection was enabled in the transcription request, either true or false |
content_safety_labels | array | An array of results for the Content Moderation model, if it was enabled during the transcription request. See Content Moderation for more information. |
iab_categories_result | object | The result of the Topic Detection model, if it was enabled during the transcription request. See Topic Detection for more information. |
language_detection | boolean | Whether Automatic language detection was enabled in the transcription request, either true or false |
custom_spelling | array | The custom spelling value passed in to the transcription request, if provided |
auto_chapters | boolean | Whether Auto Chapters was enabled in the transcription request, either true or false |
summarization | boolean | Whether Summarization was enabled in the transcription request, either true or false |
summary_type | string | The type of summary generated, if Summarization was enabled in the transcription request |
summary_model | string | The Summarization model used to generate the summary, if Summarization was enabled in the transcription request |
custom_topics | boolean | Whether custom topics was enabled in the transcription request, either true or false |
topics | array | The list of custom topics provided if custom topics was enabled in the transcription request |
speech_threshold | number | The value submitted for speech_threshold in the transcription request, if used. Otherwise, null. |
disfluencies | boolean | Whether the transcription of disfluencies was enabled in the transcription request, either true or false |
sentiment_analysis | boolean | Whether Sentiment Analysis was enabled in the transcription request, either true or false |
sentiment_analysis_results | array | An array of results for the Sentiment Analysis model, if it was enabled during the transcription request. See Sentiment Analysis for more information. |
entity_detection | boolean | Whether Entity Detection was enabled in the transcription request, either true or false |
entities | array | An array of results for the Entity Detection model, if it was enabled during the transcription request. See Entity Detection for more information. |
summary | string | The generated summary of the media file, if Summarization was enabled in the transcription request |
throttled | boolean | True while a request is throttled and false when a request is no longer throttled |
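Because the status attribute moves through queued and processing before reaching completed or error, a client typically polls the transcript object until it reaches a final state. The following is a minimal sketch of such a loop. The `wait_for_transcript` helper and the injected `fetch` callable (any function that performs the GET on /v2/transcript/:id and returns the parsed JSON as a dict) are hypothetical names introduced here, and the polling interval is an arbitrary choice.

```python
import time


def wait_for_transcript(fetch, transcript_id, interval=3.0, max_attempts=100):
    """Poll fetch(transcript_id) until status is "completed" or "error"."""
    for _ in range(max_attempts):
        transcript = fetch(transcript_id)
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"transcript {transcript_id} ended with status 'error'")
        # Still "queued" or "processing": wait before asking again.
        time.sleep(interval)
    raise TimeoutError(f"transcript {transcript_id} not finished after {max_attempts} polls")
```

Passing `fetch` as an argument keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub. Supplying a webhook_url in the request is the alternative to polling.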