
Transcript

POST /v2/transcript: Create a transcript using the parameters described below in the Create a transcript section.
GET /v2/transcript/:id: Fetch the transcript object you created, using its id.
GET /v2/transcript/:id/sentences: Optionally, after your transcript is complete, get the sentences using its id.
GET /v2/transcript/:id/paragraphs: Optionally, after your transcript is complete, get the paragraphs using its id.
GET /v2/transcript/:id/redacted-audio: If you enable PII Redaction, get the redacted audio, as described in the Create a redacted audio file section.
GET /v2/transcript/:id/:subtitle_format: Export your completed transcripts in SRT (srt) or VTT (vtt) format, which can be used for subtitles and closed captions in videos (see the sketch after this list).
GET /v2/transcript/:id/word-search?words=: Search through a completed transcript for a specific set of keywords. See Word Search for more info.
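
As an illustration of the subtitle export endpoint above, here is a minimal sketch that downloads SRT captions for a completed transcript. The API host (api.assemblyai.com) and the authorization header carrying your API key are assumptions not stated in this section; substitute the values from your account.

```python
import requests

BASE_URL = "https://api.assemblyai.com"      # assumed API host
HEADERS = {"authorization": "YOUR_API_KEY"}  # assumed auth header

transcript_id = "YOUR_TRANSCRIPT_ID"

# GET /v2/transcript/:id/:subtitle_format -- here srt; use "vtt" for WebVTT
response = requests.get(
    f"{BASE_URL}/v2/transcript/{transcript_id}/srt",
    headers=HEADERS,
)
response.raise_for_status()

# The response body is the subtitle file itself
with open("captions.srt", "w") as f:
    f.write(response.text)
```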

Create a transcript

To submit a file for transcription, send a POST request to the endpoint listed in the table above, using the JSON parameters below to configure the transcript. A minimal request sketch follows the parameter list.

You can learn how to submit such a request in the Transcribing an audio file guide.

audio_url (string, required): The URL of your media file to transcribe. Learn how to upload local files to obtain a URL in the Transcribing an audio file guide.
language_code (string): The language of your audio file. Possible values are found in Supported Languages. The default value is en_us.
punctuate (boolean): Enable Automatic Punctuation, can be true or false.
format_text (boolean): Enable Text Formatting, can be true or false.
dual_channel (boolean): Enable Dual Channel transcription, can be true or false.
webhook_url (string): The URL we should send webhooks to when your transcript is complete.
webhook_auth_header_name (string): Defaults to null. Optionally specifies the name of a header to send back with webhook calls for added security.
webhook_auth_header_value (string): Defaults to null. Optionally specifies the value of the header to send back with webhook calls for added security.
auto_highlights (boolean): Enable Key Phrases, either true or false.
audio_start_from (integer): The point in time, in milliseconds, to begin transcription from in your media file.
audio_end_at (integer): The point in time, in milliseconds, to stop transcribing in your media file.
word_boost (array): A list of custom vocabulary to boost transcription probability for. See Custom vocabulary for more details.
boost_param (string): The weight to apply to words/phrases in the word_boost array; can be "low", "default", or "high".
filter_profanity (boolean): Filter profanity from the transcribed text, can be true or false.
redact_pii (boolean): Redact PII from the transcribed text using the Redact PII model, can be true or false.
redact_pii_audio (boolean): Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII Redaction for more details.
redact_pii_audio_quality (string): Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII Redaction for more details.
redact_pii_policies (array): The list of PII redaction policies to enable. See PII Redaction for more details.
redact_pii_sub (string): The replacement logic for detected PII, can be "entity_type" or "hash". See PII Redaction for more details.
speaker_labels (boolean): Enable Speaker Diarization, can be true or false.
speakers_expected (integer): Defaults to null. Tells the speaker label model how many speakers it should attempt to identify, up to 10. See Speaker Diarization for more details.
content_safety (boolean): Enable Content Moderation, can be true or false.
content_safety_confidence (integer): The confidence threshold for content moderation. Values must be between 25 and 100. See more details at the Content Moderation model.
iab_categories (boolean): Enable Topic Detection, can be true or false.
language_detection (boolean): Enable Automatic Language Detection, either true or false.
custom_spelling (array): Customize how words are spelled and formatted using to and from values. See Custom spelling for more details.
disfluencies (boolean): Transcribe filler words, like "umm", in your media file; can be true or false.
sentiment_analysis (boolean): Enable Sentiment Analysis, can be true or false.
auto_chapters (boolean): Enable Auto Chapters, can be true or false.
summarization (boolean): Enable Summarization, can be true or false. If you specify one of summary_model and summary_type, you also need to specify the other.
summary_model (string): The model to summarize the transcript. See more details at the Summarization model.
summary_type (string): The type of summary. See more details at the Summarization model.
entity_detection (boolean): Enable Entity Detection, can be true or false.
speech_threshold (float): Defaults to null. Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.
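
A minimal sketch of the create request described above, using the same assumed host and authorization header as the earlier example. Only audio_url is required; the boolean flags shown are optional features from the parameter list and are included purely for illustration.

```python
import requests

BASE_URL = "https://api.assemblyai.com"      # assumed API host
HEADERS = {"authorization": "YOUR_API_KEY"}  # assumed auth header

# POST /v2/transcript -- audio_url is the only required parameter
payload = {
    "audio_url": "https://example.com/audio.mp3",
    "speaker_labels": True,   # optional: enable Speaker Diarization
    "auto_chapters": True,    # optional: enable Auto Chapters
}

response = requests.post(f"{BASE_URL}/v2/transcript", json=payload, headers=HEADERS)
response.raise_for_status()

transcript = response.json()
print(transcript["id"], transcript["status"])  # status starts as "queued"
```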

The Transcript Object

After you submit a file for transcription, the corresponding transcript ID is returned in the response. Use this ID with the GET endpoints listed above to fetch transcript information.

When you perform a GET request on the /v2/transcript/:id endpoint, the transcript object is returned in the response JSON for that corresponding ID. The object's parameters/attributes are listed below.
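
If you are not using webhooks, a common pattern is to poll this endpoint until the status attribute reaches completed or error. A minimal sketch, again assuming the host and authorization header used in the earlier examples:

```python
import time

import requests

BASE_URL = "https://api.assemblyai.com"      # assumed API host
HEADERS = {"authorization": "YOUR_API_KEY"}  # assumed auth header


def wait_for_transcript(transcript_id: str, poll_interval: float = 3.0) -> dict:
    """Poll GET /v2/transcript/:id until status is 'completed' or 'error'."""
    while True:
        response = requests.get(f"{BASE_URL}/v2/transcript/{transcript_id}", headers=HEADERS)
        response.raise_for_status()
        transcript = response.json()
        if transcript["status"] in ("completed", "error"):
            return transcript
        time.sleep(poll_interval)  # status is still "queued" or "processing"
```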

id (string): The unique identifier of your transcription.
status (string): The status of your transcription: queued, processing, completed, or error.
language_code (string): The language of your audio file. Possible values are found in Supported Languages. The default value is en_us.
audio_url (string): The URL of the media that was transcribed.
text (string): The textual transcript of your media file.
words (array): An array of temporally-sequential word objects, one for each word in the transcript. See Speech Recognition for more information.
utterances (array): When dual_channel or speaker_labels is enabled, a list of turn-by-turn utterance objects. See Speaker Diarization for more information.
confidence (float): The confidence score for the transcript, between 0.0 (low confidence) and 1.0 (high confidence).
audio_duration (float): The duration of this transcript object's media file, in seconds.
punctuate (boolean): Whether Automatic Punctuation was enabled in the transcription request, either true or false.
format_text (boolean): Whether Text Formatting was enabled in the transcription request, either true or false.
dual_channel (boolean): Whether Dual Channel transcription was enabled in the transcription request, either true or false.
webhook_url (string): The URL to which we send webhooks upon transcription completion, if provided in the transcription request.
webhook_status_code (number): The HTTP status code we received from your server when delivering your webhook, if a webhook URL was provided in the transcription request.
webhook_auth (boolean): Whether webhook authentication details were provided in the transcription request.
webhook_auth_header_name (string): The header name which should be sent back with webhook calls, if provided in the transcription request.
auto_highlights (boolean): Whether Key Phrases was enabled in the transcription request, either true or false.
auto_highlights_result (object): The result of the Key Phrases model, if it was enabled during the transcription request. See Key Phrases for more information.
audio_start_from (integer): The point in time, in milliseconds, in the file at which the transcription was started, if provided in the transcription request.
audio_end_at (integer): The point in time, in milliseconds, in the file at which the transcription was terminated, if provided in the transcription request.
word_boost (array): The list of custom vocabulary to boost transcription probability for, if provided in the transcription request.
boost_param (string): The word boost parameter value, if provided in the transcription request.
filter_profanity (boolean): Whether Profanity Filtering was enabled in the transcription request, either true or false.
redact_pii (boolean): Whether PII Redaction was enabled in the transcription request, either true or false.
redact_pii_audio (boolean): Whether a redacted version of the audio file was generated (enabled or disabled in the transcription request), either true or false. See PII Redaction for more information.
redact_pii_audio_quality (string): The audio quality of the PII-redacted audio file, if enabled in the transcription request. See PII Redaction for more information.
redact_pii_policies (array): The list of PII redaction policies that were enabled, if PII Redaction is enabled. See PII Redaction for more information.
redact_pii_sub (string): Which replacement type was used to redact PII. See PII Redaction for more information.
speaker_labels (boolean): Whether Speaker Diarization was enabled in the transcription request, either true or false.
speakers_expected (integer): The value for the speakers_expected parameter in the transcription request, if provided. See Speaker Diarization for more information.
content_safety (boolean): Whether Content Moderation was enabled in the transcription request, either true or false.
iab_categories (boolean): Whether Topic Detection was enabled in the transcription request, either true or false.
content_safety_labels (array): An array of results for the Content Moderation model, if it was enabled during the transcription request. See Content Moderation for more information.
iab_categories_result (object): The result of the Topic Detection model, if it was enabled during the transcription request. See Topic Detection for more information.
language_detection (boolean): Whether Automatic Language Detection was enabled in the transcription request, either true or false.
custom_spelling (array): The custom spelling value passed in to the transcription request, if provided.
auto_chapters (boolean): Whether Auto Chapters was enabled in the transcription request, either true or false.
summarization (boolean): Whether Summarization was enabled in the transcription request, either true or false.
summary_type (string): The type of summary generated, if Summarization was enabled in the transcription request.
summary_model (string): The Summarization model used to generate the summary, if Summarization was enabled in the transcription request.
custom_topics (boolean): Whether Custom Topics was enabled in the transcription request, either true or false.
topics (array): The list of custom topics provided, if Custom Topics was enabled in the transcription request.
speech_threshold (number): The value submitted for speech_threshold in the transcription request, if used. Otherwise, null.
disfluencies (boolean): Whether the transcription of disfluencies was enabled in the transcription request, either true or false.
sentiment_analysis (boolean): Whether Sentiment Analysis was enabled in the transcription request, either true or false.
sentiment_analysis_results (array): An array of results for the Sentiment Analysis model, if it was enabled during the transcription request. See Sentiment Analysis for more information.
entity_detection (boolean): Whether Entity Detection was enabled in the transcription request, either true or false.
entities (array): An array of results for the Entity Detection model, if it was enabled during the transcription request. See Entity Detection for more information.
summary (string): The generated summary of the media file, if Summarization was enabled in the transcription request.
throttled (boolean): True while a request is throttled and false when a request is no longer throttled.
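
To tie the attributes above together, a short sketch that reads a few of them from a completed transcript object (same assumed host and authorization header as before):

```python
import requests

BASE_URL = "https://api.assemblyai.com"      # assumed API host
HEADERS = {"authorization": "YOUR_API_KEY"}  # assumed auth header

response = requests.get(f"{BASE_URL}/v2/transcript/YOUR_TRANSCRIPT_ID", headers=HEADERS)
response.raise_for_status()
transcript = response.json()

if transcript["status"] == "completed":
    print(transcript["text"])            # full textual transcript
    print(transcript["confidence"])      # overall confidence, 0.0 to 1.0
    print(transcript["audio_duration"])  # media duration in seconds

    # utterances is populated only when dual_channel or speaker_labels was enabled
    for utterance in transcript.get("utterances") or []:
        print(utterance)
```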