April 1, 2020

Automated SRT and VTT Video Captions (April 2020 Update)

Automatically create accurate captions for video files, in SRT and VTT format, using the AssemblyAI Speech-to-Text API.

Joe Zaghloul

This month we added a new acoustic model for UK-accented English, automated video captioning (SRT/VTT export), and automatic transcript highlights. Companies in industries like video hosting, media monitoring, e-discovery, and video interviewing can use these features to improve video playback and search for their customers.

New Model for UK Accented English

State-of-the-art accuracy is now available for UK-accented English with just a quick change to your API calls. For complete documentation and code samples on how to enable the UK model, check out our API docs.

Automated Video Captioning: SRT or VTT export

Now you can easily export your transcription in SRT or VTT format, to be plugged into a video player for subtitles and closed captions. Once your transcript status shows as "completed", you can make a GET request to the following endpoints to export your transcript in VTT or SRT format:

https://api.assemblyai.com/v2/transcript/<your transcript id>/vtt
https://api.assemblyai.com/v2/transcript/<your transcript id>/srt

The API will output a plain-text response like this (SRT example):

1
00:00:12,340 --> 00:00:16,380
Last year I showed these two slides said that demonstrate that

2
00:00:16,340 --> 00:00:19,920
the Arctic ice cap which for most of the last 3,000,000 years has been

3
00:00:19,880 --> 00:00:23,120
the size of the lower 48 States has shrunk by 40%
...
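As a rough illustration, the export request could be scripted along these lines. This is a minimal sketch, not an official client: the function names `caption_url` and `fetch_captions` are made up for this example, it uses the standard library's `urllib` so it has no dependencies (the `requests` library would work equally well), and it assumes the GET request takes the same `authorization` header as the POST request shown later in this post.

```python
import urllib.request

API_BASE = "https://api.assemblyai.com/v2/transcript"


def caption_url(transcript_id: str, fmt: str) -> str:
    """Build the export URL for a transcript in "srt" or "vtt" format."""
    fmt = fmt.lower()
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported caption format: {fmt!r}")
    return f"{API_BASE}/{transcript_id}/{fmt}"


def fetch_captions(transcript_id: str, fmt: str, api_token: str) -> str:
    """GET the caption file as plain text.

    Assumes the transcript status is already "completed" and that the
    endpoint accepts the API token in an "authorization" header.
    """
    request = urllib.request.Request(
        caption_url(transcript_id, fmt),
        headers={"authorization": api_token},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_captions("<your transcript id>", "srt", "YOUR-API-TOKEN")` would return the caption text, which can then be written to a `.srt` file and loaded into a video player.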

Take a look at our API docs to learn more about automatically exporting a transcript in SRT or VTT format.
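If you need to work with the captions programmatically (for example, to shift timings or search by timestamp), the SRT text can be split into individual cues. The sketch below assumes the layout shown in the sample above: a cue number, a timing line, then one or more lines of text, with cues separated by blank lines. The `Cue` type and `parse_srt` function are names invented for this example.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


# Matches a timing line such as "00:00:12,340 --> 00:00:16,380"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    """Convert the four timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into a list of cues, skipping malformed blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # need a number, a timing line, and at least one text line
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        parts = match.groups()
        cues.append(Cue(
            index=int(lines[0].strip()),
            start=_seconds(*parts[:4]),
            end=_seconds(*parts[4:]),
            text="\n".join(lines[2:]),
        ))
    return cues
```

Each returned `Cue` carries its start and end time in seconds, so a cue can be matched against a playback position with a simple comparison.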

Automatic Transcript Highlights

Many of our customers with long audio and video files (e.g. webinars, podcasts, conference calls, video interviews) were looking for ways to review their transcriptions more quickly. They also wanted to tag these files immediately based on their most important key phrases.

That's where auto transcript highlights come in. We can now detect key phrases in your transcripts using Natural Language Processing (NLP) to help with features like:

Summarize transcription text: Simplify long transcriptions to only highlight the most common keywords and phrases
Auto-tagging/indexing: Make your entire file searchable by adding in auto-highlights to each file as a searchable tag

Take the following sample transcription, for example:

Hi I'm joy. Hi I'm Sharon. Do you have kids in school? I have grandchildren in school. Okay, well, my kids are in middle school in high school. Do you think there is anything wrong with the school system? Overcrowding, of course, ...

In this example, the following phrases and terms would be automatically detected by the Auto Highlights feature:

"high school", "middle school", "kids"
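To illustrate the auto-tagging idea, detected phrases can be stored as searchable tags per file. The following is only a sketch of one possible approach: the function names and file IDs are hypothetical, and the phrases for the second file are invented purely for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_tag_index(highlights_by_file: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lowercased highlight phrase to the set of files tagged with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for file_id, phrases in highlights_by_file.items():
        for phrase in phrases:
            index[phrase.strip().lower()].add(file_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the IDs of files tagged with the query phrase, sorted by ID."""
    return sorted(index.get(query.strip().lower(), set()))


# Hypothetical file IDs; the first file uses the phrases from the sample above.
index = build_tag_index({
    "interview_001": ["high school", "middle school", "kids"],
    "webinar_002": ["kids", "school system"],
})
print(search(index, "Kids"))         # ['interview_001', 'webinar_002']
print(search(index, "high school"))  # ['interview_001']
```

Lowercasing the phrases on both the indexing and the query side keeps lookups case-insensitive, which is usually what a tag search needs.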

Below is a code sample showing how to turn on automatic transcript highlights in Python. For code samples in other languages, check out our API docs.

import requests

endpoint = "https://api.assemblyai.com/v2/transcript"

# Request body: the audio file to transcribe, with Auto Highlights turned on
payload = {
    "audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    "auto_highlights": True
}

headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}

# Submit the transcription request and print the JSON response
response = requests.post(endpoint, json=payload, headers=headers)
print(response.json())
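Results are only available once the transcript status shows as "completed", so a client typically has to check the transcript again after submitting it. The sketch below shows one way to do that. Several details are assumptions not stated in this post and should be checked against the API docs: the "error" status value, and the response fields `auto_highlights_result`, `results`, and `text`. The function names are invented for the example, and `fetch_transcript` stands in for whatever call retrieves the transcript JSON.

```python
import time
from typing import Callable, Dict, List


def wait_for_transcript(
    fetch_transcript: Callable[[], Dict],
    poll_seconds: float = 3.0,
    max_polls: int = 100,
) -> Dict:
    """Call fetch_transcript repeatedly until the status is "completed".

    fetch_transcript is any zero-argument callable returning the transcript
    JSON as a dict (for example, a wrapper around a GET request).
    """
    for _ in range(max_polls):
        transcript = fetch_transcript()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":  # assumed failure status
            raise RuntimeError(f"transcription failed: {transcript}")
        time.sleep(poll_seconds)
    raise TimeoutError("transcript did not complete in time")


def highlight_phrases(transcript: Dict) -> List[str]:
    """Extract the detected phrases (field names are assumptions)."""
    result = transcript.get("auto_highlights_result") or {}
    return [item["text"] for item in result.get("results", [])]
```

Passing the fetch step in as a callable keeps the polling logic independent of the HTTP library, and makes it easy to try out with canned responses before pointing it at the live API.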

100% Uptime

We had another month of 100% uptime across all our models. Subscribe to our status page to stay up to date!
