New AI Models to summarize audio and video for any use case

We are introducing several new Summarization models, each tailored to a specific use case.

Today at AssemblyAI, we are excited to announce the release of several new Summarization AI models. The new models are:

  1. Informative, which is best for files with a single speaker, like a presentation or lecture
  2. Conversational, which is best for any multi-person conversation, like customer/agent phone calls or interviewer/interviewee calls
  3. Catchy, which is best for creating video, podcast, or media titles

Having been trained on data relevant to a specific use case, each model provides state-of-the-art results for that particular use case. Each model also supports various summary lengths, allowing product teams to further tailor the summaries to their specific use case.

The new AI models in action

Let’s take a look at each of these new AI models in turn.

Informative

The Informative summary model is best for audio in which a single person is speaking, like in a presentation or lecture. Below we can see the ground truth transcript for a news segment along with the summary generated by the Informative summary model:

Summary

A train heading from Queens into Manhattan was stalled underneath the East River around 8.30am Monday morning. Part of the train's contact shoe is thought to have touched the board instead of the rail, sparking the incident. More than 500 passengers were taken to Manhattan after spending roughly an hour and a half trapped. Service on the 7 line was suspended for almost two hours.

Ground-truth transcript

Hundreds of passengers on a New York City subway train were evacuated from cars in an underwater tunnel after a fire during the Monday morning commute. A train heading from Queens into Manhattan was stalled underneath the East River around 8.30am Monday morning and its conductor saw smoke coming from the board along the track's third rail. The train's 542 passengers were taken by a rescue train to Grand Central Station in Manhattan, an MTA spokesman told Daily Mail Online. Scroll down for video . Service on a New York City subway line (pictured) was suspended for almost two hours after smoke in an underwater tunnel left hundreds of passengers stuck beneath the East River . More than 500 passengers were taken to Manhattan after spending roughly an hour and a half trapped. They walked through their stalled train into a rescue train and left around 10am . Part of the train's contact shoe, which gets power from the third rail, is thought to have touched the board instead of the rail, sparking the incident that left service on the 7 train suspended for just less than two hours. The last of the passengers were taken on to the new train around 10am, according to AMNY. No injuries were reported beyond a woman who felt faint and requested medical attention. The MTA has warned passengers to expect delays on other lines such as the N,Q, and R. More than 500 passengers were taken to Manhattan after spending roughly an hour and a half trapped beneath the East River. Above, firefighters seen at Grand Central Station . Commuters faced delays and crowds of people as they tried to travel from Queens into Manhattan. Service resumed around 10.30am after the incident, believed to be caused by a train's conduct with a safety board . Commuters trips on the 7 line were disrupted, with some taking unusual transport methods such as boats to get to work. 
Residents of Queens have recently complained about what they view as particularly poor service on the 7, which goes through the heart of their borough. A rally was held last month calling for less delays on the line after a winter of outages, according to DNA Info.

Conversational

The Conversational summary model is best for audio in which two or more speakers are having a conversation. Below we see the ground-truth transcript for an interview along with the summary generated by the Conversational summary model:

Summary

Mary Brown comes to Mister Thompson to apply for a secretary. She tells Mister Thompson she can do everything a secretary is expected to do and she expects a salary of around $800 a month. Mister Thompson will let her know the result as soon as possible.

Ground-truth transcript

Speaker A: Good morning, Mister Thompson. My name is Mary Brown.

Speaker B: Good morning, Miss Brown, take a seat please.

Speaker A: Thank you.

Speaker B: Well Miss Brown, could you please tell me about yourself?

Speaker A: Yes, of course. I'm 18 years old and just graduated from Peterson Secretary school. I read your ad. in the newspaper and I know that you're looking for a secretary.

Speaker B: Could you please tell me what you can do?

Speaker A: I can do whatever a secretary is expected to do, such as typing receiving phone calls, sending faxes or writing reports.

Speaker B: Well, it seems that your qualifications for the job are excellent. Could you tell me what kind of salary you're expecting?

Speaker A: I saw in the ad. that this position offers a salary of around $800 a month.

Speaker B: That's right.

Speaker A: That would be fine with me.

Speaker B: Is there anything you would like to know about the job?

Speaker A: No, not so far.

Speaker B: Good. Thank you for coming Miss Brown. I've enjoyed meeting and talking with you, we'll let you know the result as early as possible.

Speaker A: I appreciate the time you have given me.

Catchy

The Catchy summary model is best for automatically generating taglines, headlines, etc. Below we see the ground truth transcript for a story about scientists discovering gravitational waves along with the summary generated by the Catchy summary model:

Summary

Scientists Find Gravitational Waves for the First Time (New Study)

Ground-truth transcript

After decades of searching, scientists announced Thursday that they have directly detected gravitational waves for the first time. The discovery verifies an unproven portion of Einstein's theory of general relativity. The scientists said we have detected gravitational waves. We did it. They detected gravitational tremors from a pair of spiraling black holes around one 3 billion lightyears away from Earth. The clash of black holes was so violent, its shockwaves rippled the ethereal fabric of space and time across a billion light years of distance. Einstein predicted gravitational waves, or ripples, in the fabric of space time, a century ago. But scientists worldwide have been searching for proof of gravitational waves for decades. More than 10 researchers in 15 countries were involved in the study that led to Thursday's announcement. The team was led by scientists at the California Institute of Technology and at the Massachusetts Institute of Technology. They used two gravitational wave observatories one in Hanford, Washington, and one in Livingston, Louisiana to measure the waves and crosscheck their results. These LIGO detectors measure how long it takes controlled laser light to travel between suspended mirrors. And on September 14, 2015, the researchers say they were able to detect the waves generated by the black holes as they crush together to merge into a single black hole. Gravitational waves from the merger of black holes or other massive objects produce a chirp, the scientists said. At Thursday's event, they played the signal that they recorded. In this latest study, the scientists say they shifted the sound a bit in frequency so that it's easier to hear. Did you hear the church? There's a rumbling noise, and then there's a shirt. Let me do that again. That's the church we've been looking for. 
The scientists say now that they know pairs of black holes do exist, they hope to use gravitational waves to probe some of the most mysterious objects in space and get more clues about the secrets of the universe. What's going to come now is we're going to be able to hear more of these things. And no doubt we'll hear things that we expected to hear, like binary black holes or perhaps binary neutron stars colliding. But we will also hear things that we never expected. And as we open a new window in astronomy, we may see things that we never, never saw before.

Summary types

In addition to the three model types, which are each best used with a specific type of input, each model also offers different summary types, which are used to tailor the desired output.

The summary types are:

  1. gist - 3-10 word summary
  2. headline - about 20 word summary (1-2 sentences)
  3. paragraph - 30-100 word summary (3-5 sentences)
  4. bullets - Bulleted list of paragraph summaries (max of 6)
  5. bullets_verbose - Same as bullets, but there is no limit on the number of bullets

The gist and headline summary types are available for the Catchy model, while the headline, paragraph, bullets, and bullets_verbose summary types are available for the Informative and Conversational models.
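The compatibility rules above can be captured in a small lookup table and checked before a request is sent. The following is a minimal sketch, not part of the AssemblyAI API: `SUMMARY_TYPES_BY_MODEL` and `build_summary_options` are hypothetical names introduced for illustration, and the lowercase model names `catchy` and `conversational` are assumed by analogy with the `informative` value used in the request example in this post.

```python
# Model/summary-type compatibility, transcribed from the text above.
# The lowercase "catchy" and "conversational" names are assumptions.
SUMMARY_TYPES_BY_MODEL = {
    "catchy": {"gist", "headline"},
    "informative": {"headline", "paragraph", "bullets", "bullets_verbose"},
    "conversational": {"headline", "paragraph", "bullets", "bullets_verbose"},
}


def build_summary_options(summary_model, summary_type):
    """Return the summarization request fields, or raise ValueError if the pair is unsupported."""
    allowed = SUMMARY_TYPES_BY_MODEL.get(summary_model)
    if allowed is None:
        raise ValueError(f"Unknown summary model: {summary_model!r}")
    if summary_type not in allowed:
        raise ValueError(
            f"Summary type {summary_type!r} is not available for the "
            f"{summary_model!r} model; choose one of {sorted(allowed)}"
        )
    return {
        "summarization": True,
        "summary_model": summary_model,
        "summary_type": summary_type,
    }
```

The returned dictionary can be merged with an `audio_url` entry to form the request body, so an unsupported combination such as Catchy with bullets fails locally instead of at the API.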

The Catchy model generated the summary shown earlier using the headline summary type:

Scientists Find Gravitational Waves for the First Time (New Study)

Using the gist summary type, we instead get this summary:

Scientists Find Gravitational Wave Signals

The Informative model generated the summary shown earlier using the paragraph summary type:

A train heading from Queens into Manhattan was stalled underneath the East River around 8.30am Monday morning. Part of the train's contact shoe is thought to have touched the board instead of the rail, sparking the incident. More than 500 passengers were taken to Manhattan after spending roughly an hour and a half trapped. Service on the 7 line was suspended for almost two hours.

Using the headline summary type, we instead get this summary:

A train heading from Queens into Manhattan was stalled underneath the East River

Use cases for summarization

Our customers have already built innovative, high-ROI features using summarization. Our new Summarization models will open the doors to new possibilities and creative solutions. Here are a few use cases for which summarization is well-suited:

Conversation Intelligence

  1. Call centers - summarization makes it easy to pass information up the chain of command, conduct reviews, and monitor calls.
  2. Meetings - easily summarize virtual or in-person meetings, interviews, etc. for people who couldn't be in attendance or for record-keeping.

Video and Podcasting

  1. Podcasting - summarization lets you automatically generate episode descriptions at scale, for example.
  2. Title Generation - add summarization to your workflow to automatically generate titles for video clips, making it easy to quickly release TikToks, YouTube Shorts, etc.
  3. Video summarization - summarize video files to quickly grok information at scale.

Media Monitoring

  1. News Aggregation - aggregate news intelligently by using summarization to generate titles and descriptions for news segments across many channels.
  2. Social Monitoring - make it easier to digest and process huge quantities of data across social media platforms and more.

These are just a few ways companies can incorporate AI-powered summarization into their processes to stay ahead of the competition.

Using the Summarization Models

You can check out this Colab notebook to see how to use the new Summarization models with Python, or move on to the next section to test them in a no-code fashion.

Using the new Summarization models is as simple as sending a POST request to the AssemblyAI API.

import requests

API_TOKEN = "YOUR-TOKEN-HERE"
ENDPOINT = "https://api.assemblyai.com/v2/transcript"

# Request body: the audio to transcribe plus the summarization settings
payload = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_model": "informative",
    "summary_type": "bullets"
}
headers = {
    "authorization": API_TOKEN,
    "content-type": "application/json"
}

# Submit the file for transcription and summarization
response = requests.post(ENDPOINT, json=payload, headers=headers)

Once processing is completed, a simple GET request will fetch the results:

r = requests.get(f"{ENDPOINT}/{response.json()['id']}", headers=headers)
print(r.json()['summary'])
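The GET request above only returns a summary once processing has finished, so in practice the request is usually repeated until the transcript is ready. Below is a minimal polling sketch. It assumes the response JSON carries a `status` field that moves through values such as `queued` and `processing` before reaching `completed` or `error`, with an `error` message on failure. The helper names `fetch_transcript` and `wait_for_summary`, the poll interval, and the timeout are all illustrative choices, not part of the API.

```python
import time

ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def fetch_transcript(transcript_id, headers):
    """Fetch the current transcript JSON with a single GET request."""
    # Imported here so the polling logic can be exercised without the dependency.
    import requests
    return requests.get(f"{ENDPOINT}/{transcript_id}", headers=headers).json()


def wait_for_summary(transcript_id, headers, fetch=fetch_transcript,
                     poll_interval=3.0, timeout=600.0):
    """Poll until the transcript is done, then return its summary."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch(transcript_id, headers)
        status = result.get("status")  # assumed field name and values
        if status == "completed":
            return result.get("summary")
        if status == "error":
            raise RuntimeError(f"Transcription failed: {result.get('error')}")
        time.sleep(poll_interval)  # still queued or processing; try again later
    raise TimeoutError(f"Transcript {transcript_id} not ready after {timeout} seconds")
```

With the variables from the snippets above, `print(wait_for_summary(response.json()['id'], headers))` would block until the summary is available and then print it.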

The corresponding summary for the audio file we used can be seen below:

- Neuroscientist Lisa Genova talks about the science of memory. Followed by a Q and A session with Ted science curator David Ballo. Recorded at a Ted Membership exclusive event in 2021.
- There's so many people who experience normal moments of forgetting. Part of the reason is perspective. Memory is very much influenced by context. Go back to the room you were in before you landed in this one and imagine the cues that were there.
- Most people don't think they have any influence over their brain health. What kinds of memory cues would be signs of abnormality? Or you should get further testing and checking. This becomes information that you can be in conversation with your doctor about.
- There are lots of reasons for having issues with retrieving memories. It can be sleep deprivation, it can be B twelve. Doesn't have to be Alzheimer's. It is something that you can hopefully address again, be involved in your brain health.
- Can diet help us to avoid memory loss? And can you kind of exercise your neurons into better memory through crossword puzzles or deeper relationships or anything like that? Exercise the diet. Sleep and stress and learning new things is also helpful.
- Adam Grant: The myth that you only use 10% of your brain is a fallacy. He says as you grow older, you don't lose the information of stuff you've learned. Grant: It's unlimited. There's no reason to think there's a limit to it.

Test our Summarization models today

The easiest way to test our new Summarization models is by going to the AssemblyAI Playground. The Playground is a no-code way to instantly see the results of our models on a local file or YouTube video.

Go to the AssemblyAI Playground and choose your audio source. We use the AssemblyAI Product Overview video from YouTube.

Next, select the Summarization model from the list of available models (as well as any other models you would like to test).

To change which Summarization model you are using and the type of summary to generate, click the three dots in the upper-right corner of the Summarization model card.

After the audio has passed through our models, all results will be displayed in the browser. Below we can see the results from the Informative Summarization model on the above video using the bullets summary type.