Automatically summarize audio and video files at scale

Over the past few years, the amount of audio and video content available online has grown rapidly. This has created both a challenge and an opportunity for product teams building with audio and video. On one hand, a tremendous amount of information is encoded in audio and video files and can be leveraged to build powerful features and products. On the other hand, that data is difficult to work with in its raw form.

Speech-to-text or automatic transcription models help turn audio data into a more pliable format: text.

But simply transcribing audio and video files isn’t enough. To effectively build AI-first features and platforms, product teams need to extract insights from their data.

One option is to add Audio Intelligence models that process and analyze large volumes of transcribed audio data and deliver valuable information and insights.

AI-powered Summarization models, for example, can achieve impressive results on conversational data (phone calls, podcasts, video meetings, etc.) to help developers and product teams build exciting products that automatically summarize audio and video content. These models are powered by state-of-the-art AI research, and can automatically generate accurate summaries for audio or video files sent to our API.

Here’s a quick example of how powerful AI Summarization models can be:

Inside Intercom: Josh Seiden

Bullets

Josh Seiden and Brian Donohue discuss the topic of outcome versus output on Inside Intercom. Josh Seiden is a product consultant and author who has just released a book called Outcomes Over Output. Brian is product management director and he's looking forward to the chat.
The main premise of the book is that by defining outcomes precisely, it's possible to apply this idea of outcomes in our work. It's in contrast to a really broad and undefined definition of the word “outcome”.
Paul, one of the design managers at Intercom, was struggling to differentiate between customer outcomes and business impact. In Lean Startup, teams focus on what they can change in their behavior to make their customers more satisfied. They focus on the business impact instead of on the customer outcomes. They have a hypothesis and they test their hypothesis with an experiment. They don't have to be 100% certain, but they need to have a hunch. There is a difference between problem-focused and outcome-focused approaches to building and prioritizing projects. For example, a company is working on improving the inbox search feature in their product. They hope it will improve user retention and improve the business impact of the change.
Product teams need to focus on the outcome of their work rather than on the business impact of their product. They need to be more aware of the customer experience and their relationship with their business.
As a business owner, you have to build a theory of how the business works. The more you know about your business as your business goes on, the more you can build a business model. The business model is reflected in roadmaps and prioritizations.
Josh's book is available on Amazon, in print, in ebook and in audiobook on Audible.com. Brian's advice for teams looking to change their way of working is to start small and to use retrospectives and improve your process as you try to implement this. Josh and Brian enjoyed their conversation.

Paragraph

Josh Seiden and Brian Donohue discuss the topic of outcome versus output on Inside Intercom. Josh Seiden is a product consultant and author who has just released a book called Outcomes Over Output. Brian is product management director and he's looking forward to the chat.

Headline

Josh Seiden and Brian Donohue discuss the topic of outcomes versus output on Inside Intercom.

Gist

Outcomes over output

Text Summarization models in action

AI technology continues to rapidly advance, making it easier to leverage AI across industries. For example, users can now generate high-quality, realistic images and art with a single text prompt. Transformers have taken the world by storm and opened up the opportunity for Large Language Models (LLMs) that can generate code and write articles.

AssemblyAI’s AI Summarization models, for example, are built on the same AI technology, Transformers, that is behind most of these advances. The Summarization models are also purpose-built to work well on conversational data (phone calls, Zoom meetings, screen recordings, videos, etc.).

A deep dive into how AssemblyAI’s Text Summarization models work

AssemblyAI’s Text Summarization models are fast, scalable, and continuously updated by an in-house team of AI experts to keep them state-of-the-art as new research emerges. These models are accessible through a single API call, making it easy for teams of all sizes to embed the models into their products.

For example, here is how you’d request a summary for an audio file using the AssemblyAI Python SDK:

First, you need to install the AssemblyAI Python SDK:

pip install -U assemblyai

And then enter:

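import assemblyai as aai

# Replace with your own API key.
aai.settings.api_key = "YOUR_API_KEY"

# Enable summarization and choose the summary format.
config = aai.TranscriptionConfig(
    summarization=True,
    summary_type=aai.SummarizationType.bullets,
)

# Submit the file and wait for the transcript to complete.
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe("https://bit.ly/3yxKEIY")

# Report a failed transcription instead of printing an empty summary.
if transcript.status == aai.TranscriptStatus.error:
    print(transcript.error)
else:
    print(transcript.summary)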

As shown above, developers can choose between summary types via the summary_type parameter. This gives them the flexibility to generate different kinds of summaries depending on their use case.
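For teams that call the REST API directly rather than going through the SDK, the same options can be sent as a JSON request body. The sketch below only builds and validates that body; it sends nothing. It assumes the /v2/transcript endpoint and the summarization, summary_model, and summary_type fields as described in the AssemblyAI API documentation at the time of writing, and the helper name build_summary_request is hypothetical. Check the current API reference before relying on the exact set of accepted values.

```python
import json

# Assumed from the AssemblyAI API docs at the time of writing; verify before use.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"
SUMMARY_TYPES = {"bullets", "bullets_verbose", "gist", "headline", "paragraph"}


def build_summary_request(audio_url: str, summary_type: str = "bullets") -> dict:
    """Build the JSON body for a transcription request with summarization enabled.

    Hypothetical helper: it only validates and assembles the payload.
    """
    if summary_type not in SUMMARY_TYPES:
        raise ValueError(
            f"unknown summary_type {summary_type!r}; "
            f"expected one of {sorted(SUMMARY_TYPES)}"
        )
    return {
        "audio_url": audio_url,
        "summarization": True,
        "summary_model": "informative",
        "summary_type": summary_type,
    }


if __name__ == "__main__":
    body = build_summary_request("https://bit.ly/3yxKEIY", summary_type="headline")
    # POST this body to TRANSCRIPT_ENDPOINT with your API key in the
    # "authorization" header, then poll the returned transcript until it completes.
    print(json.dumps(body, indent=2))
```

Validating summary_type before sending the request means a typo fails locally rather than after a round trip to the API.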

For a more detailed tutorial on how to use the AssemblyAI Python SDK, visit this tutorial.

Here is a description of the available summary types AssemblyAI offers:

Use cases for Summarization

AI Text Summarization models and AI Summarizers for audio and video help customers across a wide range of industries and use cases, including Conversation Intelligence, Video/Media Platforms, Podcasts, and Virtual Meeting Platforms.

Conversation Intelligence

AI Summarization can be a powerful addition to Conversation Intelligence tools and help deliver over 395% ROI for end users.

Top benefits of Text Summarization for Conversation Intelligence include:

Automating manual call review to speed up QA workflows
Monitoring all calls for key insights or potential areas of concern
Automating note-taking to increase representative and customer engagement
Enabling more efficient context sharing between teams
Identifying key trends over time
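
Reviewing every call rather than a sample means fanning requests out over many recordings. A minimal sketch of that pattern follows, using only the Python standard library. The summarize argument is a hypothetical callable that you supply (for example, a thin wrapper around the SDK call shown earlier); summarize_calls and its failure-handling policy are illustrative assumptions, not part of the AssemblyAI SDK.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def summarize_calls(
    audio_urls: Iterable[str],
    summarize: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Summarize many recordings concurrently.

    Returns (summaries, failures), each keyed by audio URL, so that one
    bad recording does not abort the whole batch.
    """
    summaries: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(summarize, url): url for url in audio_urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                summaries[url] = future.result()
            except Exception as exc:  # keep going; record the error for follow-up
                failures[url] = str(exc)
    return summaries, failures


if __name__ == "__main__":
    # Stand-in for a real API call so the sketch runs offline.
    def fake_summarize(url: str) -> str:
        if url.endswith("broken.mp3"):
            raise RuntimeError("transcription failed")
        return f"summary of {url}"

    done, failed = summarize_calls(
        ["https://example.com/call-1.mp3", "https://example.com/broken.mp3"],
        fake_summarize,
    )
    print(done)
    print(failed)
```

Threads suit this workload because each request spends most of its time waiting on the network; max_workers should be set with your account's rate limits in mind.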

Video/Media Platforms

AI Text Summarization can help video and media platforms build tools that distill long educational courses, lectures, media broadcasts, and more into their most essential points for faster consumption.

Summarization tools can also facilitate collaboration between viewers and make it easier to apply additional Conversation Intelligence analysis tools to the summary.

Podcasts

When integrated into a podcasting platform, Text Summarization tools can give podcast listeners a quick summary of what the podcast is about before they listen, making the UI more intuitive for users.

AI summarization can also make podcast episodes more searchable for end users and enhance ad targeting for podcasting platforms by allowing them to serve ads that more closely mirror the listeners’ current interests.
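
One simple way to picture the searchability benefit is a keyword index built over episode summaries. The sketch below is an illustration only: build_index and search are hypothetical helpers, the episode IDs and summaries are made up, and a production platform would more likely use a dedicated search engine.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def build_index(summaries: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of episode IDs whose summary contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for episode_id, summary in summaries.items():
        for word in WORD_PATTERN.findall(summary.lower()):
            index[word].add(episode_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the IDs of episodes whose summary contains every word in the query."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


if __name__ == "__main__":
    # Made-up summaries standing in for model output.
    episode_summaries = {
        "ep1": "Josh Seiden and Brian Donohue discuss outcomes versus output.",
        "ep2": "A conversation about pricing and product roadmaps.",
    }
    index = build_index(episode_summaries)
    print(search(index, "outcomes output"))
```

Because summaries are far shorter than full transcripts, an index like this stays small while still covering the main topics of each episode.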

Virtual Meeting Platforms

Virtual meeting platforms and AI voice assistants like Fireflies integrate Speech AI and AI Summarization to offer summaries of full meetings, make meeting recordings easier to consume, and readily identify key takeaways and post-call action items.

Some virtual meeting platforms also add shareable video clips that correspond to key moments in the summary.

These virtual meeting tools integrate with popular meeting platforms like Google Meet, Zoom, and Microsoft Teams for ease of use.

Test Text Summarization models today

You can use the AssemblyAI CLI to quickly test Text Summarization models on any audio or video file (even YouTube videos) right from your terminal. For example:

If you want to start building applications with these Summarization models, you can see more complete code examples in the AssemblyAI API documentation.

What’s next

Text Summarization models seek to empower developers and product teams to build new features that use AI to automatically extract essential information for their customers at scale.

Summarization is an active area of research; even measuring summary quality is difficult given its inherent subjectivity. There are many promising research avenues in the field of Summarization, and the AssemblyAI research team is excited to explore them to advance the state of the art.
