Available now for early access

Introducing LeMUR, our new framework for applying powerful LLMs to transcribed speech

With a single line of code, LeMUR can quickly process transcripts for up to 10 hours of audio content, which translates to roughly 150K tokens, for tasks like summarization and question answering.

Large Language Models (LLMs) are changing what users expect in every industry. However, it is still difficult to build Generative AI products centered around human speech because audio files present challenges for LLMs.

One key challenge with applying LLMs to audio files today is that LLMs are limited by their context windows. Before an audio file can be sent to an LLM, it needs to be converted into text. The longer the resulting transcript, the harder it is to work around the LLM's context window limit.

LeMUR, short for Leveraging Large Language Models to Understand Recognized Speech, is our new framework for applying powerful LLMs to transcribed speech to solve this issue. With a single line of code (via our Python SDK), LeMUR can quickly process transcripts for up to 10 hours of audio content, which translates to roughly 150K tokens. By contrast, common off-the-shelf LLMs can only fit up to 8K tokens, or ~45 minutes of transcribed audio, within their context windows.
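
To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch. It assumes only the approximate figure quoted above (about 150K tokens for 10 hours of audio); the function names are hypothetical, and real token counts depend on the tokenizer and on how dense the speech is.

```python
import math

# Assumption: the post's rough figure of ~150K tokens per 10 hours of audio.
TOKENS_PER_MINUTE = 150_000 / (10 * 60)  # 250 tokens per minute of audio


def estimated_tokens(audio_minutes: float) -> int:
    """Estimate how many tokens a transcript of the given length contains."""
    return round(audio_minutes * TOKENS_PER_MINUTE)


def windows_needed(audio_minutes: float, context_window_tokens: int) -> int:
    """Number of separate context windows needed to cover the whole transcript."""
    return math.ceil(estimated_tokens(audio_minutes) / context_window_tokens)


if __name__ == "__main__":
    print(estimated_tokens(600))       # 10 hours of audio -> 150000
    print(windows_needed(600, 8_000))  # 10 hours in an 8K window -> 19
    print(windows_needed(180, 8_000))  # a 3-hour call in an 8K window -> 6
```

Under this estimate, a single 3-hour call would already have to be split across several 8K-token requests, each of which sees only part of the conversation.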

To reduce the complexity of applying LLMs to transcribed audio files, LeMUR wraps a pipeline of intelligent segmentation, a fast vector database, and reasoning steps like chain-of-thought prompting and self-evaluation, as illustrated below:

Fig. 1: LeMUR's architecture enables users to send long and/or multiple audio transcripts into an LLM with a single API call.

“LeMUR unlocks some amazing new possibilities that I never would have thought were possible just a few years ago. The ability to effortlessly extract valuable insights, such as identifying optimal actions, empowering agent scorecard and coaching, and discerning call outcomes like sales, appointments, or call purposes, feels truly magical.”
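
The general idea behind a segment-then-retrieve pipeline like the one in Fig. 1 can be illustrated with a toy sketch. This is not LeMUR's implementation: it swaps the vector database for a plain bag-of-words cosine similarity, omits the reasoning steps entirely, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_segments(text: str, max_words: int) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_words words.

    A single sentence longer than max_words is kept whole in its own segment.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and count + n_words > max_words:
            segments.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        segments.append(" ".join(current))
    return segments


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_segments(segments: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k segments most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        segments,
        key=lambda seg: cosine_similarity(bag_of_words(seg), query),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(segments: list[str], question: str) -> str:
    """Assemble a prompt that contains only the retrieved segments."""
    context = "\n".join(f"- {seg}" for seg in segments)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    transcript = (
        "Thanks for calling support today. "
        "The customer said the online coach was the best new feature. "
        "They also asked about billing dates. "
        "The billing cycle starts on the first of each month."
    )
    question = "What was the best new feature?"
    segments = split_into_segments(transcript, max_words=12)
    print(build_prompt(top_segments(segments, question, k=1), question))
```

Only the retrieved segments are placed in the prompt, which is how an approach of this kind keeps each LLM request within the context window regardless of how long the full transcript is.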

What LeMUR unlocks

Apply LLMs to multiple audio transcripts

LeMUR enables users to get responses from LLMs on multiple audio files at once, and on transcripts covering up to 10 hours of audio, which translates to an effective context window of ~150K tokens.

Reliable & safe outputs

Because LeMUR includes safety measures and content filters, the LLM responses it returns are less likely to contain harmful or biased language.

Inject context specific to your use case

LeMUR enables users to provide additional context at inference time that an LLM can use to provide personalized and more accurate results when generating outputs.

Modular, fast integration

LeMUR consistently returns structured data in the form of consumable JSON. Users can further customize the format of LeMUR’s output to ensure responses arrive in the form their next piece of business logic expects (for example, boolean answers to questions). This removes the need to write custom code for handling LLM output, so bringing LLM capabilities into users’ products takes just a few lines of code.
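
As a sketch of why structured, boolean-style answers are convenient for downstream logic, consider the snippet below. The JSON payload and its field names are purely illustrative assumptions, not LeMUR's actual response schema, and the helper functions are hypothetical.

```python
import json

# Hypothetical response body: the field names are illustrative only and are
# not taken from LeMUR's actual response schema.
raw_response = (
    '{"question": "Did the customer agree to book an appointment?", '
    '"answer": "Yes"}'
)


def parse_boolean_answer(raw: str) -> bool:
    """Convert a yes/no style JSON answer into a Python bool."""
    payload = json.loads(raw)
    answer = payload["answer"].strip().lower()
    if answer in {"yes", "true"}:
        return True
    if answer in {"no", "false"}:
        return False
    raise ValueError(f"Expected a yes/no answer, got: {payload['answer']!r}")


def next_step(appointment_booked: bool) -> str:
    """Example business logic that branches directly on the parsed answer."""
    if appointment_booked:
        return "send_confirmation_email"
    return "schedule_follow_up_call"


if __name__ == "__main__":
    print(next_step(parse_boolean_answer(raw_response)))
```

Because the answer is constrained to a known format, the calling code can branch on it directly instead of parsing free-form text.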

Continuously state-of-the-art

New LLM technologies and models are continually being released. AssemblyAI brings the newest breakthroughs into LeMUR and all of our ASR models to ensure users can build with the latest AI technology.

What you can build with LeMUR today

To start, LeMUR is focused on a set of flexible endpoints that can be used for multiple use cases. We’re expanding the number of use cases rapidly and would like to hear yours.

Question and Answer

import assemblyai.sdk as aai

# Transcribe the audio file (the module is imported under the alias `aai`)
transcript = aai.transcribe("3_hour_customer_call.mp3")
question = "In one sentence, what did the customer say was the best new feature?"

# Ask a question about the transcript
answer = transcript.lemur.question(question)
print(answer.text)
# >> "The customer said that having an online coach available during study hours was the best feature of the learning product."

Summarization

import assemblyai.sdk as aai

# Transcribe the audio file (the module is imported under the alias `aai`)
transcript = aai.transcribe("poetry-superpowers.mp3")

# Pass extra context and the desired answer format along with the request
summary = transcript.lemur.summarize("This is a popular TED talk.",
                                     answer_format="one bullet point")
print(summary.text)
# >> """
# • A man describes growing up with dyslexia and struggling in school, finding his passion for writing and storytelling through discovering hip hop music, slam poetry, and comic books. He honed his craft by observing the world around him and finding inspiration in everyday people and places. He eventually made it to Broadway as a playwright, achieving his childhood dream of becoming a "superhero" through the power of storytelling.
# """

Coaching feedback

import assemblyai.sdk as aai

audio_files = ["ultimate-django-course.mov", "django-quickstart.mp3",
               "python-django-beginners.mov", "starting-django.mp4"]

# Transcribe several files at once (the module is imported under the alias `aai`)
transcript = aai.transcribe(audio_files)

# Ask for coaching feedback across all of the transcripts
coach_topic = "What can the instructors do better to explain URLs?"
feedback = transcript.lemur.ask_coach(coach_topic)
print(feedback.text)
# >> """To improve explaining URLs, the instructor could:
# - Provide visual examples of URL structures and relationships.
# - Verbalize the logic behind how URLs map to app structures and entity relationships.
# - Highlight best practices for REST API URL design and how those apply to the concepts being explained.
# """