Processing Audio Files with LLMs using LeMUR
In this guide, we'll show you how to use AssemblyAI's LeMUR (Leveraging Large Language Models to Understand Recognized Speech) framework to process audio files with an LLM. You can use LeMUR to ask questions and generate answers, summarize, and get feedback on one or more transcripts processed by AssemblyAI.
Developers and product teams can sign up and join a waitlist to access LeMUR on a rate-limited Early Access basis. Early Access allows users to leverage LeMUR now and provide us direct feedback on the API.
Question & Answer
Ask questions about what was covered in a single conversation or across multiple transcripts.
1. LeMUR works on AssemblyAI transcripts. If you have not already created a transcript to use with LeMUR, start with our guide to Transcribing an audio file. You'll need the transcript id, and that transcript will need to be in a completed state.
2. Import the necessary libraries for making an HTTP request and set up your API token.
3. Next, define your LeMUR request parameters for Q&A.
4. Now, provide one or more transcript IDs, then construct and send the API request. Your LeMUR output will be contained in the response key of the JSON API output. If there is an error, an error key will be returned instead of the response key.
5. The Question & Answer feature returns a set of question-answer pairs:
{
"question": "Is this caller a qualified buyer?",
"answer": "No"
},
{
"question": "What is the caller's mood?",
"answer": "The caller seems enthusiastic."
}
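The Question & Answer steps above can be sketched in Python with only the standard library. This is a minimal sketch, not the official client: the URL is a placeholder, the helper names are hypothetical, and the request field names (transcript_ids, questions) are assumptions to confirm against the LeMUR API reference. Only the response and error keys come from this guide.

```python
import json
import urllib.error
import urllib.request

# Placeholder, NOT a real endpoint: substitute the URL from the LeMUR API reference.
LEMUR_QA_URL = "https://api.example.invalid/question-answer"


def build_qa_payload(transcript_ids, questions):
    """Build the request body. The field names are assumptions."""
    return {
        "transcript_ids": list(transcript_ids),
        "questions": [{"question": text} for text in questions],
    }


def post_json(url, api_token, payload):
    """POST a JSON payload and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_token, "content-type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as reply:
            return json.load(reply)
    except urllib.error.HTTPError as err:
        # A non-2xx reply may still carry a JSON body with an "error" key.
        return json.load(err)


def extract_output(body):
    """Return the value under "response", or raise if the body has "error"."""
    if "error" in body:
        raise RuntimeError(f"LeMUR request failed: {body['error']}")
    return body["response"]


# Usage (needs network access, a real endpoint, and a real token):
# payload = build_qa_payload(["YOUR_TRANSCRIPT_ID"], ["Is this caller a qualified buyer?"])
# for pair in extract_output(post_json(LEMUR_QA_URL, "YOUR_API_TOKEN", payload)):
#     print(pair["question"], "->", pair["answer"])
```

Keeping payload construction and output extraction separate from the network call makes both easy to check without sending a request.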
Custom Summary
Generate use case-specific content based on external context, formatting guidelines, and accurate transcripts.
1. Import the necessary libraries for making an HTTP request and set up your API token.
2. Next, provide one or more transcript IDs and define your LeMUR request parameters for Custom Summary.
3. Now, construct and send the API request. Your LeMUR output will be contained in the response key of the JSON API output. If there is an error, an error key will be returned instead of the response key.
4. The Custom Summary endpoint returns the summary as a string:
{
"response": "The caller contacted a car dealership about purchasing a vehicle. The salesperson described some available options, and the caller expressed interest in a few different models. They discussed pricing and negotiated a deal on a sedan that met the caller's needs. The caller agreed to come into the dealership to finalize the paperwork and pick up the new car."
}
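As a rough illustration of the Custom Summary steps, the sketch below builds the HTTP request object and reads the summary string from a decoded reply. The helper names are hypothetical, the URL passed in is whatever the LeMUR API reference specifies, and the field names (transcript_ids, context, answer_format) are assumptions rather than confirmed parameters.

```python
import json
import urllib.request


def build_summary_request(url, api_token, transcript_ids, context, answer_format):
    """Build (but do not send) a POST request. Field names are assumptions."""
    payload = {
        "transcript_ids": list(transcript_ids),
        "context": context,              # external context for the summary
        "answer_format": answer_format,  # formatting guidelines
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_token, "content-type": "application/json"},
        method="POST",
    )


def read_summary(body):
    """Return the summary string, or raise if the body has an "error" key."""
    if "error" in body:
        raise RuntimeError(f"LeMUR request failed: {body['error']}")
    return body["response"]


# Usage (needs network access, a real endpoint, and a real token):
# request = build_summary_request(SUMMARY_URL, "YOUR_API_TOKEN", ["YOUR_TRANSCRIPT_ID"],
#                                 "A sales call to a car dealership", "One short paragraph")
# with urllib.request.urlopen(request) as reply:
#     print(read_summary(json.load(reply)))
```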
AI Coach
Get feedback tailored to the transcript.
1. Import the necessary libraries for making an HTTP request and set up your API token.
2. Next, provide one or more transcript IDs and define your LeMUR request parameters for AI Coach.
3. Now, construct and send the API request. Your LeMUR output will be contained in the response key of the JSON API output. If there is an error, an error key will be returned instead of the response key.
4. The AI Coach feature returns the feedback on the transcript:
{
"response": "Your opening was overly casual and unprofessional for a sales call. Focus on the key points of your product or service and how it can benefit the customer, rather than personal anecdotes. Avoid making broad generalizations about groups of people and stick to the facts. Your explanation of the product was disorganized and confusing. Provide a clear overview of how it works and its key features and benefits. You failed to ask the customer questions to determine their needs and see if your product is a good fit. A sales call should be a dialogue, not a monologue. Close with a strong summary of why they should choose your product and a call to action, such as a follow up meeting. With a more professional approach and concise, compelling presentation, you have a better chance of making a sale."
}
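One possible way to wire up the AI Coach steps is sketched below. The transport is passed in as a function so the flow can be exercised without network access. The function names are hypothetical, and the field names (transcript_ids, context) are assumptions to verify against the LeMUR API reference.

```python
import json
import urllib.request


def http_post(url, api_token, payload):
    """Default transport: POST the payload as JSON and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_token, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def request_feedback(url, api_token, transcript_ids, context, send=http_post):
    """Ask for feedback on the transcripts and return the feedback string."""
    # Assumed field names; "context" describes what kind of feedback is wanted.
    payload = {"transcript_ids": list(transcript_ids), "context": context}
    body = send(url, api_token, payload)
    if "error" in body:
        raise RuntimeError(f"LeMUR request failed: {body['error']}")
    return body["response"]


# Usage (needs network access, a real endpoint, and a real token):
# print(request_feedback(AI_COACH_URL, "YOUR_API_TOKEN", ["YOUR_TRANSCRIPT_ID"],
#                        "Feedback on a sales call"))
```

Passing a stub for send is enough to check the response and error handling locally before pointing the function at the real endpoint.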
Conclusion
Building Generative AI products centered around human speech is challenging because audio files are difficult input for LLMs. LeMUR chains prompts together, connects multiple Large Language Models, and removes the need for extensive setup or for implementing a vector database for long-term information storage. With LeMUR, you can process multiple audio files and get responses on them with a single API request.
To learn more about LeMUR, refer to the AssemblyAI blog.
If you encounter any issues or have any questions, you can refer to our FAQ or reach out to our Support team.