Processing Audio Files with LLMs using LeMUR

In this guide, we'll show you how to use AssemblyAI's LeMUR (Leveraging Large Language Models to Understand Recognized Speech) framework to process audio files with an LLM. You can use LeMUR to answer questions about, summarize, and get feedback on one or more transcripts processed by AssemblyAI.

Developers and product teams can sign up and join a waitlist to access LeMUR on a rate-limited Early Access basis. Early Access allows users to leverage LeMUR now and provide us direct feedback on the API.

Get started

Before we begin, make sure you have an AssemblyAI account and an API token. You can sign up for an account and get your API token from your dashboard.

LeMUR features are currently only available to paid users, at two pricing tiers: LeMUR and LeMUR Basic. Refer to pricing for more detail.

Question & Answer

Ask questions about what was covered in a single conversation or across multiple transcripts.

  1. LeMUR works on AssemblyAI transcripts. If you have not already created a transcript to use with LeMUR, start with our guide to Transcribing an audio file. You'll need the transcript ID, and the transcript must be in a completed state.

  2. Import the necessary libraries for making an HTTP request and set up your API token.

  3. Next, define your LeMUR request parameters for Q&A.

  4. Now, provide one or more transcript IDs, then construct and send the API request. The LeMUR output is returned under the response key of the JSON body. If there is an error, the body contains an error key instead.

  5. The Question & Answer feature returns a set of question-answer pairs:

    [
      {
        "question": "Is this caller a qualified buyer?",
        "answer": "No"
      },
      {
        "question": "What is the caller's mood?",
        "answer": "The caller seems enthusiastic."
      }
    ]
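
The Question & Answer steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the official sample: it uses the standard library's urllib rather than a third-party HTTP client, and the endpoint URL, the authorization header, and the transcript_ids and questions field names are assumptions to check against the current LeMUR API reference. The token is a placeholder.

```python
import json
import urllib.error
import urllib.request

# Assumptions: placeholder token and endpoint URL; confirm both against
# the current LeMUR API reference before use.
API_TOKEN = "YOUR_API_TOKEN"
QA_URL = "https://api.assemblyai.com/v2/generate/question-answer"


def build_qa_payload(transcript_ids, questions):
    """Build the Q&A request body (field names are assumed)."""
    if not transcript_ids:
        raise ValueError("provide at least one transcript ID")
    return {
        "transcript_ids": list(transcript_ids),
        "questions": [{"question": text} for text in questions],
    }


def extract_response(result):
    """Return the value under 'response', or raise if 'error' is present."""
    if "error" in result:
        raise RuntimeError(f"LeMUR request failed: {result['error']}")
    return result["response"]


def post_json(url, payload, token):
    """POST the payload as JSON and return the decoded JSON body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": token, "content-type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as reply:
            return json.load(reply)
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; decode the body so the caller
        # can inspect its "error" key (assumes the body is JSON).
        return json.load(err)


def ask_questions(transcript_ids, questions, token=API_TOKEN):
    """Send the questions and return the question-answer pairs."""
    payload = build_qa_payload(transcript_ids, questions)
    return extract_response(post_json(QA_URL, payload, token))
```

With a real token and the ID of a completed transcript, a call such as ask_questions(["YOUR_TRANSCRIPT_ID"], ["Is this caller a qualified buyer?"]) would return the question-answer pairs. Keeping extract_response separate makes the response-or-error check easy to test without a network call.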

Custom Summary

Generate use case-specific content based on external context, formatting guidelines, and accurate transcripts.

  1. Import the necessary libraries for making an HTTP request and set up your API token.

  2. Next, provide one or more transcript IDs and define your LeMUR request parameters for Custom Summary.

  3. Now, construct and send the API request. The LeMUR output is returned under the response key of the JSON body. If there is an error, the body contains an error key instead.

  4. The Custom Summary endpoint returns the summary as a string:

    {
      "response": "The caller contacted a car dealership about purchasing a vehicle. The salesperson described some available options, and the caller expressed interest in a few different models. They discussed pricing and negotiated a deal on a sedan that met the caller's needs. The caller agreed to come into the dealership to finalize the paperwork and pick up the new car."
    }
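
As a rough illustration of the Custom Summary steps above, the sketch below builds the request and reads the summary string from the reply. The endpoint URL and the context and answer_format field names are assumptions made for this example (they stand in for the external context and formatting guidelines mentioned earlier), so check them against the current LeMUR API reference. It uses only the Python standard library.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; confirm it in the LeMUR API reference.
SUMMARY_URL = "https://api.assemblyai.com/v2/generate/summary"


def build_summary_request(token, transcript_ids, context=None, answer_format=None):
    """Build (but do not send) a Custom Summary request."""
    payload = {"transcript_ids": list(transcript_ids)}
    # Optional fields (assumed names) are sent only when supplied.
    if context is not None:
        payload["context"] = context
    if answer_format is not None:
        payload["answer_format"] = answer_format
    return urllib.request.Request(
        SUMMARY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": token, "content-type": "application/json"},
        method="POST",
    )


def read_summary(body):
    """Decode a JSON reply and return the summary string."""
    result = json.loads(body)
    if "error" in result:
        raise RuntimeError(f"LeMUR request failed: {result['error']}")
    return result["response"]


# Sending the request needs a real token and transcript ID:
#   request = build_summary_request("YOUR_API_TOKEN", ["YOUR_TRANSCRIPT_ID"])
#   with urllib.request.urlopen(request) as reply:
#       print(read_summary(reply.read()))
```

Building the request separately from sending it lets you inspect the JSON body before any network call is made.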

AI Coach

Get feedback tailored to the transcript.

  1. Import the necessary libraries for making an HTTP request and set up your API token.

  2. Next, provide one or more transcript IDs and define your LeMUR request parameters for AI Coach.

  3. Now, construct and send the API request. The LeMUR output is returned under the response key of the JSON body. If there is an error, the body contains an error key instead.

  4. The AI Coach feature returns the feedback on the transcript:

    {
      "response": "Your opening was overly casual and unprofessional for a sales call. Focus on the key points of your product or service and how it can benefit the customer, rather than personal anecdotes. Avoid making broad generalizations about groups of people and stick to the facts. Your explanation of the product was disorganized and confusing. Provide a clear overview of how it works and its key features and benefits. You failed to ask the customer questions to determine their needs and see if your product is a good fit. A sales call should be a dialogue, not a monologue. Close with a strong summary of why they should choose your product and a call to action, such as a follow up meeting. With a more professional approach and concise, compelling presentation, you have a better chance of making a sale."
    }
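
The AI Coach steps above might look something like the following in Python. Treat it as a sketch under assumptions rather than the official sample: the endpoint URL and the transcript_ids and context field names are hypothetical stand-ins to verify against the current LeMUR API reference, and the standard library's urllib is used in place of a third-party HTTP client.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; confirm it in the LeMUR API reference.
COACH_URL = "https://api.assemblyai.com/v2/generate/ai-coach"


def build_coach_payload(transcript_ids, context):
    """Build the AI Coach request body (field names are assumed)."""
    if not transcript_ids:
        raise ValueError("provide at least one transcript ID")
    return {"transcript_ids": list(transcript_ids), "context": context}


def feedback_from(result):
    """Return the feedback string, or raise if the reply holds an error."""
    if "error" in result:
        raise RuntimeError(f"LeMUR request failed: {result['error']}")
    return result["response"]


def request_feedback(token, transcript_ids, context):
    """Send the AI Coach request and return the feedback text."""
    request = urllib.request.Request(
        COACH_URL,
        data=json.dumps(build_coach_payload(transcript_ids, context)).encode("utf-8"),
        headers={"authorization": token, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return feedback_from(json.load(reply))
```

A call such as request_feedback("YOUR_API_TOKEN", ["YOUR_TRANSCRIPT_ID"], "Feedback on a sales call") would return the feedback string. Note that urlopen raises urllib.error.HTTPError on error status codes, so callers may want to catch that exception as well.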

Conclusion

Building generative AI products around human speech is hard because audio files are difficult for LLMs to work with. LeMUR chains prompts together, connects multiple Large Language Models, and removes the need for extensive setup or a vector database for long-term information storage. With LeMUR, you can process multiple audio files and get responses on them with a single API request.

To learn more about LeMUR, refer to the AssemblyAI blog.

If you encounter any issues or have any questions, you can refer to our FAQ or reach out to our Support team.