
How to use audio data in LangChain with Python

Learn how to incorporate audio files into LangChain and build an LLM app on top of spoken data in this step-by-step tutorial.


LangChain is a framework for developing applications powered by Large Language Models (LLMs). With LangChain, you can easily apply LLMs to your data and, for example, ask questions about its contents. LLMs only work with textual data, so to process audio files with LLMs we first need to transcribe them into text.

Luckily, LangChain provides an AssemblyAI integration that lets you load audio data with just a few lines of code:

from langchain.document_loaders import AssemblyAIAudioTranscriptLoader

loader = AssemblyAIAudioTranscriptLoader("./my_file.mp3")

docs = loader.load()

Let's learn how to use this integration step by step. To do so, we'll build a small demo application that loads audio data and applies an LLM that can answer questions about your spoken data.

Getting Started

Create a new virtual environment and activate it:

# Mac/Linux:
python3 -m venv venv
. venv/bin/activate

# Windows:
python -m venv venv
.\venv\Scripts\activate.bat

Install LangChain and the AssemblyAI Python package:

pip install langchain
pip install assemblyai

Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. You can get a free API key here.

# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>

# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
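
If you prefer to set the key in code instead, the assemblyai SDK also lets you assign it directly. A minimal sketch, assuming the key is already available in the ASSEMBLYAI_API_KEY environment variable:

import os
import assemblyai as aai

# Optional: set the API key programmatically instead of relying on the
# ASSEMBLYAI_API_KEY environment variable being picked up automatically.
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]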

Use the AssemblyAIAudioTranscriptLoader

To load and transcribe audio data into documents, import the AssemblyAIAudioTranscriptLoader. It requires at least the file_path argument, which specifies an audio file as either a URL or a local file path. You can read more about the integration in the official LangChain docs.

from langchain.document_loaders import AssemblyAIAudioTranscriptLoader

audio_file = "https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3"
# or a local file path: audio_file = "./sports_injuries.mp3"

loader = AssemblyAIAudioTranscriptLoader(file_path=audio_file)

docs = loader.load()

What's going on behind the scenes?

This document loader transcribes the given audio file and loads the transcribed text into LangChain documents. If a local file is given, it also uploads the file first.

Note: Calling loader.load() blocks until the transcription is finished.
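
If you want to customize the transcription itself, the loader also accepts an optional config argument that takes an assemblyai.TranscriptionConfig, as described in the LangChain docs. A small sketch, assuming you want to enable speaker labels, for example:

import assemblyai as aai
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader

# Sketch: customize the transcription via a TranscriptionConfig,
# e.g. to enable speaker labels.
config = aai.TranscriptionConfig(speaker_labels=True)

loader = AssemblyAIAudioTranscriptLoader(
    file_path="./my_file.mp3",
    config=config,
)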

After loading the data, the transcribed text is stored in the page_content attribute:

print(docs[0].page_content)
# Runner's knee. Runner's knee is a condition ...

The metadata contains the full JSON response of our API with additional information about the transcript:

print(docs[0].metadata)
# {'language_code': <LanguageCode.en_us: 'en_us'>,
#  'punctuate': True,
#  'format_text': True,
#   ...
# }
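
You can read individual fields from this metadata dictionary like any other dict. A small sketch; the key names mirror the AssemblyAI transcript JSON, so audio_duration (in seconds) is an assumption based on that response format:

# Sketch: access a single field from the transcript metadata.
duration = docs[0].metadata.get("audio_duration")
print(f"Audio duration: {duration} seconds")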

Tip: By default, the document loader returns a list with a single document, which is why we access the first document with docs[0] here. However, you can specify a different TranscriptFormat that splits the text, for example by sentences or paragraphs, and returns multiple documents. You can read more about the TranscriptFormat options here.

from langchain.document_loaders.assemblyai import TranscriptFormat

loader = AssemblyAIAudioTranscriptLoader(
    file_path="./your_file.mp3",
    transcript_format=TranscriptFormat.SENTENCES,
)

docs = loader.load()  # Now it returns a list with multiple documents
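
With a format like TranscriptFormat.SENTENCES, each document holds one sentence of the transcript, so you can iterate over the list, for example:

# Each document now contains one sentence of the transcript.
for doc in docs[:3]:
    print(doc.page_content)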

Apply a Question Answering Chain

Now that you have loaded the transcribed text into LangChain documents, you can easily ask questions about the spoken data. For example, you can apply a model from OpenAI with a QA chain.

For this, you also need to install the OpenAI Python package and set your OpenAI API key as an environment variable:

pip install openai

# Mac/Linux:
export OPENAI_API_KEY=<YOUR_OPENAI_KEY>

# Windows:
set OPENAI_API_KEY=<YOUR_OPENAI_KEY>

Now, you can apply the load_qa_chain and pass the documents you loaded in the first step as the input_documents argument:

from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI()

qa_chain = load_qa_chain(llm, chain_type="stuff")

answer = qa_chain.run(input_documents=docs,
                      question="What is a runner's knee?")

print(answer)
# Runner's knee is a condition characterized by ...
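
Note that the "stuff" chain type inserts all documents into a single prompt, which can exceed the model's context window for long transcripts. In that case, one option is LangChain's "map_reduce" chain type, which answers the question per chunk first and then combines the results. A minimal sketch:

# Sketch: "map_reduce" runs the question over each document first,
# then combines the intermediate answers, which helps with long transcripts.
qa_chain = load_qa_chain(llm, chain_type="map_reduce")
answer = qa_chain.run(input_documents=docs,
                      question="What is a runner's knee?")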

Conclusion

This tutorial explained how to use the AssemblyAI integration that was added to the LangChain Python framework in version 0.0.272. You learned how to transcribe audio files and load the transcribed text into LangChain documents, and how to create a Q&A chain to ask questions about your spoken data.

Below is the complete code:

from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

file_path = "https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3"

loader = AssemblyAIAudioTranscriptLoader(file_path)
docs = loader.load()

llm = OpenAI()
qa_chain = load_qa_chain(llm, chain_type="stuff")

answer = qa_chain.run(input_documents=docs,
                      question="What is a runner's knee?")
print(answer)

If you enjoyed this article, feel free to check out some others on our blog.

Alternatively, check out our YouTube channel for learning resources on AI, like our Machine Learning from Scratch series.