Introducing the AssemblyAI integration for LangChain Go

LangChain is a framework for developing applications using Large Language Models (LLMs). It provides common building blocks for integrating LLMs into your application.

However, LLMs only operate on textual data and don't understand audio data. With our recent contribution to LangChain Go, you can now integrate AssemblyAI's industry-leading speech-to-text models using the new document loader.

💡 The AssemblyAI document loader is also available for both LangChain (Python) and LangChain.js (JavaScript).

The following example answers a question about an audio file. The example uses AssemblyAI to transcribe the audio and OpenAI to generate a response to the question.

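package main

import (
	"context"
	"fmt"
	"log"
	"os"

	assemblyai "github.com/AssemblyAI/assemblyai-go-sdk"
	"github.com/tmc/langchaingo/chains"
	"github.com/tmc/langchaingo/documentloaders"
	"github.com/tmc/langchaingo/llms/openai"
)

func main() {
	// The document loader authenticates with your AssemblyAI API key.
	apiKey := os.Getenv("ASSEMBLYAI_API_KEY")

	// openai.New reads the OpenAI API key from the OPENAI_API_KEY environment variable.
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}

	// Question-answering chain that passes all loaded documents to the LLM in a single prompt.
	chain := chains.LoadStuffQA(llm)

	// The loader transcribes the audio file and returns the transcript as documents.
	loader := documentloaders.NewAssemblyAIAudioTranscript(
		apiKey,
		documentloaders.WithAudioURL("https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3"),
		documentloaders.WithTranscriptParams(&assemblyai.TranscriptOptionalParams{
			LanguageCode: "en_us",
		}),
	)

	ctx := context.Background()

	docs, err := loader.Load(ctx)
	if err != nil {
		log.Fatal(err)
	}

	answer, err := chains.Call(ctx, chain, map[string]any{
		"input_documents": docs,
		"question":        "What is a runner's knee?",
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(answer["text"])
}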

When you run the example, you'll get an output similar to the following:

Runner's knee is a condition characterized by pain behind or around the kneecap, caused by overuse, muscle imbalance, and inadequate stretching. Symptoms include pain under or around the kneecap and pain when walking.
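
The chain in the example is created with chains.LoadStuffQA. A "stuff" chain places the text of every loaded document into a single prompt along with the question. The following is a rough, standard-library-only sketch of that idea, not LangChain Go's actual implementation: the Document type, the stuffPrompt function, and the prompt wording are all hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Document is a hypothetical stand-in for the documents returned by the
// loader, reduced to the one field this sketch needs.
type Document struct {
	PageContent string
}

// stuffPrompt joins the content of every document into one context block
// and appends the question. The template wording is illustrative only and
// is not the prompt that LangChain Go sends to the LLM.
func stuffPrompt(docs []Document, question string) string {
	parts := make([]string, 0, len(docs))
	for _, d := range docs {
		parts = append(parts, d.PageContent)
	}
	joined := strings.Join(parts, "\n\n")
	return "Use the following context to answer the question.\n\n" +
		joined + "\n\nQuestion: " + question + "\nAnswer:"
}

func main() {
	docs := []Document{
		{PageContent: "Runner's knee causes pain around the kneecap."},
	}
	fmt.Println(stuffPrompt(docs, "What is a runner's knee?"))
}
```

Because the whole transcript goes into one prompt, this approach is best suited to audio whose transcript fits within the model's context window.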

Leverage LLMs for audio data using LeMUR

To learn more ways you can chain the audio transcript loader, see the LangChain Go documentation.

If you're not already using LangChain Go, or if you're applying LLMs primarily to audio data, we encourage you to try LeMUR, a framework for leveraging LLMs to understand speech.
