Announcing the AssemblyAI integration for Semantic Kernel .NET

You can now integrate spoken audio data into Semantic Kernel .NET applications using the new AssemblyAI integration.


Semantic Kernel is an SDK, available for multiple programming languages, for integrating Large Language Models (LLMs) and other AI models into applications. However, LLMs only operate on textual data and don’t understand what is said in audio files. With the AssemblyAI integration for Semantic Kernel, you can transcribe your audio and video files with AssemblyAI's transcription models through the TranscriptPlugin.

After you register the plugin, you can invoke the native Transcribe function either directly or from within a semantic function.

Invoke the Transcribe function directly:

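// Plugin and function names match the {{TranscriptPlugin.Transcribe ...}} call in the prompt example.
var result = await kernel.InvokeAsync(
    "TranscriptPlugin", "Transcribe",
    new KernelArguments { ["INPUT"] = "" }); // INPUT: URL of the audio file to transcribe
Console.WriteLine(result.GetValue<string>());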

The output prints the transcript of the audio file.

Invoke Transcribe from a semantic function:

string prompt = """
                Here is a transcript:
                {{TranscriptPlugin.Transcribe ""}}
                Summarize the transcript.
                """;
var result = await kernel.InvokePromptAsync(prompt);
Console.WriteLine(result.GetValue<string>());

The output prints a summary of the transcript, summarized by an LLM.


When you run a semantic function, the prompt is compiled first: variables are substituted and function calls are executed. The results are embedded into the prompt, and only then is the prompt sent to the LLM. In the example above, the whole transcript becomes part of the prompt, so a very long transcript may exceed the token limit of the LLM. Here's a tutorial that stores the transcript in a vector database and uses the RAG pattern to avoid exceeding the token limit.
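
To make that compile-then-send order concrete, here is a toy sketch in C#. It is not Semantic Kernel's template engine. The class ToyPromptRenderer, its Render method, and the small subset of template syntax it accepts are all hypothetical and exist only for illustration. It finds calls of the form {{Plugin.Function "argument"}}, runs the matching function, and pastes the output into the prompt text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Toy illustration only: NOT how Semantic Kernel renders prompts.
// All names in this class are hypothetical.
public static class ToyPromptRenderer
{
    // Matches calls of the form {{Plugin.Function "argument"}}.
    private static readonly Regex CallPattern =
        new Regex(@"\{\{\s*(\w+)\.(\w+)\s+""([^""]*)""\s*\}\}");

    // Replaces every function call in the template with that function's output.
    // The returned string is what would be sent to the LLM.
    public static string Render(
        string template,
        IReadOnlyDictionary<string, Func<string, string>> functions)
    {
        return CallPattern.Replace(template, match =>
        {
            // Look the function up by its "Plugin.Function" name.
            string name = match.Groups[1].Value + "." + match.Groups[2].Value;
            string argument = match.Groups[3].Value;
            return functions[name](argument);
        });
    }
}
```

If you pass Render a dictionary that maps "TranscriptPlugin.Transcribe" to a stand-in function, the returned string contains that function's output in place of the call. The LLM only ever sees this rendered text, which is why the full transcript counts against the token limit.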

You can also use the plugin with Semantic Kernel planners as shown in this sample.

The plugin can also transcribe files stored on your local machine. Check out the README for more details on how to use the AssemblyAI integration for Semantic Kernel.

Next steps

Do you want to see the plugin in action? Follow this step-by-step guide to build a Semantic Kernel application that transcribes .NET Rocks! podcast shows, stores the transcripts in a vector database, and generates answers to your questions using OpenAI GPT models.