Extract Quotes with Timestamps Using LLM Gateway + Semantic Search
This guide will demonstrate how to use AssemblyAI’s LLM Gateway framework to process an audio file and find the best quotes included in it through Semantic Search.
Quickstart
Getting Started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.
You’ll also need to install a few libraries that this code depends on:
- Numpy, a scientific computing library.
- Sciki-Learn, a library for predictive data analysis.
- Sentence-Transformers, a framework for state-of-the-art sentence and text embedding.
Step-by-Step Instructions
Then import all of these libraries and set our AssemblyAI API key, headers, and base URL.
Next, define functions to upload and transcribe files using AssemblyAI’s Async API, as well as request sentences.
Then define a function to process each transcript text with LLM Gateway.
Define a function to implement a sliding window, which allows us to group sentences together in different combinations to retain their semantic meaning and context while also enabling us to customize the length (and thus duration) of the quotes.
Execute all upload and transcription functions.
Now we can iterate over all of the sentences in our transcript and create embeddings for them to use as part of our Semantic Search later.
We’ll be relying on SentenceTransformer’s multi-qa-mpnet-base-dot-v1 model, which has been fine-tuned specifically for Semantic Search, and is their highest-performing model for this task.
By default, we’ll group 5 sentences together while having 2 of them overlap when the window moves. This should give us quotes around 30 seconds in length at most.
Now we can query LLM Gateway to provide the type of quotes we want. In this case, let’s prompt LLM Gateway to find the best 3 quotes out of a video that we transcribed.
Now we can take the embeddings from the transcript text, as well as the embeddings from LLM Gateway’s output, and use them in our k-nearest neighbors algorithm to determine their similarity. The most similar quotes to what LLM Gateway identified will be surfaced as our 3 best quotes, along with their timestamps and confidence scores.
We’ll be relying on cosine similarity rather than the default Euclidean distance metric since it takes into account both the magnitude and direction of our vectors.