June 23, 2026

How to use AI to build powerful market research tools

Learn how AI models can enhance market research tools and customer research platforms.

Kelsey Foster

Growth

Product Management

Market Research

Reviewed by

Table of contents

[Visible on live site]

A single hour-long customer interview holds more usable signal than a hundred multiple-choice survey responses—and almost nobody has time to watch it back. That's the problem market research platforms exist to solve, and it's why the good ones stopped treating voice and video as dead weight to store and started treating it as the richest data they have.

The catch was always the same: qualitative audio is expensive to analyze. Someone has to listen, transcribe, tag the themes, redact the names, and pull the quotes. Multiply that across thousands of respondents and the insight arrives three weeks after the decision was made.

Voice AI collapses that timeline. So let's get into how to build it—the transcription layer, the analysis layer, and the search-and-tag layer that turns a pile of recordings into something your customers can actually query.

What a market research platform needs from Voice AI

A market research platform is basically a searchable command center for customer feedback. It pulls in video, audio, and text from surveys, interviews, and open-ended responses, then runs analysis on top to surface patterns researchers can turn into decisions.

The teams doing this well—qualitative analysis platforms processing open-ended survey video, healthcare research tools comparing how patients describe a drug against the brand's own messaging—all hit the same wall. The wall is voice. Most of the genuinely useful feedback shows up as someone talking, not someone ticking a box, and you can't cluster, tag, or search a recording until it's text with structure.

There are three jobs to do, and they build on each other:

Transcribe asynchronous and live voice and video accurately, so review stops being a manual slog.
Generate themes, highlights, and analysis, so researchers see the signal without scrubbing the timeline.
Categorize, tag, and search every response, so insights are findable months later.

Here's the part that's changed since this was hard: the speech-to-text models are now accurate enough, and cheap enough, that the bottleneck moved from "can we transcribe this" to "what do we build on top."

1. Accurate transcripts from voice and video surveys

Everything downstream depends on the transcript being good. Garbage transcription means garbage themes, garbage tags, garbage search.

The current Universal models handle this. For async work—recorded interviews, uploaded survey clips, the back catalog you want to reprocess—you pass the models you want explicitly: speech_models=["universal-3-pro","universal-2"]. Universal-3.5 Pro covers the high-accuracy path across core languages; Universal-2 reaches 99 languages when your respondents are global. The older singular speech_model parameter is deprecated, so use the array.

Accuracy is only half of it, though. Readability is what makes a transcript usable. A raw, unformatted transcript looks like this:

But how did you guys first meet and how do you guys know each other? I actually met her not too long ago. I 
met her, I think last year in December, during pre season, we were both practicing at Carson a lot. And then 
we kind of met through other players.

No punctuation pattern, no speaker structure, exhausting to skim. Now turn on automatic casing, punctuation, and speaker diarization with speaker_labels: true, and the same exchange reads like a conversation:

Speaker A: But how did you guys first meet, and how do you know each other?‍S
peaker B: I actually met her not too long ago—last year in December, during preseason. We were both 
practicing at Carson a lot, and we kind of met through other players.

That's the difference between a researcher reading a transcript and a researcher avoiding it. Diarization matters even more for focus groups and multi-person interviews, where "who said the thing about price" is the whole question. If you want the mechanics, here's a solid breakdown of how speaker diarization works.

Then there's the privacy problem, which in research is not optional. Respondents name their doctor, read out an address, mention a medication. If your customers publish findings internally or externally, that PII has to go. PII redaction detects and replaces it automatically—each redacted item swapped for a # so the transcript stays readable without exposing anyone.

PII type	Example
Person name	"My nurse, Dr. Alvarez, said..."
Phone number	"Call me at 555-0148"
Location / address	"I live on Maple Street in Austin"
Medical condition	"after my diabetes diagnosis"
Financial / card data	"my card ending in 4242"

That's a representative slice, not the full list. The point is you can strip sensitive data before a transcript ever reaches storage or an LLM—which matters a lot if your customers handle health data. On that note: AssemblyAI is a business associate under HIPAA and offers a Business Associate Addendum (BAA) for customers processing PHI, so a BAA is available when your research touches protected health information.

Transcribe your first interview in minutes

Grab an API key and run async or real-time transcription with speaker labels, redaction, and summaries built in. Free to start, no sales call needed.

2. Generate highlights and analysis

Accurate transcripts are step one. The product your customers actually pay for is the layer that reads those transcripts so a human doesn't have to.

A few speech understanding features work together here. Summarization produces a tight overview of a long response—so a researcher reviews a 40-minute interview in the time it takes to read a paragraph. Auto chapters break a session into navigable segments, which is how you build a "jump to the part where they talked about onboarding" feature.

Entity detection identifies and classifies the specifics—a competitor's product name, a city, a job title—so your platform can compile every mention of "Salesforce" or "Chicago" across a thousand responses without anyone reading a thousand responses.

Entity type	Example
Organization	Salesforce, Nike
Location	San Francisco, Germany
Product	iPhone, a named SaaS tool
Occupation	nurse, product manager
Date / time	"last quarter," "March 2024"

Topic detection runs alongside it, labeling what each segment is about using the standardized IAB taxonomy. That's how you let researchers filter a corpus by subject without hand-coding every response.

Topic category	Example subtopics
Technology & Computing	Artificial intelligence, consumer electronics
Healthy Living	Nutrition, weight loss
Personal Finance	Insurance, personal investing
Shopping	Couponing, gifts and holidays
Style & Fashion	Beauty, women's fashion

And sentiment analysis labels segments positive, negative, or neutral—the fastest way to find what customers loved, what they hated, and where to dig. Track it across an entire study and you've got a sentiment map of a product launch.

But here's where it gets interesting. The real leap is sending all of that to an LLM. With the LLM Gateway, you POST https://llm-gateway.assemblyai.com/v1/chat/completions with a model like claude-sonnet-4-5 and ask the transcript a direct question: "What were the three most common complaints about checkout?" or "Summarize this respondent's feelings about pricing, and flag any objections." You get a structured, custom answer back—not a generic summary—in the same billing relationship as your transcription, instead of standing up a separate LLM vendor. The Gateway docs walk through the request shape and the models available. That's the difference between handing a researcher data and handing them an answer.

3. Categorize, tag, and search every response

The third job is what makes the whole thing usable six months later. A study isn't valuable if nobody can find the relevant clip when a new question comes up.

This is where the analysis layer pays off twice. Entity detection and topic detection don't just generate insight in the moment—they generate the smart tags your search runs on. Detect that a segment is about "pricing" and mentions a competitor, and you've automatically got two filters a researcher can pivot on later without tagging anything by hand.

Then summarization or an LLM call attaches a short, context-aware blurb to each tagged segment, so a search result isn't a raw timestamp—it's "respondent 47, frustrated about onboarding time, 12:30." That's the conversation intelligence pattern applied to research: structured output wired straight into a queryable interface.

Transcription alone gets you keyword search. The combination of transcription, understanding, and LLM-generated tags gets you a research database that answers questions in plain language—which is the product researchers have actually wanted all along.

The build is smaller than it used to be

The reason this is worth doing now and wasn't a couple of years ago is that the hard parts are API calls. Marvin, a qualitative data analysis platform, wired these models into its product and its customers now spend 60% less time analyzing research data. That's not a tooling upgrade—it's a different job. The researcher stops being a transcription clerk and goes back to being a researcher.

That's the move. Don't build a transcription pipeline; build the thing that reads the transcripts for your customers. The models are accurate, the redaction is handled, and the LLM layer means you ship "ask your research a question" instead of "search your transcripts for a keyword." The platforms that win this category will be the ones that treated voice as their best data source instead of their heaviest storage cost.

See it run on a real interview

Upload an audio or video clip and watch transcription, speaker labels, entities, topics, sentiment, and summaries come back in the browser. No code required.

Try playground

Frequently asked questions

What are the best AI market research tools?

The best AI market research tools combine accurate transcription with analysis features like summarization, entity detection, topic detection, and sentiment analysis. Rather than buying a single closed tool, many platforms build their own on top of a Voice AI API like AssemblyAI, which provides the Universal-3 family of speech-to-text models plus speech understanding and an LLM Gateway for custom analysis. That approach lets a research platform tailor the analysis layer to its own data model instead of fitting into someone else's.

How can AI analyze voice and video survey responses?

AI analyzes voice and video survey responses by first transcribing the audio into accurate, formatted text, then running understanding models over that text. From there you can generate summaries, detect entities and topics, score sentiment, and send the transcript to an LLM to answer specific research questions. This turns hours of recordings into structured, searchable insight without manual review.

How do I transcribe and tag qualitative research interviews?

Transcribe interviews by sending the audio to a speech-to-text API with speaker diarization enabled (speaker_labels: true) so each participant is labeled. To tag them, run entity detection and topic detection on the resulting transcript—those outputs become the smart tags your search and filtering run on. You can then attach an LLM-generated summary to each tagged segment so results are readable, not just timestamps.

Can AI detect themes and topics across customer interviews?

Yes. Topic detection labels what each segment of a transcript is about using the standardized IAB taxonomy, and entity detection surfaces recurring specifics like companies, products, and locations. Run these across an entire set of interviews and you can cluster recurring themes without hand-coding every response. Adding an LLM call on top lets you ask higher-level questions like "what are the three most common complaints across all respondents."

How do I keep PII safe in research transcripts?

Use PII redaction to automatically detect and remove personally identifiable information—names, phone numbers, addresses, medical conditions, financial data—replacing each item with a placeholder so the transcript stays readable. Running redaction before transcripts reach storage or an LLM keeps sensitive respondent data out of downstream systems. For research involving protected health information, AssemblyAI is a business associate under HIPAA and offers a Business Associate Addendum (BAA) for customers processing PHI.