How Super scaled real-time voice agents for real estate with AssemblyAI

Super, a vertical Voice AI platform for real estate, switched to AssemblyAI's Universal-3 Pro Streaming to improve turn detection and keyterm accuracy for high-volume phone agents, and cut STT costs by ~30% in the process.

30% cost savings on real-time transcription

1-day integration into voice agent pipeline

Vertical Voice AI for real estate, at scale

The Voice AI market has matured rapidly as businesses look to automate phone-based interactions without sacrificing the quality of human conversation. Super is a vertical Voice AI platform built for real estate, providing real-time phone agents that field thousands of calls daily across inbound and outbound use cases. Founded by Lindsay Liu (CEO) and Vika Kovalchuk Zamparelli (Chief Product Officer), Super engineers its multi-agent platform around the specialized requirements of real estate communications, balancing latency, accuracy, short utterance handling, multilingual detection, and industry-specific terminology in every conversation.

Real-time voice agents have zero margin for awkward conversation

For a vertical voice AI platform, transcription quality determines whether a phone conversation feels like magic or like a robocall. Super's engineering team had tested several speech-to-text providers and consistently encountered the same set of limitations affecting conversational quality and the company's ability to scale.

Turn detection limitations: Most speech-to-text models offered only basic minimum and maximum timing controls for conversation turns, making it difficult to create natural rhythm. In live phone calls, awkward pauses or inappropriate barge-ins erode caller trust within seconds.

Concurrency that gates growth: As a rapidly scaling startup serving high-volume real estate operators, Super needed STT infrastructure that could grow with usage without hitting concurrency limits or performance thresholds, especially since speech-to-text is already priced on a usage basis.

Keyterm prompting that didn't actually boost: Earlier providers offered keyterm prompting features that produced minimal real-world impact, forcing Super's engineering team to over-engineer workarounds for the property names, addresses, and contact details that come up constantly in real estate conversations.
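
To make the turn detection limitation concrete, here is a minimal sketch of why timing-only controls struggle on live calls. It is an illustration under stated assumptions, not Super's or AssemblyAI's implementation: the function names, the millisecond thresholds, and the `looks_complete` signal (standing in for whatever a smarter model infers about whether the caller has finished speaking) are all hypothetical.

```python
def timing_only_end_of_turn(silence_ms: int, threshold_ms: int) -> bool:
    """Hypothetical timing-only rule: the turn ends once silence reaches a fixed threshold."""
    return silence_ms >= threshold_ms


def hybrid_end_of_turn(
    silence_ms: int,
    looks_complete: bool,
    min_silence_ms: int = 300,   # assumed value, for illustration only
    max_silence_ms: int = 1500,  # assumed value, for illustration only
) -> bool:
    """Hypothetical hybrid rule: end early only if the utterance looks complete,
    and always end once the maximum silence is reached."""
    if silence_ms >= max_silence_ms:
        return True
    return looks_complete and silence_ms >= min_silence_ms


# A caller pauses 700 ms in the middle of reading out an address.
# A short fixed threshold barges in; the hybrid rule keeps listening.
print(timing_only_end_of_turn(700, threshold_ms=500))    # True
print(hybrid_end_of_turn(700, looks_complete=False))     # False

# A caller answers "yes" and stops. A long fixed threshold leaves an awkward
# pause; the hybrid rule responds after a short silence.
print(timing_only_end_of_turn(400, threshold_ms=1500))   # False
print(hybrid_end_of_turn(400, looks_complete=True))      # True
```

With a single fixed threshold, any value that avoids barge-ins on mid-sentence pauses also delays the response to short replies, which is the trade-off described above.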

Selecting a transcription partner for the most critical building block

Recognizing that speech-to-text functions as the first domino in every voice conversation, Super's leadership team approached provider selection as a strategic engineering decision rather than a commodity infrastructure choice. Any pipeline provider has to pass the company's voice QA evaluations, and core provider changes involve the leadership team for technical and business considerations.

Rather than building proprietary speech-to-text capabilities, Super focuses its engineering resources on its multi-agent voice platform and orchestration layer, partnering with best-in-class providers for foundational components. The evaluation criteria for those providers center on the unique demands of real estate voice conversations: fast and accurate transcription with strong handling of short utterances, reliable multilingual detection, and effective keyterm prompting for industry vocabulary.

Two factors drove the initial decision to implement AssemblyAI, both areas where competing providers had fallen short:

Smarter turn detection that could enable more natural conversational flow, beyond the basic minimum and maximum timing controls available elsewhere
Concurrency capacity that wouldn't gate Super's growth as call volume scaled, particularly important given that speech-to-text is already priced on usage

"We require a leading edge speech-to-text provider that can meet our specialized needs: fast, accurate, targeted, and multilingual," says Liu.

Super selected AssemblyAI's Universal-3 Pro Streaming model, deployed via LiveKit, to power the transcription layer for its phone agents. Universal-3 Pro Streaming is engineered for real-time voice applications, with improvements to turn detection, multilingual handling, and keyterm prompting that meaningfully boosts the terms developers prioritize, rather than functioning only as a nominal feature.

From integration to voice agent pipeline in a single day

Super's engineering team executed a fast integration that immediately moved the project into iterative QA cycles tailored to real estate use cases. Universal-3 Pro Streaming dropped into Super's existing LiveKit pipeline in approximately one day, freeing the team to move straight into testing and evaluation. That work combines scaled evaluation through Super's voice QA platform with human QA targeted at common failure areas, including sound quality, background noise, short replies, specific keyterms, and language detection. Depending on the depth of refinement required, the process spans a few days to a couple of weeks.

Implementing a new STT provider required just one engineer plus QA support, reflecting the straightforward integration architecture. Throughout implementation, Super's team had a dedicated Slack channel with AssemblyAI's customer engineering team for questions, best practices, and debugging on real production scenarios.

"The Slack channel with customers is hugely impactful," says Liu, naming the responsiveness of AssemblyAI's customer engineering team as one of the most unexpected benefits of the partnership.

Speech-to-text improvements compound across the voice pipeline

The Universal-3 Pro Streaming integration delivered measurable improvements that Super observed almost immediately in production, given the volume of calls running through the platform every day.

Approximately 30% cost savings on STT usage: Universal-3 Pro Streaming proved cost-effective compared to alternatives when factoring in hourly cost and the scaled usage included, delivering meaningful savings on a critical infrastructure line item.

Engineering team refocused on product development: With fewer widespread STT issues to firefight, Super's engineering team has reallocated capacity toward new feature development rather than reactive maintenance on the transcription layer.

Production impact visible within days: Fielding thousands of calls daily across its customer base, Super was able to observe improvements in conversation quality and customer feedback as soon as Universal-3 Pro Streaming went live in production.

Improved conversational clarity for callers and customers: End callers experience clearer conversations with more natural turn-taking, while Super's customers receive higher-quality transcription outputs that lead to better resolutions for their own end customers.

Scaling specialized Voice AI with a long-term transcription partner

The improved STT foundation positions Super to expand both its volume and the sophistication of its voice agent platform across the real estate vertical.

Expanding usage and use cases: Super continues to scale its usage of AssemblyAI's transcription as the platform grows across new use cases and customer segments, with confidence that concurrency won't become a ceiling.

Multi-agent platform expansion: Super's multi-agent architecture allows for hyper-specialized agent configurations, and the availability of mid-stream parameter updates supports fine-tuning for specific agent use cases without sacrificing real-time performance.

Roadmap alignment on what matters most: Super's team is focused on continued accuracy improvements for email, phone, name, and address capture; enhanced keyterm accuracy on short utterances; and ongoing turn detection refinements, all areas where AssemblyAI's roadmap is actively developing.

A partnership orientation, not a vendor relationship: Super maintains an ongoing evaluation process for best-in-class providers across its voice pipeline, and AssemblyAI has demonstrated the engineering responsiveness and product trajectory that justify a long-term partnership.

Bottom line

Super has built its competitive position in vertical voice AI on a foundation of fast, accurate, and natural-feeling speech-to-text. With Universal-3 Pro Streaming powering its phone agents, the company has reduced STT-related issues, lowered transcription costs by approximately 30%, and freed its engineering team to focus on the multi-agent platform innovations that differentiate Super in the real estate market, all while improving the conversational quality that makes voice AI feel less like a robocall and more like a real conversation.
