The abundance of spoken data generated from phone calls, virtual meetings, online videos, podcasts, and more can serve as a crucial input for product teams and developers looking to build competitive Generative AI workflows and applications.
Before this spoken data can be input into a Generative AI model, however, it must first be transcribed from speech into text. The accuracy of this transcription is critical to the accuracy and utility of the output, whether that output is a summary, a question response, a search request, or newly generated text.
AssemblyAI’s Conformer-2 AI model for automatic speech recognition, for example, is trained on 1.1M hours of English audio data, achieving state-of-the-art transcription results and improvements on proper nouns, alphanumerics, and robustness to noise when compared to Conformer-1.
Leading companies across industries are already putting Conformer-2 into production, offering highly accurate transcription to their customers, as well as competitive Generative AI tools and features built on top of this transcription data.
Use Case: Virtual Meetings
Sembly AI is an AI team assistant that transcribes virtual meetings, takes meeting notes, and generates intelligent insights for professional meetings. Users simply sync Sembly with their Google Calendar or Outlook so that Sembly can automatically join calls. After the meeting, users will find a call recording, call summary, and call outline.
Users can also use Semblian™ to ask questions about the meeting, generate an email response, craft personalized messages, highlight key takeaways, and more.
These downstream Generative AI tasks are highly dependent on the accuracy of each meeting transcript, which is one of the reasons Artem Koren, Sembly AI’s Chief Product Officer, is so excited by the release of AssemblyAI’s Conformer-2 model. Proper noun accuracy in transcripts is also critical for many of Sembly AI’s Generative AI tools to work optimally, and Conformer-2 showed a 6.8% improvement on proper noun error rate over Conformer-1.
“Sembly AI strives to provide the industry's best AI teammate experience in over 30 languages,” explains Koren. “We are excited to work with Assembly AI to enable high-quality ASR that powers the latest and most advanced AI features for our customers.”
Use Case: Conversation Intelligence
CallRail is a lead intelligence company using AI to power Conversation Intelligence tools for its customers. An early integrator of AI, CallRail’s product team remains invested in a continuous AI-first approach when planning which new features and tools to prioritize on its product roadmap.
CallRail, like Sembly AI, views state-of-the-art audio and video transcription as a foundational building block for the company’s other AI-powered features. These include automatically surfacing keywords across calls, qualifying leads, tagging team members, analyzing key portions of sales calls, and more.
CallRail’s customers also need videos and calls that contain lengthy alphanumeric sequences, such as credit card numbers and phone numbers, transcribed at high accuracy. Conformer-2 achieves a 30.7% relative reduction in mean Character Error Rate (CER) compared to Conformer-1, making transcripts with long alphanumeric strings even more accurate for CallRail’s customers.
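To make the CER figure above more concrete, here is a minimal sketch of how a Character Error Rate and a relative reduction between two transcripts could be computed. It assumes CER is the character-level edit distance divided by the reference length, which is the common definition and not necessarily AssemblyAI’s exact evaluation procedure. The reference string, both hypothesis strings, and all function names are invented for illustration and are not taken from any benchmark.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical example data: a 20-character alphanumeric reference and
# two made-up model outputs (4 wrong characters vs. 1 wrong character).
REFERENCE = "code 12345 67890 ABC"
OLDER_MODEL_OUTPUT = "code 12845 67390 ADE"
NEWER_MODEL_OUTPUT = "code 12345 67890 ABD"

older_cer = character_error_rate(REFERENCE, OLDER_MODEL_OUTPUT)
newer_cer = character_error_rate(REFERENCE, NEWER_MODEL_OUTPUT)

print(f"older CER: {older_cer:.2f}")
print(f"newer CER: {newer_cer:.2f}")
print(f"relative reduction: {relative_reduction(older_cer, newer_cer):.1%}")
```

A reported mean CER would average this per-sample value over an entire test set. The single pair of strings here only illustrates the arithmetic behind a relative reduction.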
“Transcription accuracy is paramount for CallRail to deliver unparalleled insights and actionable data to our valued customers,” says Ryan Johnson, Chief Product Officer at CallRail. “Partnering with AssemblyAI has made it easy for us to deliver world-class voice intelligence powered by market-leading speech-to-text technology.”
Read more about CallRail and its ongoing partnership with AssemblyAI.
Use Case: Video Editing
Vidyo.AI is an AI-based content repurposing and video editing platform designed to help content creators transform long-form audio and video into short-form snippets for social media.
Vidyo.AI’s product team needed an AI stack for spoken data to help power the platform’s core features, like AI captions and subtitles and auto video chapters.
Vedant Maheshwari, CEO at Vidyo.AI, explains how AssemblyAI’s Conformer-2 model will make Vidyo.AI even more powerful and competitive moving forward.
"We've been using AssemblyAI for our speech-to-text services and have had a phenomenal experience so far,” says Maheshwari. “The integration was simple and easy for developers to get started, the accuracy is better than any other tools in the market (and we have tried them all). Would highly recommend this!"
Conformer-2 is also more robust to noise, achieving 43% fewer errors on AssemblyAI’s noisy test dataset than the next best provider. This is especially beneficial for Vidyo.AI’s customers who need to transcribe videos with large amounts of background noise.