Ninety-two percent of customer interactions occur over the phone. Even when these calls are recorded, how can companies effectively synthesize hours of conversation into meaningful soundbites?
To facilitate effective lead conversion and customer engagement, Sales, Marketing, Customer Success, and Human Resources teams must be able to (A) record, transcribe, and search across all voice conversations and (B) make sense of those conversations to create visibility, drive process and behavior changes, and deliver bottom-line impact.
Many virtual meeting and telephony companies are turning to Conversation Intelligence Platforms to solve these challenges.
This article first looks at what exactly Conversation Intelligence is before examining how AI Models help power these intelligent platforms.
Conversation Intelligence Platforms
Before we look more closely at Automatic Speech Recognition (ASR) and Natural Language Understanding/Processing (NLU/NLP) tools, let’s first discuss Conversation Intelligence and why it’s important.
What is Conversation Intelligence?
Conversation Intelligence refers to the combination of call transcription and the use of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) to infer valuable meaning from the conversation. This process could include deducing intent or sentiment, surfacing named entities, sorting conversations into shorter chapters or summaries, identifying common topics or themes, and more.
Why is Conversation Intelligence Important?
At its core, Conversation Intelligence compiles a large number of customer interactions and performs meaningful analysis on them in a short amount of time: much faster, more deeply, and with less bias than the same analysis done manually.
These insights can be used for input analysis and response generation (for example, in a customer-facing chatbot), to improve customer service, better train customer service agents, facilitate smarter sales calls, and more. Ultimately, Conversation Intelligence Platforms generate high ROI through specific, actionable insights.
Now, let’s look at how two of these ASR and NLU/NLP technologies–Speech-to-Text and Audio Intelligence APIs–serve as the “brains” behind Conversation Intelligence.
Speech-to-Text and Audio Intelligence APIs for Conversation Intelligence
Highly accurate ASR platforms such as Speech-to-Text APIs automatically transcribe video and audio streams, like sales calls or virtual meetings, into text. In the past, these transcriptions often lacked punctuation and casing, paragraph structure, and speaker labels, turning what was supposed to be a helpful step into a laborious one instead.
Thankfully, today’s top Speech-to-Text APIs can automatically modify a transcription to include the elements listed above, making the text much easier to digest and analyze. For example, AssemblyAI’s Automatic Casing and Punctuation Models are trained on texts that include billions of words, resulting in industry-best transcription accuracy and increased utility.
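For illustration, here is a minimal sketch of requesting a punctuated, cased transcription from AssemblyAI’s v2 REST endpoint. It assumes an API key stored in an environment variable and a hosted audio file at a placeholder URL; check the API reference for current parameter names:

```python
import os
import time

import requests

API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

# Submit an audio file for transcription. `punctuate` and `format_text`
# ask the API to add punctuation and casing to the transcription text.
job = requests.post(
    API,
    json={
        "audio_url": "https://example.com/sales-call.mp3",  # placeholder URL
        "punctuate": True,
        "format_text": True,
    },
    headers=HEADERS,
).json()

# Transcription runs asynchronously, so poll until the job finishes.
while True:
    transcript = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
    if transcript["status"] in ("completed", "error"):
        break
    time.sleep(3)

print(transcript["text"])  # the punctuated, cased transcription
```

The later sketches in this article wrap this same submit-and-poll pattern in a small `transcribe` helper and only vary the request parameters.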
Some Speech-to-Text APIs also go beyond this basic transcription, offering advanced NLU/NLP tools referred to as Audio Intelligence APIs.
Built using the latest AI, ML, DL, and NLU/NLP research, Audio Intelligence APIs let users quickly build high-ROI features and applications on top of their audio data, moving well past line-level transcription. These features could include detecting important entities in a text, identifying speaker sentiment, automatically sorting a text into chapters, and more.
Here are the three biggest impacts ASR and NLP/NLU tools like Audio Intelligence can have on Conversation Intelligence Platforms:
1. Automate Time-consuming Meeting Transcription Process
ASR and Audio Intelligence tools can automate the time-consuming process of accurately capturing customer meetings, filling out appropriate CRM data, and making meaningful connections across conversations.
The first step is to use a Speech-to-Text API with high accuracy and low Word Error Rate (WER) that has been trained on conversations from a wide variety of industries, dialects, and accents. This will ensure high accuracy regardless of speech patterns and technical jargon. If industry-specific or technical language is a barrier to accurate transcription, some Speech-to-Text APIs offer a Word Boost feature that lets you add custom vocabulary lists to increase this accuracy further.
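As a sketch of what Word Boost could look like against the same endpoint, the snippet below passes an invented vocabulary list via the `word_boost` parameter, with `boost_param` controlling how strongly the custom terms are weighted:

```python
import os
import time

import requests

API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    """Submit a transcription job and poll until it finishes."""
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

# Boost recognition of industry-specific terms (an invented vocabulary list).
transcript = transcribe(
    "https://example.com/sales-call.mp3",  # placeholder URL
    word_boost=["conversation intelligence", "diarization", "upsell"],
    boost_param="high",  # low / default / high
)
print(transcript["text"])
```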
Accurate transcription is only the first step, however. Next, you need to apply NLU/NLP tools on top of the transcription data to identify speakers, automate CRM data, identify important sections of the calls, etc.
There are a few main Audio Intelligence tools that help achieve this more sophisticated analysis. These include: Speaker Diarization, Auto Chapters, Topic Detection, and Sentiment Analysis.
Speaker Diarization applies speaker labels to a transcription text, helping answer the question: who spoke when? This lets companies automatically sort speech utterances by agent or customer to ease readability and analysis.
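A sketch of Speaker Diarization with the same endpoint: with `speaker_labels` enabled, the response includes an `utterances` list that pairs each speech turn with a speaker label.

```python
import os
import time

import requests

# Same client setup and submit-and-poll helper as the Word Boost sketch.
API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

transcript = transcribe(
    "https://example.com/sales-call.mp3",  # placeholder URL
    speaker_labels=True,  # enable Speaker Diarization
)

# Each utterance is one speech turn, labeled by speaker (A, B, C, ...).
for utterance in transcript["utterances"]:
    print(f'Speaker {utterance["speaker"]}: {utterance["text"]}')
```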
Auto Chapters, also referred to as Summarization, provides a “summary over time” for each transcription. It works by (A) segmenting the text into logical chapters wherever the conversational topic changes and then (B) automatically generating a summary for each of these chapters.
Auto Chapters helps make conversations easier to skim, navigate, or even identify common themes/topics quickly for further analysis.
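A sketch of requesting Auto Chapters: each returned chapter carries its time range alongside a generated headline and summary.

```python
import os
import time

import requests

# Same client setup and submit-and-poll helper as the Word Boost sketch.
API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

transcript = transcribe(
    "https://example.com/sales-call.mp3",  # placeholder URL
    auto_chapters=True,  # enable Auto Chapters / Summarization
)

# Each chapter marks where the topic changed, with a generated summary.
for chapter in transcript["chapters"]:
    print(f'[{chapter["start"]} - {chapter["end"]} ms] {chapter["headline"]}')
    print(f'  {chapter["summary"]}')
```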
Topic Detection identifies and labels topics in a transcription text, helping companies better understand context and identify patterns. This process can help companies identify trends such as topics that lead to questions, objections, positive statements, negative statements, and more.
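A sketch of Topic Detection: AssemblyAI labels topics against the IAB taxonomy, and the result includes a summary mapping each detected topic to its overall relevance in the conversation.

```python
import os
import time

import requests

# Same client setup and submit-and-poll helper as the Word Boost sketch.
API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

transcript = transcribe(
    "https://example.com/sales-call.mp3",  # placeholder URL
    iab_categories=True,  # enable Topic Detection
)

# The summary maps each detected topic label to a 0-1 relevance score.
for topic, relevance in transcript["iab_categories_result"]["summary"].items():
    print(f"{topic}: {relevance:.2f}")
```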
Sentiment Analysis identifies positive, negative, or neutral sentiments in a transcription. Sentiments are identified and sorted by each speech segment and/or speaker. For example, the speech segment “This product has been working really well for me so far.” would be identified as a POSITIVE sentiment.
Sentiment Analysis APIs can help identify changes in tone in a transcription–where did a call change from positive to negative? Or from negative to positive? Are more positive or negative sentiments associated with certain topics? With certain speakers? Then, this data can be used to determine patterns and identify the appropriate actions that need to be taken.
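A sketch of Sentiment Analysis, combined here with speaker labels so each segment’s sentiment can be tied back to a speaker:

```python
import os
import time

import requests

# Same client setup and submit-and-poll helper as the Word Boost sketch.
API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

transcript = transcribe(
    "https://example.com/sales-call.mp3",  # placeholder URL
    sentiment_analysis=True,
    speaker_labels=True,  # tie each sentiment to a speaker
)

# One entry per speech segment: POSITIVE, NEGATIVE, or NEUTRAL.
for segment in transcript["sentiment_analysis_results"]:
    print(f'Speaker {segment["speaker"]} [{segment["sentiment"]}]: {segment["text"]}')
```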
2. Make Voice Conversations Searchable and Indexable
In addition to automating transcription, Conversation Intelligence Platforms also need to help companies make these voice conversations both searchable and indexable. Have hours and hours of meeting conversations to review? Audio Intelligence can help companies review these calls in mere minutes by enabling search across action items and auto-highlights of key sections of the conversations.
As discussed in the previous section, Auto Chapters can be a powerful tool to help companies quickly make sense of a lengthy conversation. In addition, some Audio Intelligence APIs also offer the option to Detect Important Phrases and Words in a transcription text.
For example, given the following transcription text:

“We smirk because we believe that synthetic happiness is not of the same quality as what we might call natural happiness. What are these terms? Natural happiness is what we get when we get what we wanted. And synthetic happiness is what we make when we don’t get what we wanted. And in our society...”

the AssemblyAI API detected the keywords and phrases synthetic happiness and natural happiness.
Use this Audio Intelligence feature to quickly search for these common words/phrases and identify trends for further analysis.
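A sketch of detecting important phrases and words via the `auto_highlights` parameter: each highlight reports the phrase, how often it occurred, and a relevance rank.

```python
import os
import time

import requests

# Same client setup and submit-and-poll helper as the Word Boost sketch.
API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

transcript = transcribe(
    "https://example.com/discussion.mp3",  # placeholder URL
    auto_highlights=True,  # detect important phrases and words
)

# Each highlight carries the phrase, its occurrence count, and a rank score.
for phrase in transcript["auto_highlights_result"]["results"]:
    print(f'{phrase["text"]} (count={phrase["count"]}, rank={phrase["rank"]})')
```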
Entity Detection is another Audio Intelligence tool that can help make transcriptions more easily searchable.
Entity Detection APIs (A) identify and (B) classify specified entities in a transcription. For example, San Diego is an entity that would be classified as a location. Common entities that can be detected include dates, languages, nationalities, number strings (like a phone number), personal names, company names, locations, addresses, and more. Then, companies can use this information to search for these entities and identify common trends or other actionable insights.
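A sketch of Entity Detection: each detected entity comes back with the spoken text and a type label such as location or person_name.

```python
import os
import time

import requests

# Same client setup and submit-and-poll helper as the Word Boost sketch.
API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

transcript = transcribe(
    "https://example.com/sales-call.mp3",  # placeholder URL
    entity_detection=True,
)

# Each entity carries a type label and the exact text that was spoken.
for entity in transcript["entities"]:
    print(f'{entity["entity_type"]}: {entity["text"]}')
```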
Several of the other Audio Intelligence APIs mentioned previously can help here as well–search by speaker using Speaker Diarization, search by topic using Topic Detection, or search by sentiments or conversational tone changes using Sentiment Analysis.
3. Gain Actionable Insights
Finally, companies need to use AI-powered Conversation Intelligence Platforms to gain actionable insights that directly increase customer engagement, drive process and behavior changes, and deliver faster ROI.
Audio Intelligence APIs don’t have to stand alone. In fact, when used together, the Audio Intelligence APIs discussed throughout this post help companies find valuable structure and patterns in the previously unstructured data. This structure provides important visibility into rep activity and customer and prospect engagement, helping keep teams in sync and generating data-backed goals and actions.
For example, Topic and Entity Detection, combined with Sentiment Analysis, can help companies track how customers are reacting to a particular product, pitch, or pricing change. Detecting Important Words and Phrases, combined with Topic Detection, can help companies identify common language being used about products or services. Entity Detection can also be used to surface when a prospect mentions a certain competitor, while Sentiment Analysis can inform opinions around this mention.
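As one illustration of combining these results, the sketch below cross-references Entity Detection output with Sentiment Analysis segments by timestamp, pairing each entity mention (a competitor name, say) with the tone of the surrounding speech. The matching logic here is our own glue code, not an API feature:

```python
import os
import time

import requests

# Same client setup and submit-and-poll helper as the Word Boost sketch.
API = "https://api.assemblyai.com/v2/transcript"
HEADERS = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

def transcribe(audio_url, **params):
    job = requests.post(API, json={"audio_url": audio_url, **params}, headers=HEADERS).json()
    while True:
        result = requests.get(f"{API}/{job['id']}", headers=HEADERS).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(3)

transcript = transcribe(
    "https://example.com/sales-call.mp3",  # placeholder URL
    entity_detection=True,
    sentiment_analysis=True,
)

def sentiment_at(ms, sentiments):
    """Return the sentiment of the speech segment covering a timestamp."""
    for segment in sentiments:
        if segment["start"] <= ms <= segment["end"]:
            return segment["sentiment"]
    return "UNKNOWN"

# Pair every detected entity with the tone of the segment it was spoken in.
for entity in transcript["entities"]:
    tone = sentiment_at(entity["start"], transcript["sentiment_analysis_results"])
    print(f'{entity["entity_type"]}: {entity["text"]} -> {tone}')
```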
AI-powered Conversation Intelligence Platforms
Sales, Marketing, Customer Success, and Human Resource teams must be equipped with powerful tools to boost lead conversion and customer engagement in a competitive market.
Incorporating key AI, ML, and DL technologies, like Speech-to-Text transcription and Audio Intelligence, into Conversation Intelligence Platforms provides teams with the advanced analytics and strategic insights needed to accomplish these goals.