According to Forrester, 84% of marketing teams cite phone calls as having higher conversion rates than any other form of engagement. But the manual tasks that come with high call volumes, such as note taking, logging CRM data, and QA reviews, can be labor-intensive.
Call tracking solutions and platforms are looking to ease this process for marketing and sales teams with suites of call tracking automation tools, and their product teams are turning to AI to help build them.
Significant advances in Deep Learning and Machine Learning have pushed AI model accuracy and utility further than ever before. The best AI models now power some of the most advanced technology on the market, including self-driving cars, automated fraud detection, and personalized recommendation systems. The same principles behind these models are also powering highly accurate speech recognition and audio analysis APIs.
For product teams building AI-powered features for call tracking, speech recognition technology is the logical starting point. By automatically transcribing each phone call, whether in real time or asynchronously, call tracking solutions can offer immediate and significant value to their customers.
But call transcription should be thought of as a starting point, not an end goal. Product teams can significantly increase their ROI and competitiveness by building intelligent tools on top of this call data. These tools can analyze call sentiment, identify common topics, auto-summarize key sections of phone conversations, and automate QA tasks.
This article will briefly review what defines a top call tracking solution. Then, it will examine some of the top call tracking companies that have already built call transcription and Audio Intelligence tools for their platforms–and achieved impressive results. Finally, it will peel back the mechanics behind these high ROI AI models and explore best practices for building smart call tracking tools.
What are Call Tracking Solutions?
Call tracking solutions offer lead tracking, lead management, and analytics and insights for companies with large volumes of phone calls.
The solutions use Dynamic Number Insertion (DNI) to track the online activity of leads across emails, social media posts, PPC ads, and more. Then, when the lead decides to engage the company with a phone call, the call tracking solution can identify what prior actions directly led to the phone call in order to better inform the conversation or future campaigns.
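To make the DNI idea concrete, here is a minimal sketch of the bookkeeping involved: each visitor session is shown a number from a pool, and an inbound call to that number is tied back to the visit. The class names, field names, and phone numbers are all hypothetical, and a production system would also handle number recycling and session expiry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Touchpoint:
    """One tracked visit; every field name here is illustrative."""
    source: str            # e.g. "ppc_ad", "email", "social_post"
    campaign: str
    tracking_number: str


class NumberPool:
    """Hands out one tracking number per visit and remembers why."""

    def __init__(self, numbers: list[str]) -> None:
        self._free = list(numbers)
        self._by_number: dict[str, Touchpoint] = {}

    def assign(self, source: str, campaign: str) -> Touchpoint:
        # The number displayed on the page is swapped for one from the pool.
        if not self._free:
            raise RuntimeError("tracking number pool exhausted")
        number = self._free.pop(0)
        touchpoint = Touchpoint(source, campaign, number)
        self._by_number[number] = touchpoint
        return touchpoint

    def attribute(self, dialed_number: str) -> Optional[Touchpoint]:
        # An inbound call is tied back to the visit that was shown this number.
        return self._by_number.get(dialed_number)


pool = NumberPool(["+1-555-0100", "+1-555-0101"])
shown = pool.assign(source="ppc_ad", campaign="spring_sale")
print(pool.attribute(shown.tracking_number))
```

When the call arrives, the lookup returns the source and campaign that led to it, which is the data the solution uses to inform the conversation or future campaigns.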
Many solutions also offer form tracking, lead valuing, call flows, lead searching/filtering, lead qualifications, custom reporting, and more.
Finally, the solutions aggregate the collected data and offer targeted, highly useful analytics for end users.
Speech-to-Text and Audio Intelligence for Call Tracking Solutions
Thanks to recent advances in Machine Learning and Deep Learning research, Automatic Speech Recognition (ASR) APIs are more accessible, affordable, and accurate than they have ever been. The best Speech-to-Text APIs can transcribe real-time and asynchronous audio and video streams at near-human level accuracy. This makes highly accurate call transcription a natural starting point for most product teams at call tracking solutions.
Product teams looking to incorporate transcription into their call tracking solutions should consider transcript readability as well as accuracy.
What is transcript readability? In the past, traditional speech recognition models could only output a transcript at about 70% accuracy. The models would also omit basic text features such as punctuation, capitalization, paragraph structure, and speaker labels. This made the transcripts only marginally useful and extremely difficult to read, for example:
When i'm gone for a while but hes always supportive so that always
takes a lot of stress off and lets me play and its a lot easier sure
wow well back to your college and pro career i know you are a usc
player and im sure that was an amazing team experience but a lot of
college players dont go on to go pro even though theyre incredible
players and the college level is very high its a shame that there isnt
much more interest in college tennis but how do you talk about mindset
shift from choosing tennis as a career versus like business or coding
or something you know and then making that decision from usc to just go
pro one of my goals when I was little was to always play professional i
think maybe some people just want to go to college or get a scholarship
and then end there but i knew i always wanted to continue my tennis
As a solution to this problem, the top Speech-to-Text APIs today incorporate Automatic Casing and Punctuation models and Paragraph Detection models into their transcription APIs. With these models, transcripts are automatically formatted with proper sentence structure, capitalization, punctuation, and paragraph structure, making them much easier to read. In addition, some APIs also offer Speaker Diarization (Speaker Labeling) models that automatically identify and label each speaker who talks during an audio or video stream, such as a phone call recording.
Compare the following transcript to the one above:
<Speaker A> When I'm gone for a while, but he's always supportive, so
that always takes a lot of stress off and lets me play, and it's a lot
easier.
<Speaker B> Sure. Wow. Well, back to your college and pro career. I
know you are a USC player and I'm sure that was an amazing team
experience but a lot of college players don't go on to go pro, even
though they're incredible players and the college level is very high.
It's a shame that there isn't much more interest in college tennis. But
how do you talk about mindset shift from choosing tennis as a career
versus, like, business or coding or something, you know, and then
making that decision from USC to just go pro?
<Speaker A> One of my goals when I was little was to always play
professional. I think maybe some people just want to go to college or
get a scholarship and then end there, but I knew I always wanted to
continue my tennis.
It’s much more readable, right?
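As a rough illustration of how a platform might turn diarized output into the speaker-labeled layout shown above, the sketch below merges consecutive utterances from the same speaker and wraps each turn. The `speaker` and `text` keys are an assumed data shape for illustration, not the response format of any particular API.

```python
import textwrap


def merge_turns(utterances):
    """Join consecutive utterances spoken by the same speaker into one turn."""
    merged = []
    for utt in utterances:
        if merged and merged[-1]["speaker"] == utt["speaker"]:
            merged[-1]["text"] += " " + utt["text"]
        else:
            merged.append(dict(utt))  # copy so the input is left untouched
    return merged


def render_transcript(utterances, width=72):
    """Render each turn as '<Speaker X> text', wrapped to `width` columns."""
    blocks = []
    for utt in utterances:
        line = f"<Speaker {utt['speaker']}> {utt['text']}"
        blocks.append(textwrap.fill(line, width=width))
    return "\n".join(blocks)


# Hypothetical diarized output for a short exchange.
utterances = [
    {"speaker": "B", "text": "Sure. Wow."},
    {"speaker": "B", "text": "Well, back to your college and pro career."},
    {"speaker": "A", "text": "One of my goals when I was little was to always play professional."},
]
print(render_transcript(merge_turns(utterances)))
```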
Once the solution has accurate, readable transcription, product teams can also build Audio Intelligence tools on top of this call transcription data. Built using advanced AI models, these Audio Intelligence APIs offer powerful textual analysis, including Text Summarization, Content Moderation, Sentiment Analysis, Topic Detection, Entity Detection, PII Redaction, and more.
Then, product teams can integrate these advanced AI models into their platforms to build high utility, high ROI call tracking solutions on top of the call transcription data.
Now, let’s dive deeper into how industry-leading call tracking solutions are successfully incorporating Speech-to-Text transcription and Audio Intelligence into their platforms today.
Use Case 1: Amplifying Conversational Intelligence
CallRail is a SaaS solution that offers call tracking, form tracking, Conversation Intelligence, and other marketing analytics products. In order to boost competitiveness, CallRail was looking to incorporate AI models that would help it build out its product roadmap and scale its offering.
The call tracking solution started by integrating highly accurate call transcription into its platform. Then, its product team identified two initial areas in which to expand into Audio Intelligence:
- Detecting Important Words and Phrases
- Text Summarization
“Being able to know the contents of that conversation at a glance can allow them to truly unlock the knowledge of their business and how their customers view them,” explains Michael Cantrell, former Director of Product at CallRail.
Detecting Important Words and Phrases
In order to make each call transcription more digestible at a glance, CallRail needed an API that could detect important words and phrases. This kind of API automatically detects key moments in the transcription text, helping CallRail surface and define important content for its end users. This includes commonly asked questions, frequently used keywords, and customer requests.
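To give a feel for what surfacing important phrases means in practice, here is a deliberately naive sketch that counts repeated two-word phrases after dropping common stopwords. This is a simple frequency heuristic swapped in for illustration; it is not how a model-based API detects key phrases, and the stopword list is an arbitrary assumption.

```python
import re
from collections import Counter

# A tiny, arbitrary stopword list chosen for this example only.
STOPWORDS = {
    "a", "an", "the", "i", "to", "of", "and", "is", "it", "my", "for", "on",
    "in", "do", "you", "can", "that", "this", "with", "me", "we", "our",
    "your", "have",
}


def key_phrases(text, top_n=3):
    """Return the most frequent two-word phrases that contain no stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = [
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]
    return [phrase for phrase, _ in Counter(bigrams).most_common(top_n)]


call_text = (
    "I need a roof repair quote. Can you send the roof repair quote today? "
    "My roof repair is urgent."
)
print(key_phrases(call_text, top_n=2))
```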
Text Summarization
Text Summarization APIs automatically generate key highlights and summaries from phone call conversations. Some even provide time-stamped summaries to make matching the text summary to the appropriate point in the audio or video stream easier for end users. For example, AssemblyAI’s Auto Chapters Summarization API segments phone calls into logical, time-stamped chapters when the conversation changes topic. Then, the API outputs a summary for each chapter, similar to what YouTube displays beneath its videos.
Text Summarization APIs make transcripts easier to process, speeding up QA processes. They can also help automate the CRM data transfer process.
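A small sketch of how time-stamped chapter summaries could be rendered as a clickable-style outline for end users. The `start_ms` and `summary` keys are an assumed data shape chosen for this example, so check the field names in whichever summarization API is actually used.

```python
def format_timestamp(ms: int) -> str:
    """Convert milliseconds to M:SS, or H:MM:SS for calls over an hour."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def chapter_outline(chapters):
    """Return one 'timestamp summary' line per chapter, in call order."""
    ordered = sorted(chapters, key=lambda chapter: chapter["start_ms"])
    return "\n".join(
        f"{format_timestamp(chapter['start_ms'])} {chapter['summary']}"
        for chapter in ordered
    )


# Hypothetical chapters for a short sales call.
chapters = [
    {"start_ms": 95_000, "summary": "Caller asks about pricing tiers."},
    {"start_ms": 0, "summary": "Caller describes the current setup."},
]
print(chapter_outline(chapters))
```

The same per-chapter records could also be written to a CRM note field, which is one way summaries help automate the data transfer step.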
With speech transcription and Audio Intelligence, CallRail has seen call transcription accuracy improve by 23% and doubled the number of customers using its platform.
Use Case 2: Complying with Regulations and Protecting Callers
WhatConverts is a lead tracking solution for marketing agencies and clients. In order to facilitate easier, smarter call tracking for its customers, WhatConverts was looking to integrate automatic call transcription into its platform. The company also needed to apply PII Redaction to each transcript in order to meet the compliance and regulatory requirements of its customers.
Accurate Call Transcription
Extremely accurate call transcription was the most important first step for WhatConverts.
“A quick glance at the transcript can tell you if the lead is quotable and should be passed on to sales, or if it’s junk,” explains Mac Mischke at WhatConverts. “That’s a huge piece of information that you need to know right away whenever a new lead comes in.”
In addition, each call transcript lets WhatConverts customers quickly assign value to new leads, review best practices, and identify areas for improvement for their customer service and sales teams.
To make the transcripts even easier to process at a glance, WhatConverts displays each conversation as message bubbles between callers and recipients. With the message bubbles, users can more quickly pinpoint key moments in the conversation.
If the user wants to listen to a specific moment in the conversation, they simply need to click the text in the transcript to be automatically taken to that point in the audio file.
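Click-to-seek behavior like this generally relies on word-level timestamps in the transcript. The sketch below shows the two lookups involved, assuming each word carries a hypothetical `start_ms` field and that words are sorted by start time; it is an illustration, not WhatConverts' implementation.

```python
from bisect import bisect_right


def seek_position_ms(words, clicked_index):
    """Return where audio playback should jump when a word is clicked."""
    return words[clicked_index]["start_ms"]


def word_at(words, playback_ms):
    """Return the index of the word being spoken at `playback_ms`.

    Useful for highlighting the current word while the audio plays.
    """
    starts = [word["start_ms"] for word in words]
    index = bisect_right(starts, playback_ms) - 1
    return max(index, 0)


# Hypothetical word-level timestamps.
words = [
    {"text": "Thanks", "start_ms": 0},
    {"text": "for", "start_ms": 400},
    {"text": "calling", "start_ms": 900},
]
print(seek_position_ms(words, 2))
print(words[word_at(words, 450)]["text"])
```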
PII Redaction
WhatConverts also works with many end users who have security and compliance needs, so the company wanted to be able to detect and redact PII, or Personally Identifiable Information, from each transcription.
PII Redaction APIs automatically detect and remove sensitive or personal content in a transcription text. Then, they replace the sensitive content with a series of # characters that corresponds to the number of redacted digits. PII that can be redacted includes social security numbers, credit card numbers, driver’s license numbers, home addresses, phone numbers, and more.
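The masking step can be sketched with a few regular expressions. This is a simplified stand-in: the patterns below are assumptions that only cover US-style number formats, whereas PII Redaction APIs typically rely on trained models to find entities such as names and addresses that no regex would catch.

```python
import re

# Illustrative patterns only; real PII detection is far broader than this.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every digit inside a matched PII span with '#'."""

    def mask(match: re.Match) -> str:
        # Separators are kept, so the number of '#' equals the digit count.
        return re.sub(r"\d", "#", match.group(0))

    for pattern in PII_PATTERNS.values():
        text = pattern.sub(mask, text)
    return text


print(redact("My social is 123-45-6789 and my card is 4111 1111 1111 1111."))
```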
Here’s an example of what can most commonly be redacted automatically with a PII Redaction API:
For call tracking solutions, PII Redaction is an important addition to call transcription in order to meet privacy compliance requirements, laws, and/or regulations.
With accurate, automatic call transcription and PII Redaction, WhatConverts improved its accuracy by 10% and is able to offer an industry-leading service to its customers.
Use Case 3: Identifying Key Insights from Conversations
In addition to call transcription, detecting important words and phrases, Text Summarization, and PII Redaction, product teams at call tracking solutions need to build tools that can intelligently identify key areas of conversations.
Three main Audio Intelligence tools can help these teams meet this need: Sentiment Analysis, Topic Detection, and Entity Detection.
Sentiment Analysis APIs automatically label speech segments in a transcription text as positive, negative, or neutral. For example, a Sentiment Analysis model would label the statement “I love this product” as positive or “I’m getting frustrated by the support” as negative.
Why is this important? Sentiment Analysis can track how agent and customer sentiment is trending throughout each section of the conversation. It can even be combined with other Audio Intelligence APIs, like Text Summarization, to tie a sentiment to each time-stamped summary. Then, managers can use this data to quickly flag areas of conversation for further review, identify key buy indicators, or note potential churn risks to follow up on in later conversations.
Sentiment Analysis APIs can also be used to coach sales representatives during onboarding, training, or review processes. For example, customer sentiment could be identified as tracking negative at the beginning of a call but as tracking positive by the end. By reviewing the sales rep’s talk track, managers could then point to key turns of phrase or approaches that other agents could incorporate into their conversations as well.
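One simple way to surface calls like this is to compare a speaker's average sentiment in the first half of the call with the second half. The sketch below assumes each labeled segment has hypothetical `speaker` and `sentiment` keys and maps the three labels to -1, 0, and 1; the scoring scheme is an illustrative choice, not a standard.

```python
SCORE = {"negative": -1, "neutral": 0, "positive": 1}


def sentiment_shift(segments, speaker):
    """Return closing-half minus opening-half average sentiment for a speaker.

    A positive result means the speaker's sentiment improved during the call.
    """
    scores = [
        SCORE[segment["sentiment"]]
        for segment in segments
        if segment["speaker"] == speaker
    ]
    if len(scores) < 2:
        return 0.0  # not enough segments to measure a trend
    mid = len(scores) // 2
    opening = sum(scores[:mid]) / mid
    closing = sum(scores[mid:]) / (len(scores) - mid)
    return closing - opening


# Hypothetical labeled segments from one call, in chronological order.
segments = [
    {"speaker": "customer", "sentiment": "negative"},
    {"speaker": "agent", "sentiment": "neutral"},
    {"speaker": "customer", "sentiment": "negative"},
    {"speaker": "customer", "sentiment": "neutral"},
    {"speaker": "customer", "sentiment": "positive"},
]
print(sentiment_shift(segments, "customer"))
```

Calls with a large positive shift for the customer are natural candidates for talk-track review.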
Here’s an example of how Aloware, a contact center software, used Sentiment Analysis to label sentiments as conversations occurred between agents and contacts:
Topic Detection and Entity Detection APIs can be useful here as well. Entity Detection, or Named Entity Recognition, APIs identify and classify important information in the transcription text. For example, “doctor” is an entity that is classified as an occupation. Topic Detection APIs identify and label a broader range of topics in the transcription text, such as baseball or women’s fashion.
With Entity Detection and Topic Detection, product teams can build tools to identify commonly recurring entities and topics and compile them for further analysis. These entities and topics can also be tied to Sentiment Analysis to determine customer feelings and opinions towards products, campaigns, or even agents.
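Tying entities or topics to sentiment can be as simple as a cross-tabulation. The sketch below assumes an upstream step has already paired each detected entity with the sentiment of the sentence it appeared in; the `entity` and `sentiment` keys and the sample values are hypothetical.

```python
from collections import Counter, defaultdict


def sentiment_by_entity(mentions):
    """Count how often each entity appears under each sentiment label."""
    table = defaultdict(Counter)
    for mention in mentions:
        table[mention["entity"]][mention["sentiment"]] += 1
    return {entity: dict(counts) for entity, counts in table.items()}


# Hypothetical mentions collected across many calls.
mentions = [
    {"entity": "premium plan", "sentiment": "positive"},
    {"entity": "premium plan", "sentiment": "positive"},
    {"entity": "billing", "sentiment": "negative"},
    {"entity": "premium plan", "sentiment": "negative"},
]
print(sentiment_by_entity(mentions))
```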
Other common entities and topics that can be detected and classified with an Entity Detection or Topic Detection API include:
Competitive Call Tracking Solutions
To offer competitive, intelligent services, call tracking solutions must integrate powerful AI models into their platforms. This includes accurate real-time and asynchronous call transcription and Audio Intelligence APIs such as Important Words and Phrases Detection, Text Summarization, PII Redaction, Sentiment Analysis, Entity Detection, and Topic Detection.
Look for a Deep Learning company that can offer industry-best Speech-to-Text transcription and Audio Intelligence via a single powerful API. This will save significant time during the build and integration process, so product teams can boost ROI and customer satisfaction faster.