
How to Build a Burner Phone with Voicemail in Python


Overview

Do you get a lot of spam calls? With the rise of automated calling, more and more spam calls are being made. Let’s fight back. In this tutorial I’ll show you how to make a burner phone that records incoming calls with Twilio and transcribes them with AssemblyAI. Twilio is a communications API for building phone features such as calls and voicemail. AssemblyAI is a fast, automatic speech-to-text API, ranked by Nordic APIs as the top public API of 2020. You can find the source code here.

Introduction

Today we're going to use Python to build a burner phone. What's a burner phone? It's a phone that you can use in place of your real one and then get rid of. You can use the free number that Twilio lets you provision to follow along. In this tutorial, we’ll provision a number from Twilio, create a function using Python Flask that will pick up phone calls and record them, programmatically update our number's webhook endpoint, make some example phone calls, download our .mp3 files, and transcribe those .mp3 files with AssemblyAI. At the end we'll also compare the accuracy of the transcripts from Twilio's built-in transcription with those from AssemblyAI's Speech-to-Text API.

Steps:

  1. Sign up for a Twilio Account
  2. Copy your credentials
  3. Provision a phone number from Twilio
  4. Create a Flask endpoint to record voice
  5. Get ngrok and run it to expose your Flask endpoint to the web
  6. Update Twilio provisioned phone number with new webhook endpoint
  7. Make some calls
  8. Check for and download recordings
  9. Get an AssemblyAI API key
  10. Transcribe them via AssemblyAI
  11. Comparison between Twilio’s and AssemblyAI’s transcription

Sign up for a Twilio Account

Sign up for a Twilio account at twilio.com and get your Account SID and Auth Token. I’ve circled where they should be on your console (and blocked out my Account SID).

Where to find your Twilio credentials

Copy your credentials

Copy your credentials and either save them in your environment variables or save them in a configure.py file in the folder you’re working in.

acct_id = '<your account sid>'
twilio_auth_key = '<your twilio auth token>'
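If you go the environment variable route instead, a minimal sketch of loading the credentials could look like the following. The variable names TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN and the helper load_twilio_credentials are assumptions for illustration; use whatever names you exported in your shell.

```python
import os


def load_twilio_credentials(env=os.environ):
    """Return (account_sid, auth_token) read from environment variables.

    The variable names below are assumed, not required by Twilio.
    """
    try:
        return env["TWILIO_ACCOUNT_SID"], env["TWILIO_AUTH_TOKEN"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

In the scripts below you could then replace the configure import with acct_id, twilio_auth_key = load_twilio_credentials().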


Provision a phone number from Twilio

Now let’s programmatically provision a phone number from Twilio via their REST API using Python. We’ll list 20 local numbers and then pick one of them.

from configure import acct_id, twilio_auth_key
from twilio.rest import Client
 
client = Client(acct_id, twilio_auth_key)
 
US_avail_phone_numbers = client.available_phone_numbers('US').fetch()
 
# lists 20 available local numbers with an area code of your choice
list_20 = US_avail_phone_numbers.local.list(area_code='<your area code>', limit=20)
for num in list_20:
   # only list the number if it has voice capability
   # they all should
   if num.capabilities['voice']:
       print(num.phone_number)
# type (or paste) one of the numbers printed above
_number = input("Which phone number do you want? ")
 
# request your new number
new_number = client.incoming_phone_numbers.create(
   phone_number= _number)
 
print("You've activated your new number", new_number.phone_number)


When we’re done, our terminal output should look like:

Picking a Burner Phone Number
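Since the number is typed in by hand, it can help to check the input against the list that was printed before sending the request. Here is a rough sketch, assuming US numbers in the +1XXXXXXXXXX form that Twilio prints; normalize_choice is a hypothetical helper, not part of the Twilio library.

```python
import re

# US numbers in E.164 form: "+1" followed by ten digits
E164_US = re.compile(r"^\+1\d{10}$")


def normalize_choice(choice, offered):
    """Return the chosen number in +1XXXXXXXXXX form if it was offered, else None."""
    digits = re.sub(r"\D", "", choice)  # drop spaces, dashes, parentheses, "+"
    if len(digits) == 10:
        digits = "1" + digits  # the user left off the country code
    candidate = "+" + digits
    if E164_US.match(candidate) and candidate in offered:
        return candidate
    return None
```

With this in place, you could build offered from [num.phone_number for num in list_20] and only call client.incoming_phone_numbers.create when the helper returns a number.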

Create a Flask endpoint to record voice

If you don’t already have Python Flask installed, you can install it with

pip install flask


It is very important that you read this block and the next block carefully. We will create an endpoint using Python Flask. Our endpoint will record an incoming phone call and transcribe it via Twilio (later on we’ll look at a comparison).


from flask import Flask
from twilio.twiml.voice_response import VoiceResponse
 
app = Flask(__name__)
 
@app.route("/voice", methods=["GET", "POST"])
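def voice():
   """Prompt the caller, record their message, then hang up"""
   resp = VoiceResponse()
   resp.say("Please leave a message")
   # stop recording after 10 seconds of silence and ask Twilio to transcribe it
   resp.record(timeout=10, transcribe=True)
   resp.hangup()
   return str(resp)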
 
if __name__ == "__main__":
   app.run(debug=True)

Now we run this in our terminal and we should see an output like this:

Start your Flask Application
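To make it clearer what Twilio receives from this endpoint, here is a rough sketch that builds an equivalent TwiML document with only the standard library. It approximates what VoiceResponse produces rather than reproducing the library's exact output, and build_voicemail_twiml is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_voicemail_twiml(prompt="Please leave a message", timeout=10):
    """Build a TwiML document with the same three verbs as the Flask endpoint."""
    response = ET.Element("Response")
    # <Say> reads the prompt aloud to the caller
    ET.SubElement(response, "Say").text = prompt
    # <Record> captures the message; transcribe="true" requests Twilio's transcription
    ET.SubElement(response, "Record", timeout=str(timeout), transcribe="true")
    # <Hangup> ends the call once recording stops
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


print(build_voicemail_twiml())
```

Twilio reads these verbs in order, which is why the prompt plays before the recording starts.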


Get ngrok and run it to expose your Flask endpoint to the web

After our Flask app is up and running in our terminal, we also need to download and run ngrok. You can download ngrok here. After downloading ngrok and copying it into our working folder, we’ll open a second terminal and run

./ngrok http 5000


Note that the “5000” can be replaced with whatever port you’re running your Python Flask application on. As you can see above, ours is running on port 5000. Ngrok should expose and return an endpoint we can hit. We’ll need to keep track of the https forwarding link, because that is going to be our updated webhook URL for Twilio to hit when we call.

ngrok

Update Twilio provisioned phone number with new webhook endpoint

Alright, now that we have an app that will record phone calls, let’s update our webhook on Twilio via their Python REST API.

from twilio.rest import Client
from configure import twilio_auth_key, acct_id
 
client = Client(acct_id, twilio_auth_key)
 
phone_numbers = client.incoming_phone_numbers.list()
 
for record in phone_numbers:
   print(record.sid)
 
_sid = input("What phone number SID do you want to update?")
_url = input("What do you want to update the webhook URL to?")
 
updated_phone_number = client.incoming_phone_numbers(_sid).update(voice_url=_url)
 
print("Voice URL updated to", updated_phone_number.voice_url)


When we run this, we should get this output in the terminal. Notice that I added “/voice” to the end of the URL provided by ngrok. That is because I defined “/voice” as the endpoint in our Flask application above.

Updating Phone Number Webhooks in Twilio
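Forgetting the “/voice” suffix is an easy mistake to make when pasting the ngrok URL. As a small sketch, the webhook URL could be assembled like this; build_webhook_url is a hypothetical helper and the hostname in the usage note is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def build_webhook_url(forwarding_url, route="/voice"):
    """Join the ngrok https forwarding URL with the Flask route."""
    parts = urlsplit(forwarding_url.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("Expected an https forwarding URL from ngrok")
    # Avoid a double slash if the pasted URL already ends with "/"
    path = parts.path.rstrip("/") + "/" + route.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, build_webhook_url("https://abc123.ngrok.io/") gives a URL ending in “/voice”, which is the value to pass as voice_url.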

Make some calls

There’s no code to write here, but make some calls to your new burner phone number. If everything is set up correctly, you should hear a female voice say “Please leave a message”. For the purpose of this tutorial, I made 3 phone calls and left 3 messages.


The messages I left are:

“This is a third and final recording that I’m going to use for testing transcription services. So yeah, I should have been a cowboy”

“This is a test recording for transcriptions. Sally sells seashells down by the sea shore”

and

“A B C D E F G This is a test call for recording transcriptions with Twilio and AssemblyAI”



Check for and download recordings

Now let’s check out our recordings on Twilio and download them.

from configure import acct_id, twilio_auth_key
from twilio.rest import Client
import requests
 
twilio_url = "https://api.twilio.com"
client = Client(acct_id, twilio_auth_key)
recordings = client.recordings.list(limit=20)
for record in recordings:
   print(record.sid)
 
_rid = input("Which recording ID would you like?")
 
request_url = twilio_url + "/2010-04-01/Accounts/" + acct_id + "/Recordings/" + _rid + ".mp3"
response = requests.get(request_url)
with open(_rid+'.mp3', 'wb') as f:
   f.write(response.content)
 
print("File saved to", _rid+".mp3")


When we’re done, the output in the terminal should look like:

Fetching recordings from Twilio

I ran this three times to pull down all three recordings. You can alternatively execute the code below to download all of them at once.

from configure import acct_id, twilio_auth_key
from twilio.rest import Client
import requests
 
twilio_url = "https://api.twilio.com"
client = Client(acct_id, twilio_auth_key)
recordings = client.recordings.list(limit=20)
for record in recordings:
    _rid = record.sid
    request_url = twilio_url + "/2010-04-01/Accounts/" + acct_id + "/Recordings/" + _rid + ".mp3"
    response = requests.get(request_url)
    with open(_rid+'.mp3', 'wb') as f:
        f.write(response.content)
    print("File saved to", _rid+".mp3")
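Both scripts write whatever the request returns, so a failed download would quietly save an error page under an .mp3 name. Here is a minimal sketch of the same URL construction plus a status check; recording_media_url and save_recording are hypothetical helper names.

```python
from pathlib import Path

TWILIO_API = "https://api.twilio.com/2010-04-01"


def recording_media_url(account_sid, recording_sid, extension="mp3"):
    """Build the media URL for one recording, as the scripts above do."""
    return f"{TWILIO_API}/Accounts/{account_sid}/Recordings/{recording_sid}.{extension}"


def save_recording(status_code, content, destination):
    """Write the audio bytes only if the HTTP request succeeded."""
    if status_code != 200:
        raise RuntimeError(f"Download failed with HTTP status {status_code}")
    path = Path(destination)
    path.write_bytes(content)
    return path
```

Inside the loop this would be used as save_recording(response.status_code, response.content, _rid + ".mp3").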


Get an AssemblyAI API key

Go to AssemblyAI to get an API key. You will see your API key where I’ve circled and blocked out in red:

Where to find your AssemblyAI API key

Add this line to your configure.py file

assembly_auth_key = '<your AssemblyAI API Token here>'


Transcribe them via AssemblyAI

Now we'll use AssemblyAI's API to transcribe our .mp3 files with the Topic Detection feature enabled. Topic Detection identifies the topics discussed in the transcribed text, which is useful for automatically triggering an action when a topic we care about comes up. We'll upload each .mp3 file to AssemblyAI's upload endpoint, then request a transcript with Topic Detection enabled from the transcription endpoint. Finally, we'll save the Topic Detection result, which includes the transcribed text, as a .json file.


from configure import assembly_auth_key
import requests
import json
from time import sleep
 
transcript_endpoint = "https://api.assemblyai.com/v2/transcript"
upload_endpoint = 'https://api.assemblyai.com/v2/upload'
headers = {
   "authorization": assembly_auth_key,
   "content-type": "application/json"
}
CHUNK_SIZE = 5242880
 
def read_file(location):
   with open(location, 'rb') as _file:
       while True:
           data = _file.read(CHUNK_SIZE)
           if not data:
               break
           yield data
 
location = input("Which file do you want to transcribe?")
 
upload_response = requests.post(
   upload_endpoint,
   headers=headers, data=read_file(location)
)
audio_url = upload_response.json()['upload_url']
print('Uploaded to', audio_url)
transcript_request = {
   'audio_url': audio_url,
   'iab_categories': True
}
 
transcript_response = requests.post(transcript_endpoint, json=transcript_request, headers=headers)
transcript_id = transcript_response.json()['id']
polling_endpoint = transcript_endpoint + "/" + transcript_id
print("Transcribing at", polling_endpoint)
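polling_response = requests.get(polling_endpoint, headers=headers)
# keep polling until the job finishes; stop early if AssemblyAI reports an error
while polling_response.json()['status'] != 'completed':
   if polling_response.json()['status'] == 'error':
       raise RuntimeError(polling_response.json().get('error', 'Transcription failed'))
   sleep(5)
   print("Transcript processing ...")
   try:
       polling_response = requests.get(polling_endpoint, headers=headers)
   except requests.exceptions.RequestException:
       # network hiccup: keep the last response and try again on the next pass
       print("Expected to wait 30 percent of the length of your audio")
       print("After wait time is up, call poll with id", transcript_id)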
categories_filename = transcript_id + '_categories.json'
with open(categories_filename, 'w') as f:
   f.write(json.dumps(polling_response.json()['iab_categories_result']))
print('Categories saved to', categories_filename)

When we are done, a request to transcribe one file should look like:

Upload and Transcribe your mp3 files via AssemblyAI
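The polling step can also be pulled out into a small function with an upper bound on how long it waits. This is only a sketch: wait_for_transcript and its parameters are hypothetical, and fetch_status stands in for any zero-argument callable that returns the parsed JSON of the polling response.

```python
import time


def wait_for_transcript(fetch_status, interval=5, max_wait=600, sleep=time.sleep):
    """Call fetch_status() until the job completes, fails, or max_wait seconds pass."""
    waited = 0
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":
            # Surface the error message from the response if one is present
            raise RuntimeError(result.get("error", "Transcription failed"))
        if waited >= max_wait:
            raise TimeoutError(f"No result after {waited} seconds")
        sleep(interval)
        waited += interval
```

With the script above, it could be called as wait_for_transcript(lambda: requests.get(polling_endpoint, headers=headers).json()). Passing sleep as a parameter makes the loop easy to exercise without actually waiting.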


Comparison between Twilio’s and AssemblyAI’s transcription

Twilio’s built-in transcription doesn’t offer topic detection the way AssemblyAI’s API does, but we can still compare their transcription accuracy. We’ll build two new scripts to do this: one to retrieve all our transcripts from Twilio, and one to print out our transcripts from AssemblyAI.


Our Twilio script will invoke the client, fetch all our transcripts on our account (we should only have 3 right now) and print them out:

from twilio.rest import Client
from configure import twilio_auth_key, acct_id
 
client = Client(acct_id, twilio_auth_key)
 
transcriptions = client.transcriptions.list()
 
for record in transcriptions:
   _tid = record.sid
   transcript = client.transcriptions(_tid).fetch()
   print(transcript.transcription_text)


Our AssemblyAI script will just take our downloaded JSON files and print out the text. I manually entered the JSON file names.

import json
 
files = ['pmo31a8t2-7778-4e20-bf0d-baff7fbaf72f_categories.json',
'pmojwgqta-8cc3-4a6c-a881-c8b83e3a9cce_categories.json',
'pmem7j6cw-4552-4baf-b5b8-4b90dc39f508_categories.json']
for _file in files:
   # use a context manager so each file is closed after reading
   with open(_file, 'r') as f:
       json_obj = json.load(f)
   text = json_obj['results'][0]['text']
   print(text)
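The same JSON files also hold the Topic Detection output. As a rough sketch, assuming each saved file has a top-level "summary" object that maps topic labels to relevance scores, the strongest topics could be listed like this. The helper top_topics and the sample labels and scores are made up for illustration.

```python
def top_topics(categories, limit=3):
    """Return the highest-scoring (label, relevance) pairs from a categories result."""
    summary = categories.get("summary", {})
    ranked = sorted(summary.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up sample in the assumed shape of one saved *_categories.json file
sample = {
    "status": "success",
    "results": [{"text": "This is a test recording", "labels": []}],
    "summary": {"ExampleTopicA": 0.2, "ExampleTopicB": 0.9, "ExampleTopicC": 0.5},
}

for label, relevance in top_topics(sample, limit=2):
    print(label, relevance)
```

If your files turn out to be shaped differently, only the "summary" lookup should need to change.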


In a side by side comparison:

Twilio transcriptions
AssemblyAI transcriptions


We can see that AssemblyAI’s transcription is more accurate than Twilio’s built-in transcription service, even on messages as short as these. We can also compare the pricing for AssemblyAI and the pricing for Twilio’s built-in transcription and see that AssemblyAI costs less than ⅓ as much ($0.015 vs $0.05).
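To put a number on “more accurate”, one common measure is word error rate: the word-level edit distance between what was said and what was transcribed, divided by the number of words spoken. The sketch below is a simplified version that lowercases and splits on whitespace without stripping punctuation. It is not how either service scores itself, and word_error_rate is a hypothetical helper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

You would pass the message you actually spoke as reference and each service's transcript as hypothesis; the lower score is the more accurate transcript.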


Cleaning Up

Finally, after you’re done giving out your burner phone number to those pesky services that keep asking for a phone number, you can delete it.

from configure import acct_id, twilio_auth_key
from twilio.rest import Client
 
client = Client(acct_id, twilio_auth_key)
 
phone_numbers = client.incoming_phone_numbers.list()
 
for record in phone_numbers:
   print(record.sid)
 
_del = input("Which phone number SID would you like to delete?")
 
client.incoming_phone_numbers(_del).delete()


It should run like this:

Deleting your Twilio provisioned phone number

Conclusion

In this tutorial, we found out how to use Python to programmatically get a number from Twilio, set up a Flask application to respond to and record a phone call, transcribe our phone call with AssemblyAI, and delete our number after we’re done with it.


For more information about AssemblyAI, check out our blog for tutorials and updates. Follow us on Twitter @assemblyai for updates, and follow the author, @yujian_tang, for more tutorials.

