Tutorials

Auto-Tweet Your Words Using Speech Recognition in Python

We say the funniest things when no one is listening. But what if someone did, all the time? In this article, we will learn how to make an app that will listen to you all the time and Tweet the funniest, smartest or most relatable things you say out loud.

Auto-Tweet Your Words Using Speech Recognition in Python

Table of contents

We say the funniest things when no one is listening. But what if someone did, all the time? In this article, we will learn how to make an app that will listen to you and tweet the funniest, smartest or most relatable things you say out loud.

The app will work by listening to you and transcribing your sentences. After you say something you would like to tweet, you can say the keyword "tweet" and it will post your latest sentence on Twitter.

We will make the app with Python. The main libraries will be:

  • PyAudio for listening to the input source
  • Twython for easy use of the Twitter API
  • AssemblyAI for Speech-to-Text transcription

Setting up the dependencies

Before coding at all, we need Twitter and AssemblyAI credentials. Getting an AssemblyAI API token is very simple. Just sign up for AssemblyAI and log in to find your token. If you have never used AssemblyAI before, you can get a free API token.

Get your Free API token

In order to use the Twitter API, go to Twitter Developer Portal and create an account. After providing some information to Twitter, you need to create a project and get the necessary credentials. For this project, you need read and write permissions.

There will be two files in this project. The main Python script and a configuration file. Fill in your configuration file with the authentication key from AssemblyAI and other credentials from Twitter like so:

auth_key = ''
consumer_key        = ''
consumer_secret     = ''
access_token        = ''
access_token_secret = ''

In the main script, we start by importing all the libraries we need.

import websockets
import asyncio
import base64
import json
import pyaudio
from twython import Twython
from configure import *

Listening with the microphone

Next up is setting up the parameters of PyAudio and starting a stream.

FRAMES_PER_BUFFER = 3200
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
p = pyaudio.PyAudio()
 
# starts recording
stream = p.open(
  format=FORMAT,
  channels=CHANNELS,
  rate=RATE,
  input=True,
  frames_per_buffer=FRAMES_PER_BUFFER
)

The main point to pay attention to here is the RATE parameter. When setting up the connection to an AssemblyAI endpoint, the same sample rate needs to be specified. Since in this project sentences will be transcribed in real-time, we will use AssemblyAI's real-time endpoint.

URL = "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000"

Connecting to Twitter

As mentioned above, we are using Twython to set up a connection to Twitter API. For this, only the credentials from the configure.py file are needed.

twitter = Twython(
  consumer_key,
  consumer_secret,
  access_token,
  access_token_secret
)

The next step in building the auto-tweeter app is to set up the asynchronous behavior of constantly listening and having sentences transcribed. For this, Python's asyncio library will be used.

Sending audio to AssemblyAI

We have one function for listening and transcribing each. The listening function will be called send. Its main goal is to capture audio and send it to AssemblyAI's Speech-to-Text API. The transcribing function will be called receive and its primary goal is to constantly listen to AssemblyAI's endpoint to get transcribed audio results.

Let's take a look at the listening function first. The function is going to be running indefinitely to send whatever is spoken into the microphone to AssemblyAI. Thus the while True line. The actual functionality of the function is wrapped in try and except blocks to catch any potential errors. The four lines inside the try block perform the main functionality of this function.

async def send():
  while True:
    try:
      data = stream.read(FRAMES_PER_BUFFER)
      data = base64.b64encode(data).decode("utf-8")
      json_data = json.dumps({"audio_data":str(data)})
      r = await _ws.send(json_data)

    except websockets.exceptions.ConnectionClosedError as e:
      print(e)
      assert e.code == 4008

    except Exception as e:
      print(e)
      assert False, "Not a websocket 4008 error"

    r = await asyncio.sleep(0.01)

What send function does is quite simple. After capturing the audio, the function encodes it into the necessary format and sends it to AssemblyAI.

Interpreting the transcription

The Receive function on the other hand, as the name suggests, receives the results from AssemblyAI.

async def receive():
  while True:
    try:
      result_str = await _ws.recv()
      result = json.loads(result_str)['text']

      if json.loads(result_str)['message_type']=='FinalTranscript':
        print(result)
        if result == 'Tweet.' and previous_result!='':
          twitter.update_status(status=previous_result)
          print("Tweeted: %s" % previous_result)
        previous_result = result

    except websockets.exceptions.ConnectionClosedError as e:
      print(e)
      assert e.code == 4008

    except Exception as e:
      print(e)
      assert False, "Not a websocket 4008 err

Let's take it step by step and decipher this block of code. The "result_str" variable has the responses from AssemblyAI. Here is what it looks like:

The response has complete information on this transcription, including the audio start and end timestamps, the confidence of the transcription and the resulting text. One critical attribute that is very helpful is "message_type" all the way at the end of the response.

As things currently stand, AssemblyAI will be sending back unfinished sentences or "PartialTranscripts" since we are using the real-time endpoint. After AssemblyAI acknowledges the ending of a sentence, it will add the correct punctuation and complete the sentence by capitalizing letters where necessary. We want to only post full sentences to Twitter and not partial transcripts. That's why the responses are filtered using the line:

if json.loads(result_str)['message_type']=='FinalTranscript':

Tweeting transcribed sentences

The rest of the lines deal with the tweeting of the sentences. The previous timestep's transcription is kept in mind by assigning it to the "previous_result" variable. And whenever the resulting transcript of the current timestep is "Tweet." we post the last sentence with the line:

twitter.update_status(status=previous_result)

Establishing asynchronous behavior

These two functions (send and receive) will be wrapped in another function to be able to run them asynchronously.

async def send_receive():

  print(f'Connecting to url ${URL}')

  async with websockets.connect(
    URL,
    extra_headers=(("Authorization", auth_key),),
    ping_interval=5,
    ping_timeout=20
  ) as _ws:

    r = await asyncio.sleep(0.1)
    print("Receiving SessionBegins ...")

    session_begins = await _ws.recv()
    print(session_begins)
    print("Sending messages ...")
    result = ''

    async def send():
      while True:
        ...


    async def receive():
      while True:
        ...
                
                
    send_result, receive_result = await asyncio.gather(send(), receive())

Other than wrapping the send and receive functions, this function also makes the connection to AssemblyAI using websockets. And after defining the functions, it calls the send and receive functions to run at the same time.

Of course, after defining this function, we need to call it at the end of the script. Here is the line to do that:

asyncio.run(send_receive())

You can find the code on GitHub.

Prefer to watch this tutorial? Find the video here: