How to get Zoom Transcripts with the Zoom API

In this post, we’re going to show you how to transcribe your Zoom recordings by connecting Zoom’s API with AssemblyAI’s automatic speech recognition API in Python. More specifically, we’ll walk you through:

  • Creating a cloud recording on Zoom
  • Accessing Zoom’s API with your own account
  • Fetching a cloud recording from Zoom’s API
  • Submitting the cloud recording to AssemblyAI for fast, automatic transcription

Get your coding fingers ready, and let’s dive in!

Creating a Cloud Recording on Zoom

The first thing to know about cloud recording in Zoom is that it requires a Pro, Business, or Enterprise account; the free Basic plan does not include cloud recording. If you already have such a plan, and maybe even cloud recordings ready for transcription, you can skip to the next section and keep rolling.

But if you've just upgraded your Zoom account for this project and still need to create a cloud recording, keep one thing in mind: you do not need another person in the meeting for cloud recording to work, so there's no need to schedule with someone else.

On a desktop, laptop, or large tablet, start a recording with the Record button in the meeting controls, and choose Record to the Cloud if Zoom asks where to save it.

When you end the recording, a pop-up tells you that you'll get an email once the recording is ready. Zoom's help pages warn that this can take 24 to 48 hours, but that was likely the reality in early 2020, when the whole world flocked to Zoom before its systems were ready for that level of traffic. Longer recordings may still take a while, but I've found that here in 2021 they are ready almost immediately.

To find your recording, sign in to Zoom and navigate to My Account.

In the menu on the left, click the Recordings tab.

And now, the promised land! There you will see your cloud recordings, ready for you in MP4 and M4A format.

Accessing Your Zoom API Key

Now, let's step through the process of getting your keys to Zoom's API. I'd like to give a shoutout to Billy Harmawan for showing me this somewhat convoluted process; I'll give you the gist of it here too.

First, navigate to Zoom's App Marketplace in a new tab. You may already be signed in from your main account page; if not, sign in with the same credentials and then navigate back to the Zoom App Marketplace.

Second, click "Develop" in the upper-right corner and choose "Build App" from the drop-down menu.

Third, choose "Create" under the "JWT" app type. We will use JSON Web Tokens (JWTs) for this transcription connection because they are easy to regenerate, easy to use in code, and easy to append to URLs.

Fourth, come up with a unique name for your Zoom app.

Fifth, fill out the required fields on the information page that follows.

Sixth, on the following page, your API credentials should be generated!

From here, create a text file named tokens.env and save each credential on its own line as VARIABLE_NAME=value (for example, ZOOM_API_KEY=your_api_key and ZOOM_API_SECRET=your_api_secret). You will need this .env file in the code later.

A note: the JWT that is generated on the credentials page is nice to have in the .env file just in case, but our code will generate fresh JWTs and use those instead. So copy it, but know that you might not need it.
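Part of why JWTs are so easy to regenerate is that a token is just three base64url-encoded segments joined by dots (header.payload.signature), where the signature is an HMAC over the first two. As a rough sketch of what a JWT library does for you, assuming the HS256 algorithm and a minimal claim set (iss, iat, exp) similar to the one the Zoom class later in this post uses, here is one built with only Python's standard library. The function names and the key and secret strings are placeholders for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    # JWT segments use base64url encoding with the "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_hs256_jwt(api_key: str, api_secret: str, ttl_seconds: int = 3600) -> str:
    """Sketch only: build an HS256-signed JWT of the form header.payload.signature."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"iss": api_key, "iat": now, "exp": now + ttl_seconds}
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(payload, separators=(",", ":")).encode())
    )
    # The signature is HMAC-SHA256 over "header.payload", keyed with the API secret
    signature = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


token = make_hs256_jwt("my_api_key", "my_api_secret")
print(token.count("."))  # prints 2: three segments separated by two dots
```

You would still want a maintained library such as authlib for real use; the point here is just that minting a fresh token is a local computation, not a trip back to the Marketplace.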

And lastly, you can skip the optional "Feature" tab on this app creation page and go directly to "Activation" to confirm that you are finished.

These next few steps are just double and triple checks to make sure that the app exists in your account and that the credentials can be managed in the future if you need to.

Navigate back to Zoom's App Marketplace and click the "Manage" button in the upper right corner.

There you will see your app listed.

Click the name of the app, and when you see its credentials page once again...

...you can rest assured, deep in your bones, that Zoom's API is ready for you.

Sign Up for an AssemblyAI API Token

You can sign up for a free AssemblyAI account in seconds by just entering your email address. After you verify your account from your email address, you're taken right back to your new account where you can see your API token in your dashboard.

Take that API token and add it to your tokens.env file the same way you did for Zoom's API keys, under its own unique variable name.

Fetching a Cloud Recording from Zoom's API in Python

Spin up a zoom.py file and plop these import statements at the top:

import os
import time
import json
import http.client
from pathlib import Path
from typing import Dict, Any

# Third-party packages: pip install requests python-dotenv authlib
import requests
from dotenv import load_dotenv
from authlib.jose import jwt

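Now find the path to your tokens.env file, or move tokens.env into the directory you run zoom.py from (Path('.') refers to the current working directory), and then add this code to your zoom.py script:

env_path = Path('.') / 'tokens.env'
load_dotenv(dotenv_path=env_path)  # load the credentials into environment variables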

Next, you'll scoop your credentials out of the tokens.env file by assigning them to variables. The Python variable names, which you'll use in the rest of the code, go on the left of the "=", and the names you gave each credential in tokens.env go in the strings passed to os.getenv() on the right.

API_KEY = os.getenv("ZOOM_API_KEY")
API_SECRET = os.getenv("ZOOM_API_SECRET")  # use the name you chose in tokens.env
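os.getenv quietly returns None when a name is missing or misspelled, which tends to surface much later as a confusing authentication error. A small guard like the one below (a hypothetical helper, not part of python-dotenv) fails right away instead. It assumes load_dotenv has already run, so the values from tokens.env are in the environment.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: return the named environment variables, failing loudly if any are unset."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        # Raise now rather than sending None as a credential to Zoom or AssemblyAI
        raise RuntimeError("Missing from tokens.env: " + ", ".join(missing))
    return {name: value for name, value in values.items() if value}
```

For example, creds = require_env(["ZOOM_API_KEY", "ZOOM_API_SECRET"]) followed by API_KEY = creds["ZOOM_API_KEY"], swapping in whatever names you used in tokens.env.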

Now, you'll create a Zoom class that will generate live JWTs, which you'll need to unlock your cloud recordings throughout the rest of the code. Again, shoutout to Billy Harmawan for laying this out.

class Zoom:
    def __init__(self, api_key: str, api_secret: str):
        self.api_key = api_key
        self.api_secret = api_secret
        self.jwt_token_exp = 518400  # token lifetime in seconds (6 days)
        self.jwt_token_algo = "HS256"

    def generate_jwt_token(self) -> bytes:
        iat = int(time.time())  # issued-at time, in Unix seconds

        jwt_payload: Dict[str, Any] = {
            "aud": None,
            "iss": self.api_key,  # Zoom identifies your app by its API key
            "exp": iat + self.jwt_token_exp,
            "iat": iat
        }

        header: Dict[str, str] = {"alg": self.jwt_token_algo}

        # authlib signs the token with your API secret and returns it as bytes
        jwt_token: bytes = jwt.encode(header, jwt_payload, self.api_secret)

        return jwt_token

Then, you'll create an instance of that Zoom class and use it to generate a live JWT.

zoom = Zoom(API_KEY, API_SECRET)
jwt_token: bytes = zoom.generate_jwt_token()
jwt_token_str = jwt_token.decode('UTF-8')
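If you keep a script running for a while, you may want to check how long the current token has left before regenerating it. This sketch reads the exp claim straight from the token's payload segment. It does not verify the signature, so only use it on tokens you generated yourself; the helper name jwt_seconds_left is made up for this example.

```python
import base64
import json
import time


def jwt_seconds_left(token: str) -> int:
    """Hypothetical helper: seconds until the JWT's exp claim (signature NOT verified)."""
    payload_b64 = token.split(".")[1]
    # Restore the base64 padding that JWT encoding strips off
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"]) - int(time.time())
```

For example, if jwt_seconds_left(jwt_token_str) drops below 60, call zoom.generate_jwt_token() again and decode the result as above.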

Use the code below, which comes directly from Zoom’s API documentation, to get your USER_ID.

If you are using a shared account with multiple users, you may have to log in to your Zoom account and navigate to "My Account" to see exactly which user has the cloud recordings you want to transcribe. The code below uses ["users"][0] (the first user returned) as an example, but you may have to adjust the index depending on which user has the cloud recording you need to access.

conn = http.client.HTTPSConnection("api.zoom.us")

headers = { 'authorization': 'Bearer ' + jwt_token_str}

conn.request("GET", "/v2/users?page_size=30&status=active", headers=headers)

res = conn.getresponse()
data = res.read()
user_dict = json.loads(data.decode("utf-8"))
USER_ID = user_dict['users'][0]['id']
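Rather than relying on the index, a small helper (hypothetical, not part of Zoom's API) can pick the user by email address. It assumes each entry in user_dict['users'] carries an "email" field alongside its "id", as Zoom's List Users response does, and it only searches the page of results you fetched (up to 30 users with page_size=30).

```python
from typing import Any, Dict


def find_user_id(user_dict: Dict[str, Any], email: str) -> str:
    """Hypothetical helper: return the id of the user with a matching email."""
    # Assumes each user entry has "id" and "email" fields
    for user in user_dict.get("users", []):
        # Compare case-insensitively so "Host@Example.com" matches "host@example.com"
        if user.get("email", "").lower() == email.lower():
            return user["id"]
    raise LookupError("No user with email " + email + " on this page of results")
```

Usage would look like USER_ID = find_user_id(user_dict, "you@example.com"), with your own address in place of the placeholder.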


For this next step, you may also have to specify a date range in the request URL, which the auto-generated examples in the Zoom API documentation show you how to do. The default is to return only the last day's worth of meetings, so if the meeting you want to transcribe happened before that, add from and to query parameters (dates in yyyy-mm-dd format) to the request.
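Building that path by hand is easy to get wrong, so here is a sketch of a helper (the name recordings_path is hypothetical) that assembles the user's recordings path with from and to parameters. It assumes Zoom's /v2/users/{userId}/recordings endpoint and yyyy-mm-dd dates; Zoom's documentation has also capped how wide a range one request may cover (one month at the time of writing), so longer spans may need several requests.

```python
from datetime import date
from urllib.parse import quote, urlencode


def recordings_path(user_id: str, start: date, end: date) -> str:
    """Hypothetical helper: build a list-recordings path covering start..end."""
    if end < start:
        raise ValueError("end date must not be before start date")
    # Assumes Zoom's from/to query parameters take yyyy-mm-dd dates
    query = urlencode({"from": start.isoformat(), "to": end.isoformat()})
    # quote() escapes any characters in the user id that are unsafe in a URL path
    return "/v2/users/" + quote(user_id, safe="") + "/recordings?" + query
```

You could then pass recordings_path(USER_ID, date(2021, 3, 1), date(2021, 3, 31)) as the path argument of conn.request.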

From here, you can find the MEETING_ID of the cloud recording you want to access with the code below. Note: meeting_dict['meetings'] is a list of meetings, but we will just use the first one, meeting_dict['meetings'][0], as an example.

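conn = http.client.HTTPSConnection("api.zoom.us")

# List this user's cloud recordings (add from/to query parameters for older meetings)
conn.request(
    'GET', '/v2/users/' + USER_ID + '/recordings', headers=headers
)

res = conn.getresponse()
data = res.read()
meeting_dict = json.loads(data.decode("utf-8"))
MEETING_ID = str(meeting_dict['meetings'][0]['id'])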
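If several meetings come back, you may prefer the most recent one rather than whichever happens to be first in the list. This sketch (the helper name is hypothetical) assumes each meeting in the response carries an ISO 8601 "start_time" in UTC, such as "2021-03-01T17:00:00Z", as Zoom's recordings list returns.

```python
from datetime import datetime
from typing import Any, Dict


def latest_meeting_id(meeting_dict: Dict[str, Any]) -> str:
    """Hypothetical helper: id of the most recently started meeting in a recordings response."""
    meetings = meeting_dict.get("meetings", [])
    if not meetings:
        raise LookupError("No cloud recordings in this date range")

    def started(meeting: Dict[str, Any]) -> datetime:
        # Assumes timestamps like "2021-03-01T17:00:00Z"; fromisoformat before
        # Python 3.11 rejects the trailing "Z", so swap in an explicit UTC offset
        return datetime.fromisoformat(meeting["start_time"].replace("Z", "+00:00"))

    return str(max(meetings, key=started)["id"])
```

Swapping it in is a one-liner: MEETING_ID = latest_meeting_id(meeting_dict).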


To get the download_url, add the code below.

# Fetch the recording files for that meeting, reusing the open connection
conn.request(
    "GET", '/v2/meetings/' + MEETING_ID + '/recordings', headers=headers
)

res = conn.getresponse()
data = res.read()

response_obj = data.decode("utf-8")


After that, you can grab the download_url as your very own!

meeting_dict = json.loads(response_obj)
download_url = meeting_dict['recording_files'][0]['download_url']
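A cloud recording usually has more than one entry in recording_files (for example, the MP4 video and an M4A audio-only track), so index 0 is not guaranteed to be the file you want. This hypothetical helper selects by the file_type field instead, assuming Zoom labels the files "MP4" and "M4A" as its recording API does.

```python
from typing import Any, Dict


def pick_download_url(recording: Dict[str, Any], file_type: str = "MP4") -> str:
    """Hypothetical helper: download_url of the first recording file of the given type."""
    # Assumes entries carry file_type values such as "MP4" and "M4A"
    for item in recording.get("recording_files", []):
        if item.get("file_type") == file_type:
            return item["download_url"]
    raise LookupError("No " + file_type + " file in this recording")
```

For example, download_url = pick_download_url(meeting_dict) for the video, or pick_download_url(meeting_dict, "M4A") for the audio-only file, which is a smaller download.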

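One last shoutout to TriBloom and Caleb Spraul for helping me find this next single line of code, which makes your download URL usable by both the Zoom and AssemblyAI APIs: appending your JWT as an access_token query parameter lets AssemblyAI download the recording without needing Zoom's authorization header.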

authorized_url = download_url + "?access_token=" + jwt_token_str
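Plain string concatenation works as long as download_url has no query string of its own. If you want to be defensive, this sketch (the helper name is hypothetical) uses urllib.parse to merge access_token into whatever query string is already there. A JWT only contains URL-safe characters (letters, digits, "-", "_" and "."), so urlencode leaves it unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_access_token(url: str, token: str) -> str:
    """Hypothetical helper: add access_token as a query parameter, keeping any existing ones."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("access_token", token))
    # Rebuild the URL with the merged query string
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Usage: authorized_url = with_access_token(download_url, jwt_token_str).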

Submit the Audio File to AssemblyAI for Transcription

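The code below uses the authorized_url variable to feed the Zoom cloud recording MP4 directly into AssemblyAI's speech recognition API.

ASSEMBLY_AI_TOKEN = os.getenv("ASSEMBLY_AI_TOKEN")  # use the name you chose in tokens.env
endpoint = 'https://api.assemblyai.com/v2/transcript'

# Request body; naming it payload (not json) avoids shadowing the json module
payload = {
    'audio_url': authorized_url
}

heads = {
    'authorization': ASSEMBLY_AI_TOKEN,
    'content-type': 'application/json'
}

resp = requests.post(endpoint, json=payload, headers=heads)
resp.raise_for_status()  # stop early if AssemblyAI rejects the request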

A successful response will look like this, with "status": "queued":

{
    # keep track of the id for later
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",
    # note that the status is currently "queued"
    "status": "queued",
    "acoustic_model": "assemblyai_default",
    "audio_duration": null,
    "audio_url": "some/obscenely/long/zoom/api/url/",
    "confidence": null,
    "dual_channel": null,
    "format_text": true,
    "language_model": "assemblyai_default",
    "punctuate": true,
    "text": null,
    "utterances": null,
    "webhook_status_code": null,
    "webhook_url": null,
    "words": null
}

Get That Transcription You've Been Longing For!

The code below checks the status of the transcription. It concatenates the transcription ID, resp.json()['id'], onto the URL for you, so it runs sequentially after the previous code.

status_point = 'https://api.assemblyai.com/v2/transcript/' + resp.json()['id']

status_header = {'authorization': ASSEMBLY_AI_TOKEN}

status_check = requests.get(status_point, headers=status_header)


Audio files take around 25% of their duration to process, so a 10-minute audio file should be done in about 2.5 minutes. You'll want to run the code above in a loop until the status is no longer "queued" or "processing"; at that point it will be "completed" (or "error" if something went wrong).

import time

# Poll every few seconds until the job is no longer queued or processing
while status_check.json()['status'] in ['queued', 'processing']:
    time.sleep(5)
    status_check = requests.get(status_point, headers=status_header)

print('\n', status_check.json()['text'])
print('\n', status_check.json())

Once the status is "completed", the first print statement, print('\n', status_check.json()['text']), prints the actual transcription text. The second prints the entire API response, with a ton more metadata like the timing of when each word was spoken, the confidence for each word, and more! You can see a complete example of the API's response JSON in the AssemblyAI docs.
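The loop above will wait forever if a job gets stuck, and it does not treat a failed job any differently from a finished one. Here is a sketch of a polling helper with a timeout; the name and defaults are assumptions, not part of AssemblyAI's SDK. It also assumes, as AssemblyAI's transcript endpoint does, that a failed job comes back with status "error" and an "error" message. The fetch and sleep functions are passed in, which keeps the helper easy to test without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_transcript(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Hypothetical helper: poll until the transcript completes, fails, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":
            # Assumes failed jobs report status "error" with an "error" message
            raise RuntimeError("Transcription failed: " + str(result.get("error")))
        if time.monotonic() >= deadline:
            raise TimeoutError("Transcript still " + str(status) + " after " + str(timeout) + "s")
        sleep(interval)
```

Hooked up to this post's variables, that would look like transcript = wait_for_transcript(lambda: requests.get(status_point, headers=status_header).json()), followed by print(transcript['text']).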

AND NOW you know how to transcribe Zoom recordings!