PII Redaction for Speech-to-Text Transcriptions (March 2020 Update)

AssemblyAI's research team has launched a new neural network update, with big improvements in accuracy and speed. Below are some of the highlights that we're excited to share with you!

Further accuracy improvements

With our most recent update, our asynchronous English model is consistently outperforming all other APIs in accuracy benchmarks. Below are our average accuracy percentages (measured using Word Error Rate, or WER) from benchmark reports run this month against Google Cloud's video model, AWS Transcribe, and Microsoft Azure:

[Figure: WER results, March]

Interested in benchmarking? Compare our accuracy and price side-by-side with your current provider by submitting a few of your files here.
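
If you want to sanity-check a benchmark on your own files, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, insertions, and deletions) between a human reference transcript and the API output, divided by the number of reference words. The function name and the simple lowercase/whitespace normalization are assumptions made for illustration; this is not necessarily the exact procedure behind the numbers above. Accuracy is often reported as 1 minus WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch: normalization is just lowercasing and splitting
    on whitespace, which is an assumption, not a benchmark standard.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("please call me back tomorrow", "please call me tomorrow"))  # 0.2
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so a "1 minus WER" accuracy figure is only meaningful for reasonably good transcripts.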

Faster transcription with Speed Boost

We work with a large number of telephony companies, powering their visual voicemail products. In visual voicemail applications, transcription turnaround time is key: customers expect to see their visual voicemail shortly after a call ends.

To improve turnaround time, we've added the Speed Boost feature, which is built to transcribe audio files of 1 minute or less in seconds. With this feature turned on, transcripts complete 25-50% faster than normal.

Below is a Python example of how to turn on Speed Boost. For code snippets in other languages, take a look at the full API docs here.

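import requests

endpoint = "https://api.assemblyai.com/v2/transcript"

# Request body: the audio file to transcribe, with Speed Boost turned on.
# Named "payload" rather than "json" to avoid confusion with the json module.
payload = {
    "audio_url": "https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/8-7-2018-post/7510.mp3",
    "speed_boost": True
}

# Replace YOUR-API-TOKEN with your own API token.
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}

# Submit the transcription request and print the API's JSON response.
response = requests.post(endpoint, json=payload, headers=headers)

print(response.json())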
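
Because transcription is asynchronous, the request above returns before the transcript is ready. A minimal polling sketch follows, assuming the response contains an "id" field, that a GET on the same endpoint with that id appended returns a JSON object with a "status" of "queued", "processing", "completed", or "error", and that a failed transcript carries an "error" message. These response fields, along with the helper names fetch_transcript and wait_for_transcript, are assumptions made for illustration; check the full API docs for the authoritative response format.

```python
import time
from typing import Callable, Dict

TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def fetch_transcript(transcript_id: str, api_token: str) -> Dict:
    """Fetch the current state of one transcript (assumed GET endpoint shape)."""
    # Third-party dependency, imported here so the polling logic below
    # can be exercised without it.
    import requests

    response = requests.get(
        f"{TRANSCRIPT_ENDPOINT}/{transcript_id}",
        headers={"authorization": api_token},
    )
    response.raise_for_status()
    return response.json()


def wait_for_transcript(
    transcript_id: str,
    fetch: Callable[[str], Dict],
    poll_seconds: float = 1.0,
    timeout_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the transcript is completed, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        transcript = fetch(transcript_id)
        status = transcript.get("status")  # assumed field name
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(transcript.get("error", "transcription failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transcript {transcript_id} not ready in time")
        sleep(poll_seconds)
```

With the response from the example above, usage might look like wait_for_transcript(response.json()["id"], lambda tid: fetch_transcript(tid, "YOUR-API-TOKEN")). A short polling interval suits the short voicemail-style files Speed Boost targets; longer files would call for a longer interval and timeout.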

Improved Security: PII redaction

Dealing with sensitive personally identifiable information (PII)?

PII redaction automatically detects and removes sensitive numbers, like credit card and social security numbers, from the transcription text. These sensitive numbers are replaced with pound signs "#" in the transcript.

Below is a Python example of how to turn on PII Redaction. For code snippets in other languages, take a look at the full API docs here.

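import requests

endpoint = "https://api.assemblyai.com/v2/transcript"

# Request body: the audio file to transcribe, with PII Redaction turned on.
# Named "payload" rather than "json" to avoid confusion with the json module.
payload = {
    "audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    "redact_pii": True
}

# Replace YOUR-API-TOKEN with your own API token.
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}

# Submit the transcription request and print the API's JSON response.
response = requests.post(endpoint, json=payload, headers=headers)

print(response.json())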
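
To make the "#" replacement described above concrete, here is a small local sketch that masks long digit sequences in a piece of text. It is only an illustration of the output style: the regular expression and the function name mask_long_numbers are hypothetical, and this is not how the API detects PII (the redaction itself happens server-side when redact_pii is turned on). The exact formatting of redacted numbers in a real transcript may differ.

```python
import re

# Hypothetical rule for illustration: 9 or more digits, optionally separated
# by single spaces or hyphens (roughly card-number or SSN shaped).
_LONG_NUMBER = re.compile(r"\d(?:[ -]?\d){8,}")


def mask_long_numbers(text: str) -> str:
    """Replace every digit inside a long number with '#', keeping separators."""
    return _LONG_NUMBER.sub(lambda match: re.sub(r"\d", "#", match.group()), text)


print(mask_long_numbers("my card number is 4111 1111 1111 1111 thanks"))
# my card number is #### #### #### #### thanks
```

Short numbers, such as a time of day or a room number, are left untouched by this rule, since masking every digit would remove too much useful content from a transcript.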

100% Uptime

Another month of 100% uptime across all our models! Subscribe to our status page to stay up to date.