Changelog

Follow along to see weekly accuracy and product improvements.

May 23, 2023

Significant processing time improvement

We’ve made significant improvements to our transcoding pipeline, resulting in a 98% overall speedup in transcoding time and a 12% overall improvement in processing time for our asynchronous API.

We’ve implemented a caching system for some third-party resources to ensure our continued operations in the event of external resources being down.

May 16, 2023

Announcing LeMUR - our new framework for applying powerful LLMs to transcribed speech

We’re introducing our new framework LeMUR, which makes it simple to apply Large Language Models (LLMs) to transcripts of audio files up to 10 hours in length.

LLMs unlock a range of impressive capabilities that allow teams to build powerful Generative AI features. However, building these features is difficult due to the limited context windows of modern LLMs, among other challenges that necessitate the development of complicated processing pipelines.

LeMUR circumvents this problem by making it easy to apply LLMs to transcribed speech, meaning that product teams can focus on building differentiating Generative AI features rather than focusing on building infrastructure. Learn more about what LeMUR can do and how it works in our announcement blog, or jump straight to trying LeMUR in our Playground.

May 15, 2023

New PII and Entity Detection Model

We’ve upgraded to a new and more accurate PII Redaction model, which improves credit card detections in particular.

We’ve made stability improvements regarding the handling and caching of web requests. These improvements additionally fix a rare issue with punctuation detection.

May 2, 2023

Multilingual and stereo audio fixes, and Japanese model retraining

We’ve fixed two edge cases in our async transcription pipeline that were producing non-deterministic results from multilingual and stereo audio.

We’ve improved word boundary detection in our Japanese automatic speech recognition model. These changes are effective immediately for all Japanese audio files submitted to AssemblyAI.

April 24, 2023

Decreased latency and improved password reset

We’ve implemented a range of improvements to our English pipeline, leading to an average 38% improvement in overall latency for asynchronous English transcriptions.

We’ve made improvements to our password reset process, offering greater clarity to users attempting to reset their passwords while still ensuring security throughout the reset process.

April 10, 2023

Conformer-1 now available for Real-Time transcription, new Speaker Labels parameter, and more

We're excited to announce that our new Conformer-1 Speech Recognition model is now available for real-time English transcriptions, offering a 24.3% relative accuracy improvement.

Effective immediately, this state-of-the-art model will be the default model for all English audio data sent to the wss://api.assemblyai.com/v2/realtime/ws WebSocket API.
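
As a minimal sketch of opening a session against that endpoint (the sample_rate query parameter, the Authorization header, and the third-party websockets package shown here are assumptions; see the real-time documentation for the authoritative connection details):

import asyncio
import json
import websockets  # assumed third-party client (pip install websockets)

async def stream_session():
    # Assumed connection details: API token in the Authorization header and
    # the audio sample rate passed as a query parameter.
    url = "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000"
    # extra_headers is the keyword in older releases of the websockets package;
    # newer releases rename it to additional_headers.
    async with websockets.connect(url, extra_headers={"Authorization": "YOUR-API-TOKEN"}) as ws:
        # The first message should confirm the session has begun; audio frames
        # would then be sent over the same socket.
        print(json.loads(await ws.recv()))

asyncio.run(stream_session())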

The Speaker Labels model now accepts a new optional parameter called speakers_expected. If you have high confidence in the number of speakers in an audio file, then you can specify it with speakers_expected in order to improve Speaker Labels performance, particularly for short utterances.
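
For example (a sketch assuming the existing speaker_labels request parameter, with a placeholder audio URL and API token), a request for a file known to contain two speakers could look like:

import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "speaker_labels": True,   # enable the Speaker Labels model
    "speakers_expected": 2    # optional hint when the speaker count is known
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())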

TLS 1.3 is now available for use with the AssemblyAI API. Using TLS 1.3 can decrease latency when establishing a connection to the API.

Our PII redaction scaling has been improved to increase stability, particularly when processing longer files.

We've improved the quality and accuracy of our Japanese model.

Short transcripts that cannot be summarized will now return an empty summary and a successful transcript.

March 15, 2023

Introducing our Conformer-1 model

We've released our new Conformer-1 model for speech recognition. Conformer-1 was trained on 650K hours of audio data and is our most accurate model to date.

Conformer-1 is now the default model for all English audio files sent to the /v2/transcript endpoint for async processing.

We'll be releasing it for real-time English transcriptions within the next two weeks, and will add support for more languages soon.

March 8, 2023

New AI Models for Italian / Japanese Punctuation Improvements

Our Content Safety and Topic Detection models are now available for use with Italian audio files.
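
As a rough sketch (assuming the content_safety, iab_categories, and language_code request parameters, with a placeholder audio URL and API token), enabling both models for an Italian file could look like:

import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.com/italian-audio.mp3",  # placeholder
    "language_code": "it",
    "content_safety": True,   # Content Safety model
    "iab_categories": True    # Topic Detection model
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())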

We’ve made improvements to our Japanese punctuation model, increasing relative accuracy by 11%. These changes are effective immediately for all Japanese audio files submitted to AssemblyAI.

February 21, 2023

Hindi Punctuation Improvements

We’ve made improvements to our Hindi punctuation model, increasing relative accuracy by 26%. These changes are effective immediately for all Hindi audio files submitted to AssemblyAI.

We’ve tuned our production infrastructure to reduce latency and improve overall consistency when using the Topic Detection and Content Moderation models.

January 31, 2023

Improved PII Redaction

We’ve released a new version of our PII Redaction model to improve PII detection accuracy, especially for credit card and phone number edge cases. Improvements are effective immediately for all API calls that include PII redaction.
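
To enable redaction on a request (a sketch assuming the redact_pii and redact_pii_policies parameters, with a placeholder audio URL and API token):

import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "redact_pii": True,
    # assumed policy names; list only the PII types you want removed
    "redact_pii_policies": ["credit_card_number", "phone_number"]
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())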

January 25, 2023

Automatic Language Detection Upgrade

We’ve released a new version of our Automatic Language Detection model that better targets speech-dense parts of audio files, yielding improved accuracy. Additionally, support for dual-channel and low-volume files has been improved. All changes are effective immediately.
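
To enable Automatic Language Detection on a request (a sketch assuming the language_detection parameter, with a placeholder audio URL and API token):

import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "language_detection": True  # let the model identify the dominant spoken language
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())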

Our Core Transcription API has been migrated from EC2 to ECS in order to ensure scalable, reliable service and preemptively protect against service interruptions.

January 19, 2023

Password Reset

Users can now reset their passwords from our web UI. From the Dashboard login, simply click “Forgot your password?” to initiate a password reset. Alternatively, users who are already logged in can change their passwords from the Account tab on the Dashboard.

The maximum phrase length for our Word Search feature has been increased from 2 to 5, effective immediately.
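
As a sketch of searching a completed transcript for a longer phrase (the word-search endpoint path and words parameter shown here are assumptions, and the transcript ID and phrase are placeholders):

import requests
transcript_id = "YOUR-TRANSCRIPT-ID"
endpoint = f"https://api.assemblyai.com/v2/transcript/{transcript_id}/word-search"
params = {"words": "state of the art model"}  # phrases of up to 5 words are now supported
headers = {"authorization": "YOUR-API-TOKEN"}
response = requests.get(endpoint, params=params, headers=headers)
print(response.json())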

December 29, 2022

Dual Channel Support for Conversational Summarization / Improved Timestamps

We’ve made updates to our Conversational Summarization model to support dual-channel files. Effective immediately, dual_channel may be set to True when summary_model is set to conversational.
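
For example (placeholder audio URL and API token), a dual-channel call can now be summarized with the Conversational model:

import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "dual_channel": True,
    "summarization": True,
    "summary_model": "conversational",
    "summary_type": "bullets"
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())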

We've made significant improvements to timestamps for non-English audio. Timestamps are now typically accurate to within 100 milliseconds. This improvement is effective immediately for all non-English audio files submitted to AssemblyAI for transcription.

December 19, 2022

Improved Transcription Accuracy for Phone Numbers

We’ve made updates to our Core Transcription model to improve the transcription accuracy of phone numbers by 10%. This improvement is effective immediately for all audio files submitted to AssemblyAI for transcription.

We've improved scaling for our read-only database, resulting in improved performance for read-only requests.

December 15, 2022

v9 Transcription Model Released

We are happy to announce the release of our most accurate Speech Recognition model to date - version 9 (v9). This updated model delivers increased performance across many metrics on a wide range of audio types.

Word Error Rate, or WER, is the primary quantitative metric by which the performance of an automatic transcription model is measured. Our new v9 model shows significant improvements across a range of different audio types, as seen in the chart below, with a more than 11% improvement on average.

In addition to standard overall WER advancements, the new v9 model shows marked improvements with respect to proper nouns. In the chart below, we can see the relative performance increase of v9 over v8 for various types of audio, with a nearly 15% improvement on average.

The new v9 transcription model is currently live in production. This means that customers will see improved performance with no changes required on their end. The new model will automatically be used for all transcriptions created by our /v2/transcript endpoint going forward, with no need to upgrade for special access.

While our customers enjoy the elevated performance of the v9 model, our AI research team is already hard at work on our v10 model, which is slated to launch in early 2023. Building upon v9, the v10 model is expected to radically improve the state of the art in speech recognition.

Try our new v9 transcription model through your browser using the AssemblyAI Playground. Alternatively, sign up for a free API token to test it out through our API, or schedule a time with our AI experts to learn more.

December 2, 2022

New Summarization Models Tailored to Use Cases

We are excited to announce that new Summarization models are now available! Developers can now choose between multiple summary models that best fit their use case and customize the output based on the summary length.

The new models are:

  • Informative, which is best for files with a single speaker, like a presentation or lecture
  • Conversational, which is best for any multi-person conversation, like customer/agent phone calls or interviewer/interviewee calls
  • Catchy, which is best for creating video, podcast, or media titles

Developers can use the summary_model parameter in their POST request to specify which of our summary models they would like to use. This new parameter can be used along with the existing summary_type parameter to allow the developer to customize the summary to their needs.

import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_model": "informative", # conversational | catchy
    "summary_type": "bullets" # bullets_verbose | gist | headline | paragraph
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())

Check out our latest blog post to learn more about the new Summarization models or head to the AssemblyAI Playground to test Summarization in your browser!

October 31, 2022

Improved Transcription Accuracy for COVID

We’ve made updates to our Core Transcription model to improve the transcription accuracy of the word COVID. This improvement is effective immediately for all audio files submitted to AssemblyAI for transcription.

Static IP support for webhooks is now generally available!

Outgoing webhook requests sent from AssemblyAI will now originate from the static IP address 44.238.19.20, rather than a dynamic IP address. This makes it easy to validate that incoming requests are coming from our server. Optionally, you can whitelist this static IP address to add an additional layer of security to your system.
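
As an illustrative sketch of that check in a webhook receiver (the Flask handler and route below are assumptions, and deployments behind a proxy or load balancer would need to inspect the forwarded client address instead):

from flask import Flask, request, abort
app = Flask(__name__)
ASSEMBLYAI_WEBHOOK_IP = "44.238.19.20"

@app.route("/assemblyai-webhook", methods=["POST"])
def assemblyai_webhook():
    # Reject requests that did not originate from AssemblyAI's static IP.
    if request.remote_addr != ASSEMBLYAI_WEBHOOK_IP:
        abort(403)
    print(request.get_json())  # e.g. the completed transcript's ID and status
    return "", 200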

See our walkthrough on how to start receiving webhooks for your transcriptions.

October 25, 2022

New Audio Intelligence Models: Summarization

import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_type": "bullets" # paragraph | headline | gist
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())

Starting today, you can transcribe and summarize entire audio files with a single API call.

To enable our new Summarization models, include the following parameter: "summarization": true in your POST request to /v2/transcript. When the transcription finishes, you will see the summary key in the JSON response containing the summary of your transcribed audio or video file.

By default, summaries will be returned in the style of bullet points. You can customize the style of summary by including the optional summary_type parameter in your POST request along with one of the following values: paragraph, headline, or gist. Here is the full list of summary types we support.

// summary_type = "paragraph"

"summary": "Josh Seiden and Brian Donohue discuss the
topic of outcome versus output on Inside Intercom.
Josh Seiden is a product consultant and author who has
just released a book called Outcomes Over Output.
Brian is product management director and he's looking
forward to the chat."

// summary_type = "headline"

"summary": "Josh Seiden and Brian Donohue discuss the
topic of outcomes versus output."

// summary_type = "gist"

"summary": "Outcomes over output"

// summary_type = "bullets"

"summary": "Josh Seiden and Brian Donohue discuss
the topic of outcome versus output on Inside Intercom.
Josh Seiden is a product consultant and author who has
just released a book called Outcomes Over Output.
Brian is product management director and he's looking
forward to the chat.\n- ..."

Examples of use cases for Summarization include:

  • Identify key takeaways from phone calls to speed up post-call review and reduce manual summarization
  • Summarize long podcasts into short descriptions so users can preview before they listen
  • Instantly generate meeting summaries to quickly recap virtual meetings and highlight post-meeting actions
  • Suggest 3-5 word video titles automatically for user-generated content
  • Synthesize long educational courses, lectures, and media broadcasts into their most important points for faster consumption

We're really excited to see what you build with our new Summarization models. To get started, try it out for free in our no-code playground or visit our documentation for more info on how to enable Summarization in your API requests.

October 19, 2022

Automatic Casing / Short Utterances

We’ve improved our Automatic Casing model and fixed a minor bug that caused over-capitalization in English transcripts. The Automatic Casing model is enabled by default with our Core Transcription API to improve transcript readability for video captions (SRT/VTT). See our documentation for more info on Automatic Casing.

Our Core Transcription model has been fine-tuned to better detect short utterances in English transcripts. Examples of short utterances include one-word answers such as “No.” and “Right.” This update will take effect immediately for all customers.

October 14, 2022

Static IP Support for Webhooks

Over the next few weeks, we will begin rolling out Static IP support for webhooks to customers in stages.

Outgoing webhook requests sent from AssemblyAI will now originate from the static IP address 44.238.19.20, rather than a dynamic IP address. This makes it easy to validate that incoming requests are coming from our server. Optionally, you can whitelist this static IP address to add an additional layer of security to your system.

See our walkthrough on how to start receiving webhooks for your transcriptions.