
New - 8.37% Better Accuracy for Topic Detection and IAB Classification with V4 Update

Our deep learning research team is thrilled to announce the release of the latest version of AssemblyAI’s Topic Detection feature, version 4 (v4).

Our Topic Detection feature accurately predicts the topics discussed in audio and video files, and many of our customers use it when transcribing podcasts, videos, and other media. This topic information helps customers better understand the content they are transcribing, which can power content recommendations, internal data analysis, and advertising use cases.

Version 4 is powered by our most advanced deep learning neural network yet. The model is larger and was trained on more data than previous versions, yielding an 8.37% relative increase in accuracy over v3.

Detecting topics in spoken language can be complex, but the v4 update to AssemblyAI’s Topic Detection feature demonstrates a strong understanding of the nuances of human language.

Take a look at the examples below:

Transcript:
Prompted by something that we did here at Forward this fall and credit where credit is due inspired by Santa Cruz Shakespeare. So we have always aimed for budget transparency here at Forward and in fact, as a sort of additional step last year, when we were creating an updated EDI and antiracism plan, we committed to putting our full budget on our website for transparency. But then, thanks to the often joys of theater Twitter, where I can find fabulous exciting new practices from colleagues around the country, a dramatic I know shared that Santa Cruise Spear had put their operating budget into their play bill, and I was really struck by that and the transparency of that and the accessibility of that and the communication with your audience of where you put your money and where their money goes when they buy a ticket or make a donation. So we did that. We printed our budget right there in the play bill for the first show of our fall season, which just closed a few days ago.

Detected topics:
FineArt>Theater: 100%
EventsAndAttractions>Musicals: 74%
EventsAndAttractions>Concerts&MusicEvents: 5%
EventsAndAttractions>TheaterVenuesAndEvents: 5%
PersonalFinance>FinancialPlanning: 5%

In this sample, the Topic Detection model picks up on words like “Shakespeare,” “audience,” “playbill,” and “ticket” to correctly determine that this audio file is about theater.

Transcript:
In my mind, I was basically done with Robbie Ray. He had shown flashes in the past, particularly with the strike. It was just too inefficient walk too many guys and got hit too hard too.

Detected topics:
Sports>Baseball: 100%

In this second sample, the Topic Detection model determines that the audio snippet is about baseball by associating the name “Robbie Ray” (a starting pitcher for the Toronto Blue Jays) with the sport.

Our Topic Detection feature uses the IAB Content Taxonomy, a list of 698 “common language” topics, when assigning topics to transcribed content.
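
The labels in the samples above are paths through the IAB taxonomy, with tiers separated by ">", and each label is paired with a percentage score. The minimal Python sketch below shows one way a customer might post-process that kind of output, for example to keep only strongly matching topics for recommendations or to roll results up to top-level categories for analytics. It assumes the results are already available as a dictionary mapping label strings to scores between 0 and 1; that input shape, the 0.5 threshold, and the helper names are illustrative assumptions, not AssemblyAI's actual response format or API, so check the official API reference before relying on them.

```python
from collections import defaultdict

# Hypothetical input: label -> score in [0, 1], mirroring the theater sample
# above. This shape is an assumption for illustration, not the real API response.
summary = {
    "FineArt>Theater": 1.00,
    "EventsAndAttractions>Musicals": 0.74,
    "EventsAndAttractions>Concerts&MusicEvents": 0.05,
    "EventsAndAttractions>TheaterVenuesAndEvents": 0.05,
    "PersonalFinance>FinancialPlanning": 0.05,
}


def split_label(label: str) -> list[str]:
    """Split an IAB-style label such as 'FineArt>Theater' into its tiers."""
    return label.split(">")


def relevant_topics(scores: dict[str, float], threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (label, score) pairs at or above `threshold`, highest score first.

    The default threshold of 0.5 is an arbitrary illustrative choice.
    """
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def top_tier_scores(scores: dict[str, float]) -> dict[str, float]:
    """Collapse each label to its first tier, keeping the best score per tier."""
    best: dict[str, float] = defaultdict(float)
    for label, score in scores.items():
        tier1 = split_label(label)[0]
        best[tier1] = max(best[tier1], score)
    return dict(best)


if __name__ == "__main__":
    print(relevant_topics(summary))
    print(top_tier_scores(summary))
```

On the theater sample, the threshold keeps only FineArt>Theater and EventsAndAttractions>Musicals, while the roll-up reports one best score each for FineArt, EventsAndAttractions, and PersonalFinance. Taking the maximum (rather than a sum) per top-level tier avoids inflating categories that simply have many low-scoring subtopics.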


You can learn more about how Topic Detection works here.