Sentiment Analysis

AssemblyAI offers a cutting-edge Sentiment Analysis model that can detect the sentiment of each sentence spoken in audio files. When you enable it, you can get a detailed analysis of the positive, negative, or neutral sentiment conveyed in the audio, along with a confidence score for each result.

You can also learn about the content on this page from the video Sentiment Classification for Audio Files in Python on AssemblyAI's YouTube channel.


As described in the guide to submitting files for transcription, include the sentiment_analysis parameter in your request body and set it to true.
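For example, the request body might look like the sketch below. The audio URL is a placeholder; substitute the URL of your own audio file.

```python
import json

# Hypothetical request body for a transcription request; the audio_url
# is a placeholder, not a real file.
request_body = {
    "audio_url": "https://example.com/audio.mp3",
    "sentiment_analysis": True,  # enables the Sentiment Analysis model
}

print(json.dumps(request_body, indent=2))
```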

You can explore the full JSON response here:


You can run this code snippet in Colab here, or view the full source code here.

Understanding the response

The JSON object above contains all information about the transcription. Depending on which models are used to analyze the audio, the attributes of this object will vary. For example, in the quickstart above we did not enable Summarization, which is reflected by the summarization: false key-value pair in the JSON above. Had we enabled Summarization, the summary, summary_type, and summary_model keys would contain the file summary (and additional details) rather than their current null values.

To access the Sentiment Analysis information, we use the sentiment_analysis and sentiment_analysis_results keys:
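As a minimal sketch of that access pattern, assume the response has been parsed into a Python dict named results. The sample values below are illustrative only; a real response contains many more fields.

```python
# Hand-made stand-in for the parsed JSON response (illustrative values).
results = {
    "sentiment_analysis": True,
    "sentiment_analysis_results": [
        {
            "text": "Thanks so much for joining us today.",
            "start": 250,
            "end": 2150,
            "sentiment": "POSITIVE",
            "confidence": 0.92,
            "speaker": None,
        },
    ],
}

# Only inspect the results if the model was enabled for this request.
if results["sentiment_analysis"]:
    for sentence in results["sentiment_analysis_results"]:
        print(f'{sentence["sentiment"]} ({sentence["confidence"]:.2f}): {sentence["text"]}')
```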

The reference table below lists all relevant attributes along with their descriptions, where we've called the JSON response object results. Object attributes are accessed via dot notation, and arbitrary array elements are denoted with [i]. For example, results.words[i].text refers to the text attribute of the i-th element of the words array in the JSON results object.

| Attribute | Type | Description |
| --- | --- | --- |
| `results.sentiment_analysis` | boolean | Whether Sentiment Analysis was enabled in the transcription request |
| `results.sentiment_analysis_results` | array | A temporal sequence of Sentiment Analysis results for the audio file, one element for each sentence in the file |
| `results.sentiment_analysis_results[i].text` | string | The transcript of the i-th sentence |
| `results.sentiment_analysis_results[i].start` | number | The starting time, in milliseconds, of the i-th sentence |
| `results.sentiment_analysis_results[i].end` | number | The ending time, in milliseconds, of the i-th sentence |
| `results.sentiment_analysis_results[i].sentiment` | string | The detected sentiment for the i-th sentence: one of `POSITIVE`, `NEUTRAL`, or `NEGATIVE` |
| `results.sentiment_analysis_results[i].confidence` | number | The confidence score for the detected sentiment of the i-th sentence, from 0 to 1 |
| `results.sentiment_analysis_results[i].speaker` | string or null | The speaker of the i-th sentence if Speaker Diarization is enabled, otherwise `null` |
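If Speaker Diarization is enabled, the speaker field lets you aggregate sentiment per speaker. A minimal sketch over illustrative data (in practice, these dicts come from the sentiment_analysis_results array of the response):

```python
from collections import Counter

# Illustrative sentiment results; real values come from the API response.
sentences = [
    {"speaker": "A", "sentiment": "POSITIVE"},
    {"speaker": "B", "sentiment": "NEGATIVE"},
    {"speaker": "A", "sentiment": "NEUTRAL"},
    {"speaker": "A", "sentiment": "POSITIVE"},
]

# Count sentiment labels per speaker.
per_speaker = {}
for s in sentences:
    per_speaker.setdefault(s["speaker"], Counter())[s["sentiment"]] += 1

print(per_speaker)
```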


What if the model predicts the wrong sentiment label for a sentence?

The Sentiment Analysis model bases its predictions on the transcript text, so it may not always capture the speaker's intended sentiment. We recommend considering the context of the transcript and, when possible, validating the sentiment analysis results with human judgment.

What if the transcript contains sensitive or offensive content?

The Content Moderation model can be used to identify and filter out sensitive or offensive content from the transcript.

What if the sentiment analysis results aren't consistent with my expectations?

Make sure the audio being analyzed is relevant to your use case. We also recommend considering the context of the transcript and evaluating the confidence score for each sentiment label.
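One way to act on the confidence scores, sketched over illustrative data: keep only labels whose confidence clears a threshold of your choosing.

```python
# Illustrative results; real values come from sentiment_analysis_results
# in the API response.
sentiment_results = [
    {"text": "This is great.", "sentiment": "POSITIVE", "confidence": 0.95},
    {"text": "Hmm, maybe.", "sentiment": "NEGATIVE", "confidence": 0.41},
]

THRESHOLD = 0.7  # arbitrary cutoff; tune it for your use case

# Keep only sentences whose sentiment label clears the threshold.
confident = [r for r in sentiment_results if r["confidence"] >= THRESHOLD]
print(len(confident))
```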

What if the sentiment analysis is taking too long to process?

The Sentiment Analysis model is designed to be fast and efficient, but processing times may vary depending on the size of the audio file and the complexity of the language used. If you experience longer processing times than expected, don't hesitate to contact our support team.