Sentiment Analysis
AssemblyAI offers a cutting-edge Sentiment Analysis model that can detect the sentiment of each sentence spoken in audio files. When you enable it, you can get a detailed analysis of the positive, negative, or neutral sentiment conveyed in the audio, along with a confidence score for each result.
You can also learn the content on this page from Sentiment Classification for Audio Files in Python on AssemblyAI's YouTube channel.
Quickstart
In the submitting files for transcription guide, include the `sentiment_analysis` parameter in your request body and set it to `true`.
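The request described above can be sketched as follows. This is a minimal example, assuming the standard AssemblyAI `/v2/transcript` endpoint; the API key and audio URL are placeholders you would replace with your own, and the actual HTTP call (which requires the `requests` package and network access) is shown commented out.

```python
import json

# Placeholders -- replace with your own API key and a publicly accessible audio URL.
API_KEY = "YOUR_API_KEY"
AUDIO_URL = "https://example.com/audio.mp3"

# Request body for the /v2/transcript endpoint with Sentiment Analysis enabled.
payload = {
    "audio_url": AUDIO_URL,
    "sentiment_analysis": True,
}

headers = {
    "authorization": API_KEY,
    "content-type": "application/json",
}

# To actually submit the transcription job:
# import requests
# response = requests.post("https://api.assemblyai.com/v2/transcript",
#                          json=payload, headers=headers)
# transcript_id = response.json()["id"]

print(json.dumps(payload, indent=2))
```

Once the job completes, polling the transcript endpoint for the returned ID yields the JSON response discussed below.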
You can explore the full JSON response here:
You can run this code snippet in Colab here, or you can view the full source code here.
Understanding the response
The JSON object above contains all information about the transcription. Depending on which models are used to analyze the audio, the attributes of this object will vary. For example, in the quickstart above we did not enable Summarization, which is reflected by the `summarization: false` key-value pair in the JSON above. Had we enabled Summarization, the `summary`, `summary_type`, and `summary_model` keys would contain the file summary (and additional details) rather than their current `null` values.
To access the Sentiment Analysis information, we use the `sentiment_analysis` and `sentiment_analysis_results` keys:
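A minimal sketch of reading these keys from the response, assuming it has been parsed into a Python dict. The `response` object below is a hand-written stand-in for the actual API response, with one illustrative sentence.

```python
# Stand-in for the parsed JSON response from the API.
response = {
    "sentiment_analysis": True,
    "sentiment_analysis_results": [
        {
            "text": "I love this product.",
            "start": 250,
            "end": 1870,
            "sentiment": "POSITIVE",
            "confidence": 0.98,
            "speaker": None,
        },
    ],
}

# Only read the results if Sentiment Analysis was enabled for this transcript.
if response["sentiment_analysis"]:
    for result in response["sentiment_analysis_results"]:
        print(f'{result["sentiment"]} ({result["confidence"]:.2f}): {result["text"]}')
```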
The reference table below lists all relevant attributes along with their descriptions, where we've called the JSON response object `results`. Object attributes are accessed via dot notation, and arbitrary array elements are denoted with `[i]`. For example, `results.words[i].text` refers to the `text` attribute of the i-th element of the `words` array in the JSON `results` object.
| Attribute | Type | Description |
| --- | --- | --- |
| `results.sentiment_analysis` | boolean | Whether Sentiment Analysis was enabled in the transcription request |
| `results.sentiment_analysis_results` | array | A temporal sequence of Sentiment Analysis results for the audio file, one element for each sentence in the file |
| `results.sentiment_analysis_results[i].text` | string | The transcript of the i-th sentence |
| `results.sentiment_analysis_results[i].start` | number | The starting time, in milliseconds, of the i-th sentence |
| `results.sentiment_analysis_results[i].end` | number | The ending time, in milliseconds, of the i-th sentence |
| `results.sentiment_analysis_results[i].sentiment` | string | The detected sentiment for the i-th sentence, one of `POSITIVE`, `NEUTRAL`, `NEGATIVE` |
| `results.sentiment_analysis_results[i].confidence` | number | The confidence score for the detected sentiment of the i-th sentence, from 0 to 1 |
| `results.sentiment_analysis_results[i].speaker` | string or null | The speaker of the i-th sentence if Speaker Diarization is enabled, else null |
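Since each element carries a `sentiment` and, with Speaker Diarization enabled, a `speaker`, the per-sentence results can be aggregated, for example to tally sentiments per speaker. This is a sketch over hand-written sample data shaped like `results.sentiment_analysis_results`; the sentences and speaker labels are invented for illustration.

```python
from collections import Counter

# Invented sample data shaped like results.sentiment_analysis_results
# with Speaker Diarization enabled.
results = [
    {"text": "Great, thanks!", "sentiment": "POSITIVE", "confidence": 0.95, "speaker": "A"},
    {"text": "The delay was frustrating.", "sentiment": "NEGATIVE", "confidence": 0.88, "speaker": "B"},
    {"text": "Let me check the schedule.", "sentiment": "NEUTRAL", "confidence": 0.91, "speaker": "A"},
]

# Count how often each sentiment label occurs for each speaker.
counts = Counter((r["speaker"], r["sentiment"]) for r in results)
for (speaker, sentiment), n in sorted(counts.items()):
    print(f"Speaker {speaker}: {sentiment} x{n}")
```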
Troubleshooting
The Sentiment Analysis model is based on the interpretation of the transcript and may not always accurately capture the intended sentiment of the speaker. It's recommended to take into account the context of the transcript and to validate the sentiment analysis results with human judgment when possible.
The Content Moderation model can be used to identify and filter out sensitive or offensive content from the transcript.
It's important to ensure that the audio being analyzed is relevant to your use case, and to evaluate the confidence score for each sentiment label before acting on it.
The Sentiment Analysis model is designed to be fast and efficient, but processing times may vary depending on the size of the audio file and the complexity of the language used. If you experience longer processing times than expected, don't hesitate to contact our support team.