
Real-Time is now Streaming Speech-to-Text, with added customization and control for users

Streaming Speech-to-Text makes it easier than ever to transcribe live audio and video, now with customizable end-of-utterance detection at a lower cost.

Significant advancements in Speech AI research are making live Speech-to-Text transcription more accurate than ever before. This has led to growing demand for high-quality AI tools, such as AI voice bots for call centers and voice assistants for customer service, that leverage live Speech-to-Text technology. 

With AssemblyAI’s Streaming Speech-to-Text model (previously Real-Time), users can build with the same powerful technology under a new name, now with a few improvements:

  • More customization and control 
  • Lower cost to build (originally announced this past January)

These updates make it easier to build next-generation AI tools and products on top of live speech transcription.  

Advanced use cases for Streaming Speech-to-Text

Historically, live Speech-to-Text users had access to only limited AI technology, and conversations transcribed with other tools often felt stilted and unnatural.

AssemblyAI’s Streaming Speech-to-Text (Streaming STT) offers a best-in-class experience for users looking for a seamless option. Streaming STT includes accurate, customizable end-of-utterance detection, which ensures that conversations are transcribed more naturally, enabling better AI-human interactions.

Companies are now using Streaming STT for a variety of purposes: 

  • Live captions for streaming audio and video
  • AI voice assistants and voice bots for customer service, call centers and sales applications
  • Language learning tools
  • Accessibility applications
  • Virtual meetings

How to customize end-of-utterance detection

End-of-utterance detection enables the live Speech-to-Text model to identify when a speaker has finished speaking.

With our recent update to Streaming STT, developers can now customize how the model determines that a speaker has finished talking.

Developers can modify end-of-utterance detection in two ways:

  1. By adjusting how much silence must transpire before the model declares that the speaker has finished speaking. This is accomplished by setting the end_utterance_silence_threshold parameter, shown in the commented line below.
import assemblyai as aai

transcriber = aai.RealtimeTranscriber(
    on_data=on_data_callback,
    on_error=on_error_callback,
    sample_rate=sample_rate,
    end_utterance_silence_threshold=300  # Wait 300 ms of silence before ending an utterance
)

transcriber.connect()

audio_stream = ...  # Your live audio source (e.g., microphone chunks)
for audio_chunk in audio_stream:
    transcriber.stream(audio_chunk)
  2. By forcing an end of utterance programmatically. This is accomplished by calling force_end_utterance(), as shown in the commented line below.
import assemblyai as aai 

transcriber = aai.RealtimeTranscriber(
    on_data=on_data_callback,
    on_error=on_error_callback,
    sample_rate=sample_rate,
)

transcriber.connect()

audio_stream = ...  # Your live audio source (e.g., microphone chunks)
for audio_chunk in audio_stream:
    transcriber.stream(audio_chunk)
    speaker_changed = ...  # Your own signal that the active speaker has changed
    if speaker_changed:
        transcriber.force_end_utterance()  # Forces the model to produce a final transcript immediately

With these new controls, developers can build more natural interactions with their AI tools, giving their users a better overall experience with live Speech-to-Text applications.

Unlock powerful live Speech-to-Text use cases with faster, lower-cost STT

Streaming Speech-to-Text allows users to transcribe live audio streams with high accuracy and low latency at a lower price of $0.47 per hour of audio data (reduced from $0.75 per hour), or $0.0001306 per second. This includes access to the Streaming Speech-to-Text model, Automatic Punctuation and Casing, and Custom Vocabulary.
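
As a quick illustration of those rates, a few lines of Python can estimate the cost of a session from its duration. The constants below are the rates quoted above, and estimate_cost is a hypothetical helper written just for this example:

HOURLY_RATE = 0.47                    # USD per hour of streamed audio
PER_SECOND_RATE = HOURLY_RATE / 3600  # ~ $0.0001306 per second

def estimate_cost(duration_seconds: float) -> float:
    # Billing scales with the duration of audio streamed
    return duration_seconds * PER_SECOND_RATE

print(f"${estimate_cost(45 * 60):.2f}")  # A 45-minute call costs roughly $0.35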

To use the service, users stream audio data to our secure WebSocket API and receive transcripts back within a few hundred milliseconds. 
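For illustration, here is a minimal sketch of that flow using the Python SDK from the examples above. The API key and 16 kHz sample rate are placeholder values, and the microphone helper from the SDK’s extras module (installed separately) is just one possible audio source:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # Placeholder; use your own key

def on_data_callback(transcript: aai.RealtimeTranscript):
    # Final transcripts arrive once the model detects the end of an utterance
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text)

def on_error_callback(error: aai.RealtimeError):
    print("Error:", error)

transcriber = aai.RealtimeTranscriber(
    on_data=on_data_callback,
    on_error=on_error_callback,
    sample_rate=16_000,
)

transcriber.connect()  # Opens the WebSocket session

# Stream microphone audio until interrupted, then close the session
microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
transcriber.stream(microphone_stream)
transcriber.close()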

To get started, users can follow the step-by-step instructions in our docs or use one of our official SDKs.