Universal-Streaming
Handshake
Headers
Authenticate with your API key, or generate a temporary token and pass it in the token query parameter instead.
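The two authentication options can be sketched as follows. This is a minimal sketch, not an official client: the endpoint URL wss://streaming.assemblyai.com/v3/ws and the Authorization header name are assumptions, and build_handshake is a hypothetical helper. It only returns the URL and headers that a WebSocket client would send during the handshake.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the current URL in the official reference.
STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_handshake(api_key=None, token=None, params=None):
    """Return (url, headers) for the WebSocket handshake (hypothetical helper).

    Exactly one of api_key or token must be supplied. The API key goes in
    a header (header name assumed); a temporary token goes in the token
    query parameter.
    """
    if (api_key is None) == (token is None):
        raise ValueError("provide exactly one of api_key or token")
    query = dict(params or {})  # copy so the caller's dict is not modified
    headers = {}
    if token is not None:
        query["token"] = token
    else:
        headers["Authorization"] = api_key
    url = STREAMING_URL
    if query:
        url += "?" + urlencode(query)
    return url, headers
```

One reason for this split: URLs tend to be logged more often than headers, so only the short-lived token travels in the query string.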
Query parameters
A list of words and phrases for which to improve recognition accuracy. See Keyterms Prompting for more details.
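A list-valued parameter has to be serialized into a single query string value. The sketch below assumes the parameter is named keyterms_prompt and that the list is sent JSON-encoded; both are assumptions to check against Keyterms Prompting, and keyterms_query is a hypothetical helper.

```python
import json
from urllib.parse import urlencode


def keyterms_query(keyterms):
    """Encode key terms as a query string fragment (hypothetical helper).

    Assumes the parameter name keyterms_prompt and a JSON-encoded list.
    Blank entries are dropped and surrounding whitespace is trimmed.
    """
    cleaned = [term.strip() for term in keyterms if term.strip()]
    return urlencode({"keyterms_prompt": json.dumps(cleaned)})
```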
The maximum amount of silence, in milliseconds, allowed in a turn before an end of turn is triggered. See Turn Detection for configuration details.
The minimum amount of silence, in milliseconds, required to detect an end of turn when the model is confident that the turn has ended. See Turn Detection for configuration details.
Whether to enable Streaming Speaker Diarization. When enabled, each Turn event will include a speaker_label field indicating the speaker.
The maximum number of speakers expected in the audio stream (1-10). Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when speaker_labels is enabled. See Streaming Diarization for more details.
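The constraints on the maximum speaker count described above (range 1-10, used only when speaker_labels is enabled) can be checked on the client before connecting. In this sketch max_speakers is an assumed parameter name, the lowercase "true"/"false" encoding is an assumption, and diarization_params is a hypothetical helper.

```python
def diarization_params(speaker_labels, max_speakers=None):
    """Build diarization query parameters (hypothetical helper).

    Enforces the documented 1-10 range and rejects a speaker count when
    diarization is disabled, since the server would not use it.
    """
    params = {"speaker_labels": "true" if speaker_labels else "false"}
    if max_speakers is not None:
        if not speaker_labels:
            raise ValueError("max_speakers is only used when speaker_labels is enabled")
        if not 1 <= max_speakers <= 10:
            raise ValueError("max_speakers must be between 1 and 10")
        params["max_speakers"] = max_speakers
    return params
```

Raising an error when a speaker count is set without speaker_labels is a client-side design choice; per the description above, the value would otherwise simply go unused.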
Temporary token for authentication, used instead of an API key.
The confidence threshold (0.0 to 1.0) for classifying audio frames as silence. Frames with VAD confidence below this value are considered silent. Increase this value in noisy environments to reduce false speech detection.
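The silence threshold works per frame: a frame counts as silent when its VAD confidence falls below the threshold. The sketch below models that rule and measures the trailing run of silent frames, as a simplified stand-in for the silence duration the turn-detection parameters refer to. The frame size, the helper names, and the trailing-run model are illustrative assumptions, not a description of the server's internals.

```python
def silent_frames(confidences, vad_threshold):
    """Mark each frame silent if its VAD confidence is below the threshold."""
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("vad_threshold must be between 0.0 and 1.0")
    return [c < vad_threshold for c in confidences]


def trailing_silence_ms(confidences, vad_threshold, frame_ms):
    """Duration in ms of the silent run at the end of the frame sequence."""
    run = 0
    for is_silent in reversed(silent_frames(confidences, vad_threshold)):
        if not is_silent:
            break
        run += 1
    return run * frame_ms
```

With frame confidences [0.9, 0.8, 0.1, 0.2] and 20 ms frames, a threshold of 0.5 yields 40 ms of trailing silence, while 0.85 also marks the 0.8 frame silent and yields 60 ms. A higher threshold therefore treats more borderline frames as silence, which is why raising it helps in noisy environments.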
The confidence threshold (0.0 to 1.0) used to determine whether the end of a turn has been reached. See Turn Detection for configuration details.
Note: This parameter is only supported for the Universal-Streaming model.
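Read together, the three turn-detection parameters suggest a two-path rule: a confident end-of-turn prediction needs only the minimum silence, and otherwise the turn ends once the maximum silence is reached. The function below is a simplified model of that reading, not the service's actual algorithm; the argument names and example values are hypothetical. See Turn Detection for the authoritative behavior.

```python
def end_of_turn(silence_ms, eot_confidence, *, confidence_threshold,
                min_silence_when_confident_ms, max_turn_silence_ms):
    """Simplified end-of-turn rule inferred from the parameter descriptions.

    Confident path: eot_confidence >= confidence_threshold and at least
    min_silence_when_confident_ms of silence. Fallback path: silence
    reaches max_turn_silence_ms regardless of confidence.
    """
    confident = eot_confidence >= confidence_threshold
    if confident and silence_ms >= min_silence_when_confident_ms:
        return True
    return silence_ms >= max_turn_silence_ms
```

Usage, with illustrative values only:

```python
cfg = dict(confidence_threshold=0.7,
           min_silence_when_confident_ms=160,
           max_turn_silence_ms=1200)
end_of_turn(200, 0.9, **cfg)   # confident and past the minimum silence
end_of_turn(800, 0.3, **cfg)   # not confident, still under the maximum
```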
Send
Receive
Receive a formatted turn-based transcription result.
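A receive loop typically decodes each message and keeps only the results it needs. The sketch below extracts the speaker and text from a finished, formatted Turn event. Apart from speaker_label, the field names (type, transcript, end_of_turn, turn_is_formatted) are assumptions about the message schema, and handle_message is a hypothetical helper.

```python
import json


def handle_message(raw):
    """Return (speaker, transcript) for a final formatted Turn, else None.

    Field names other than speaker_label are assumed, not confirmed here.
    """
    msg = json.loads(raw)
    if msg.get("type") != "Turn":
        return None
    # Skip partial results: keep only turns that have ended and been formatted.
    if not (msg.get("end_of_turn") and msg.get("turn_is_formatted")):
        return None
    # speaker_label is present only when speaker_labels is enabled.
    return msg.get("speaker_label"), msg.get("transcript", "")
```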