April 2, 2026

Node.js voice agent with AssemblyAI Universal-3 Pro Streaming

Build a real-time voice agent in Node.js using the AssemblyAI Universal-3 Pro Streaming model (u3-rt-pro) for speech-to-text — no Python required, no heavy framework dependencies.

Two modes in one repo:

  1. Terminal agent (src/agent.js): captures microphone input with the mic package and plays the TTS audio from your terminal session
  2. Browser server (src/server.js): a Node.js WebSocket server with a browser UI that captures audio via getUserMedia

Why AssemblyAI Universal-3 Pro for Node.js?

| Metric | AssemblyAI Universal-3 Pro | Deepgram Nova-3 |
| --- | --- | --- |
| P50 latency | 307 ms | 516 ms |
| Word Error Rate | 8.14% | 9.87% |
| Neural turn detection | ✅ | ❌ (VAD only) |
| Mid-session prompting | ✅ | |
| Real-time diarization | | |
| Anti-hallucination | | |

AssemblyAI's neural turn detection eliminates the need for a separate VAD library. The model uses both acoustic and linguistic signals to detect when a speaker has finished — not just when they've gone silent.

Quick start

git clone https://github.com/kelseyefoster/voice-agent-nodejs-assemblyai
cd voice-agent-nodejs-assemblyai

npm install
cp .env.example .env
# Edit .env with your API keys


Terminal agent

npm start
# Speak into your mic — Ctrl+C to quit


Browser agent

npm run server
# Open http://localhost:3000

AssemblyAI WebSocket URL

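// The key is read from the environment (.env is loaded by dotenv).
const { ASSEMBLYAI_API_KEY } = process.env;

// Model, audio format, and turn-detection settings are all query parameters.
// Build this URL server-side only: it embeds your credential.
const AAI_WS_URL =
  `wss://streaming.assemblyai.com/v3/ws` +
  `?speech_model=u3-rt-pro` +
  `&encoding=pcm_s16le` +
  `&sample_rate=16000` +
  `&end_of_turn_confidence_threshold=0.4` +
  `&min_end_of_turn_silence_when_confident=300` +
  `&max_turn_silence=1500` +
  `&token=${ASSEMBLYAI_API_KEY}`;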
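
Concatenating query parameters by hand makes it easy to drop an `&` or forget to encode a value. As a minimal sketch, the same URL can be assembled with the standard URLSearchParams class. The names buildStreamingUrl, DEFAULT_PARAMS, and the overrides argument are hypothetical and not part of the repo; the defaults simply mirror the values above.

```javascript
// Hypothetical helper (not from the repo): builds the streaming URL from
// the defaults shown above plus any per-session overrides.
const AAI_WS_BASE = "wss://streaming.assemblyai.com/v3/ws";

const DEFAULT_PARAMS = {
  speech_model: "u3-rt-pro",
  encoding: "pcm_s16le",
  sample_rate: 16000,
  end_of_turn_confidence_threshold: 0.4,
  min_end_of_turn_silence_when_confident: 300, // ms
  max_turn_silence: 1500, // ms
};

function buildStreamingUrl(token, overrides = {}) {
  const params = new URLSearchParams();
  // Later keys win, so overrides replace the defaults.
  for (const [key, value] of Object.entries({ ...DEFAULT_PARAMS, ...overrides })) {
    params.set(key, String(value));
  }
  params.set("token", token);
  return `${AAI_WS_BASE}?${params.toString()}`;
}

// Example: a more patient agent that waits longer before cutting a speaker off.
const patientUrl = buildStreamingUrl(process.env.ASSEMBLYAI_API_KEY ?? "", {
  max_turn_silence: 2500,
});
```

Keeping the defaults in one object also makes it easier to try the tuning values discussed later without editing a long string.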


Turn detection

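AssemblyAI v3 sends three message types: Begin, Turn, and Termination. A Turn message carries a partial transcript until end_of_turn is true, at which point the transcript is final. The handler below covers Begin and both kinds of Turn: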

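ws.on("message", async (data) => {
  const msg = JSON.parse(data.toString());

  if (msg.type === "Begin") {
    // Session opened; msg.id identifies it in logs
    console.log(`Session: ${msg.id}`);
  } else if (msg.type === "Turn" && !msg.end_of_turn) {
    // Partial transcript: overwrite the current terminal line
    process.stdout.write(`\r${msg.transcript}`);
  } else if (msg.type === "Turn" && msg.end_of_turn) {
    // Speaker finished: generate a reply and speak it.
    // Catch errors here so a failed LLM or TTS call does not
    // surface as an unhandled promise rejection.
    try {
      const reply = await generateResponse(msg.transcript);
      await speak(reply);
    } catch (err) {
      console.error("Failed to respond:", err);
    }
  }
});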
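
To keep the branching logic testable without a live socket, the dispatch can be pulled into a small pure function. The following is a sketch only: createMessageRouter and its callback names are hypothetical, the field names (type, id, transcript, end_of_turn) follow the handler above, and handling of a "Termination" message is an assumption about the event that the handler above does not cover.

```javascript
// Hypothetical helper: turns raw WebSocket frames into handler calls and
// returns a label describing what it did (useful in tests and logs).
function createMessageRouter({ onBegin, onPartial, onFinal, onTermination } = {}) {
  return function route(data) {
    let msg;
    try {
      msg = JSON.parse(data.toString());
    } catch {
      return "ignored"; // not JSON, for example an unexpected binary frame
    }
    if (msg === null || typeof msg !== "object") return "ignored";

    switch (msg.type) {
      case "Begin":
        onBegin?.(msg.id);
        return "begin";
      case "Turn":
        if (msg.end_of_turn) {
          onFinal?.(msg.transcript);
          return "final";
        }
        onPartial?.(msg.transcript);
        return "partial";
      case "Termination": // assumed name for the session-end event
        onTermination?.(msg);
        return "termination";
      default:
        return "ignored";
    }
  };
}

// Usage sketch: plug the router into the socket from the example above.
// const route = createMessageRouter({
//   onPartial: (text) => process.stdout.write(`\r${text}`),
//   onFinal: (text) => console.log(`\nUser: ${text}`),
// });
// ws.on("message", route);
```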

Sending audio

Browser (getUserMedia + ScriptProcessorNode, which is deprecated in favor of AudioWorklet but keeps the example short):

processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    int16[i] = Math.max(-32768, Math.min(32767, Math.round(float32[i] * 32767)));
  }
  ws.send(int16.buffer);
};
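
The conversion step is the part most worth isolating, because it can be unit-tested outside the browser. Below is a sketch of one common variant; floatTo16BitPCM is a hypothetical name, and the asymmetric scaling (32768 for negative samples, 32767 for positive ones) is a design choice that differs slightly from the loop above, which scales both directions by 32767.

```javascript
// Hypothetical helper: converts Web Audio float samples (-1.0 to 1.0)
// into 16-bit signed PCM to match encoding=pcm_s16le.
function floatTo16BitPCM(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range samples cannot wrap around.
    const s = Math.max(-1, Math.min(1, float32[i]));
    // Negative samples scale by 32768 and positive ones by 32767,
    // so both ends of the int16 range are reachable.
    int16[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return int16;
}

// Usage sketch inside the callback above:
// processor.onaudioprocess = (e) => {
//   ws.send(floatTo16BitPCM(e.inputBuffer.getChannelData(0)).buffer);
// };
```

Two assumptions are worth checking in your own setup: Int16Array uses the platform's byte order, which is little-endian on the hardware browsers commonly run on, and the AudioContext has to run at 16 kHz (for example `new AudioContext({ sampleRate: 16000 })`) or the samples must be resampled, since the URL declares sample_rate=16000.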


Terminal (mic package):

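const micStream = micInstance.getAudioStream();
micStream.on("data", (chunk) => {
  // Drop chunks until the socket is open: ws throws if send() is
  // called while the connection is still being established.
  // (WebSocket here is the class imported from the ws package.)
  if (aaiWs.readyState === WebSocket.OPEN) {
    aaiWs.send(chunk); // raw PCM s16le bytes
  }
});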
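
The mic stream emits chunks of whatever size the underlying recorder produces. If you want to send evenly sized frames instead, a small re-chunker is enough. This is a sketch under stated assumptions: createFramer is a hypothetical helper, and the 50 ms frame length is illustrative rather than a documented requirement, so check the streaming docs for the accepted range.

```javascript
// Assumption: 16 kHz, mono, 16-bit samples, as declared in the URL above.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const FRAME_MS = 50; // illustrative frame length
const FRAME_BYTES = (SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS) / 1000;

// Hypothetical helper: buffers incoming chunks and calls onFrame with
// fixed-size frames. Leftover bytes wait for the next chunk.
function createFramer(frameBytes, onFrame) {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= frameBytes) {
      onFrame(pending.subarray(0, frameBytes));
      pending = pending.subarray(frameBytes);
    }
  };
}

// Usage sketch with the stream above:
// const push = createFramer(FRAME_BYTES, (frame) => aaiWs.send(frame));
// micStream.on("data", push);
```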

Tuning turn detection

| Parameter | Default | Lower → | Higher → |
| --- | --- | --- | --- |
| end_of_turn_confidence_threshold | 0.4 | Faster response | Fewer false triggers |
| min_end_of_turn_silence_when_confident | 300 ms | Snappier | More natural pauses |
| max_turn_silence | 1500 ms | Faster cutoff | More thinking time |
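
One way to reason about how the three parameters interact is a simplified mental model: a turn ends once the model is confident enough and a short silence has passed, or once the silence grows long enough regardless of confidence. The sketch below encodes only that reading of the parameter names. It is not AssemblyAI's implementation, the real decision is made server-side, and wouldEndTurn and TURN_DEFAULTS are hypothetical names.

```javascript
// Simplified mental model only; the actual decision happens server-side.
const TURN_DEFAULTS = {
  end_of_turn_confidence_threshold: 0.4,
  min_end_of_turn_silence_when_confident: 300, // ms
  max_turn_silence: 1500, // ms
};

function wouldEndTurn({ confidence, silenceMs }, config = TURN_DEFAULTS) {
  const confident = confidence >= config.end_of_turn_confidence_threshold;
  // Confident path: only a short pause is needed.
  if (confident && silenceMs >= config.min_end_of_turn_silence_when_confident) {
    return true;
  }
  // Fallback path: a long silence ends the turn regardless of confidence.
  return silenceMs >= config.max_turn_silence;
}

// Raising the threshold makes the same pause insufficient to end the turn.
const strict = { ...TURN_DEFAULTS, end_of_turn_confidence_threshold: 0.7 };
console.log(wouldEndTurn({ confidence: 0.6, silenceMs: 350 }));         // true
console.log(wouldEndTurn({ confidence: 0.6, silenceMs: 350 }, strict)); // false
```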

Keyterm prompting (mid-session)

Inject domain-specific vocabulary after the session has started, without reconnecting:

ws.send(JSON.stringify({
  type: "UpdateConfiguration",
  keyterms: ["AssemblyAI", "Universal-3", "your-product-name"],
}));
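
If the term list comes from user input or a database, it can help to normalize it before sending. A minimal sketch follows; buildKeytermUpdate is a hypothetical helper, and the message shape (type and keyterms) is copied from the example above rather than verified independently.

```javascript
// Hypothetical helper: trims, drops empty entries, and de-duplicates terms,
// then serializes the same message shape used in the example above.
function buildKeytermUpdate(terms) {
  const cleaned = terms
    .map((term) => String(term).trim())
    .filter((term) => term.length > 0);
  const unique = [...new Set(cleaned)]; // Set preserves first-seen order
  return JSON.stringify({ type: "UpdateConfiguration", keyterms: unique });
}

// Usage sketch:
// ws.send(buildKeytermUpdate(["AssemblyAI", " Universal-3 ", "AssemblyAI"]));
```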

Requirements

  • Node.js 18+
  • Dependencies (installed by npm install): ws, openai, elevenlabs, mic, dotenv
  • macOS: afplay (built-in) for audio playback
  • Linux: aplay or mpg123 for audio playback

Deploy to Railway, Render, or Fly.io

# Set environment variables in the platform dashboard, then:
npm run server


The browser server shares no state between connections: each WebSocket session has its own AssemblyAI connection and conversation history, so instances can be scaled horizontally without coordination.
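
Per-connection isolation can be as simple as creating a fresh state object inside the connection handler. The sketch below shows one hypothetical shape for that object. createSession, its method names, and the default system prompt are all assumptions, and the repo's actual structure may differ.

```javascript
// Hypothetical per-connection state: each call returns an independent
// conversation history, so nothing leaks between browser sessions.
function createSession(systemPrompt = "You are a helpful voice assistant.") {
  const history = [{ role: "system", content: systemPrompt }];
  return {
    addUserTurn(text) {
      history.push({ role: "user", content: text });
    },
    addAssistantTurn(text) {
      history.push({ role: "assistant", content: text });
    },
    // Return copies so callers cannot mutate the session's history.
    messages() {
      return history.map((message) => ({ ...message }));
    },
  };
}

// Wiring sketch (assumes a WebSocketServer instance named wss from the ws package):
// wss.on("connection", (browserWs) => {
//   const session = createSession(); // one per browser connection
//   browserWs.on("close", () => {
//     // close this connection's upstream AssemblyAI socket here
//   });
// });
```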
