# Node.js voice agent with AssemblyAI Universal-3 Pro Streaming
Build a real-time voice agent in Node.js using the AssemblyAI Universal-3 Pro Streaming model (u3-rt-pro) for speech-to-text — no Python required, no heavy framework dependencies.
Two modes in one repo:

- Terminal agent (`src/agent.js`) — mic input via `mic`, plays TTS audio in your terminal
- Browser server (`src/server.js`) — Node.js WebSocket server with a browser UI using `getUserMedia`
## Why AssemblyAI Universal-3 Pro for Node.js?
AssemblyAI's neural turn detection eliminates the need for a separate VAD library. The model uses both acoustic and linguistic signals to detect when a speaker has finished — not just when they've gone silent.
## Quick start

```bash
git clone https://github.com/kelseyefoster/voice-agent-nodejs-assemblyai
cd voice-agent-nodejs-assemblyai
npm install
cp .env.example .env
# Edit .env with your API keys
```
### Terminal agent

```bash
npm start
# Speak into your mic — Ctrl+C to quit
```
### Browser agent

```bash
npm run server
# Open http://localhost:3000
```

## AssemblyAI WebSocket URL
```javascript
const AAI_WS_URL =
  `wss://streaming.assemblyai.com/v3/ws` +
  `?speech_model=u3-rt-pro` +
  `&encoding=pcm_s16le` +
  `&sample_rate=16000` +
  `&end_of_turn_confidence_threshold=0.4` +
  `&min_end_of_turn_silence_when_confident=300` +
  `&max_turn_silence=1500` +
  `&token=${ASSEMBLYAI_API_KEY}`;
```
## Turn detection
AssemblyAI v3 uses three event types. Handle them like this:

```javascript
ws.on("message", async (data) => {
  const msg = JSON.parse(data.toString());

  if (msg.type === "Begin") {
    console.log(`Session: ${msg.id}`);
  }

  // Partial transcript — overwrite the current terminal line in place
  if (msg.type === "Turn" && !msg.end_of_turn) {
    process.stdout.write(`\r${msg.transcript}`);
  }

  // Finalized turn — generate and speak a reply
  if (msg.type === "Turn" && msg.end_of_turn) {
    const reply = await generateResponse(msg.transcript);
    await speak(reply);
  }
});
```
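The handler above assumes two helpers, `generateResponse` and `speak`. Here is a minimal sketch of `generateResponse` using Node 18+'s built-in `fetch` against the OpenAI Chat Completions endpoint (the repo's `openai` SDK is the more ergonomic equivalent). The model name, system prompt, and history handling are illustrative assumptions; `speak` (TTS played via `afplay`/`aplay` per the requirements below) is left out.

```javascript
// Per-session conversation history; the system prompt is illustrative.
const history = [
  { role: "system", content: "You are a concise voice assistant." },
];

// Sketch: one LLM round-trip per finalized turn. Uses Node 18+'s built-in
// fetch; the model name is an assumption, not something the repo pins.
async function generateResponse(userText) {
  history.push({ role: "user", content: userText });
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-4o-mini", messages: history }),
  });
  const json = await res.json();
  const reply = json.choices[0].message.content;
  history.push({ role: "assistant", content: reply }); // keep context for the next turn
  return reply;
}
```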
## Sending audio
Browser (`getUserMedia` + `ScriptProcessor`):

```javascript
processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // clamp, then scale [-1, 1] floats to signed 16-bit PCM
    int16[i] = Math.max(-32768, Math.min(32767, Math.round(float32[i] * 32767)));
  }
  ws.send(int16.buffer);
};
```
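One caveat: browsers usually capture at 44.1 or 48 kHz, while the session above is opened with `sample_rate=16000`. You can request `new AudioContext({ sampleRate: 16000 })`, but where that isn't honored, downsampling before sending is needed. A naive decimating sketch (the function name is mine; a real resampler would low-pass filter first to reduce aliasing):

```javascript
// Naive decimation from the capture rate (often 44100 or 48000 Hz in
// browsers) down to the 16 kHz the session expects. Picks the nearest
// input sample; no anti-aliasing filter is applied.
function downsampleTo16k(float32, inputRate) {
  const ratio = inputRate / 16000;
  const outLength = Math.floor(float32.length / ratio);
  const int16 = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const sample = float32[Math.floor(i * ratio)];
    // same clamp-and-scale as the loop above
    int16[i] = Math.max(-32768, Math.min(32767, Math.round(sample * 32767)));
  }
  return int16;
}
```

Usage inside the handler would then be `ws.send(downsampleTo16k(float32, audioContext.sampleRate).buffer);`.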
Terminal (`mic` package):

```javascript
const micStream = micInstance.getAudioStream();
micStream.on("data", (chunk) => {
  aaiWs.send(chunk); // raw PCM s16le bytes
});
```

## Tuning turn detection
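The three turn-detection query parameters in the connection URL control the latency/accuracy trade-off: `end_of_turn_confidence_threshold` is how confident the model must be that the speaker is done, `min_end_of_turn_silence_when_confident` is the silence (in ms) still required once it is confident, and `max_turn_silence` is the hard silence cutoff (in ms) that ends a turn regardless of confidence. A small helper for experimenting with presets; the `default` row mirrors the URL above, the other values are illustrative starting points, and the helper name is mine:

```javascript
// Presets for the three turn-detection params. "default" mirrors the
// connection URL above; "snappy" and "patient" are illustrative values,
// not recommendations from AssemblyAI.
const TURN_PRESETS = {
  snappy:  { end_of_turn_confidence_threshold: 0.3, min_end_of_turn_silence_when_confident: 160, max_turn_silence: 1000 },
  default: { end_of_turn_confidence_threshold: 0.4, min_end_of_turn_silence_when_confident: 300, max_turn_silence: 1500 },
  patient: { end_of_turn_confidence_threshold: 0.7, min_end_of_turn_silence_when_confident: 800, max_turn_silence: 3000 },
};

// Build the streaming URL for a given preset. URLSearchParams coerces the
// numeric values to strings and percent-escapes the token.
function buildStreamingUrl(apiKey, preset = "default") {
  const params = new URLSearchParams({
    speech_model: "u3-rt-pro",
    encoding: "pcm_s16le",
    sample_rate: "16000",
    ...TURN_PRESETS[preset],
    token: apiKey,
  });
  return `wss://streaming.assemblyai.com/v3/ws?${params}`;
}
```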
## Keyterm prompting (mid-session)

Inject domain-specific vocabulary after the session starts without restarting:

```javascript
ws.send(JSON.stringify({
  type: "UpdateConfiguration",
  keyterms: ["AssemblyAI", "Universal-3", "your-product-name"],
}));
```

## Requirements
- Node.js 18+
- `npm install` installs: `ws`, `openai`, `elevenlabs`, `mic`, `dotenv`
- macOS: `afplay` (built-in) for audio playback
- Linux: `aplay` or `mpg123` for audio playback
## Deploy to Railway, Render, or Fly.io

```bash
# Set environment variables in the platform dashboard, then:
npm run server
```
The browser server is stateless per-connection — each WebSocket session has its own AssemblyAI connection and conversation history.
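That isolation can be sketched as a closure that owns each session's state. The handler shape and field names below are mine, not necessarily the repo's:

```javascript
// Sketch of per-connection state: everything mutable lives in the handler's
// closure, so two browser tabs can never share transcripts or history.
// `browserSocket` would come from a ws.WebSocketServer "connection" event.
function onBrowserConnection(browserSocket) {
  const session = {
    history: [],     // this session's conversation turns
    aaiSocket: null, // this session's AssemblyAI WebSocket, opened on demand
  };
  browserSocket.on("close", () => {
    if (session.aaiSocket) session.aaiSocket.close(); // tear down with the tab
  });
  return session; // returned only so the shape is easy to inspect
}
```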