Skip to main content

Transcribe streaming audio from a microphone in Go

Learn how to transcribe streaming audio in Go.

Overview

By the end of this tutorial, you'll be able to transcribe audio from your microphone in Go.

Supported languages

Streaming Speech-to-Text is only available for English. See Supported languages.

Before you begin

To complete this tutorial, you need:

  • Go installed.
  • An with a credit card set up.

You can download the full sample code from GitHub.

Step 1: Install dependencies

  1. 1

    PortAudio is a cross-platform library for streaming audio. The Go SDK uses PortAudio to stream audio from your microphone.

  2. 2

    Install the AssemblyAI Go module using go get.

    go get github.com/AssemblyAI/assemblyai-go-sdk

Step 2: Create a transcriber

In this step, you'll define a transcriber to handle real-time events.

  1. 1

    Create a new file called main.go that imports the AssemblyAI Go module.

    package main

    import (
    aai "github.com/AssemblyAI/assemblyai-go-sdk"
    )
  2. 2

    Create a type that implements RealtimeHandler.

    type realtimeTranscriber struct{}

    func (h *realtimeTranscriber) SessionBegins(event aai.SessionBegins) {
    fmt.Println("session begins")
    }

    func (h *realtimeTranscriber) SessionTerminated(event aai.SessionTerminated) {
    fmt.Println("session terminated")
    }

    func (h *realtimeTranscriber) FinalTranscript(transcript aai.FinalTranscript) {
    fmt.Println(transcript.Text)
    }

    func (h *realtimeTranscriber) PartialTranscript(transcript aai.PartialTranscript) {
    fmt.Printf("%s\r", transcript.Text)
    }

    func (h *realtimeTranscriber) Error(err error) {
    fmt.Println(err)
    }
  3. 3

    Browse to , and then click Copy API key under Copy your API key.

  4. 4

    Create a new RealTimeClient using the function you created. Replace YOUR_API_KEY with your copied API key.

    func main() {
    var transcriber realtimeTranscriber

    client := aai.NewRealTimeClient("YOUR_API_KEY" , &transcriber)
    }
    Sample rate

    Sample rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network.

    By default, the SDK uses a sample rate of 16 kHz. To specify your own sample rate, use NewRealTimeClientWithOptions instead of NewRealTimeClient:

    client := aai.NewRealTimeClientWithOptions(
    aai.WithRealTimeAPIKey("YOUR_API_KEY" )
    aai.WithHandler(&transcriber),
    aai.WithSampleRate(16_000),
    )

    We recommend the following sample rates:

    • Minimum quality: 8_000 (8 kHz)
    • Medium quality: 16_000 (16 kHz)
    • Maximum quality: 48_000 (48 kHz)

Step 3: Connect the transcriber

To stream audio to AssemblyAI, you first need to establish a connection to the API using client.Connect().

ctx := context.Background()

if err := client.Connect(ctx); err != nil {
logger.Fatal(err)
}

You've set up the transcriber to handle real-time events, and connected it to the API. Next, you'll create a recorder to record audio from your microphone.

Step 4: Record audio from microphone

In this step, you'll configure your Go app to record audio from your microphone. You'll use the gordonklaus/portaudio module to make this easier.

  1. 1

    Install the portaudio module for Go.

    go get github.com/gordonklaus/portaudio
  2. 2

    Create a new file called recorder.go with the following code:

    package main

    import (
    "bytes"
    "encoding/binary"

    "github.com/gordonklaus/portaudio"
    )

    type recorder struct {
    stream *portaudio.Stream
    in []int16
    }

    func newRecorder(sampleRate int, framesPerBuffer int) (*recorder, error) {
    in := make([]int16, framesPerBuffer)

    stream, err := portaudio.OpenDefaultStream(1, 0, float64(sampleRate), framesPerBuffer, in)
    if err != nil {
    return nil, err
    }

    return &recorder{
    stream: stream,
    in: in,
    }, nil
    }

    func (r *recorder) Read() ([]byte, error) {
    if err := r.stream.Read(); err != nil {
    return nil, err
    }

    // Little-endian is the most common byte order, but you may need to change to binary.BigEndian depending on your computer.
    byteOrder := binary.LittleEndian

    buf := new(bytes.Buffer)

    if err := binary.Write(buf, byteOrder, r.in); err != nil {
    return nil, err
    }

    return buf.Bytes(), nil
    }

    func (r *recorder) Start() error {
    return r.stream.Start()
    }

    func (r *recorder) Stop() error {
    return r.stream.Stop()
    }

    func (r *recorder) Close() error {
    return r.stream.Close()
    }

  3. 3

    In main.go, open a microphone stream. The sampleRate must be the same value as the one you passed to RealTimeClient (16_000 by default).

    portaudio.Initialize()
    defer portaudio.Terminate()

    var (
    // Must match the sample rate you used for the transcriber.
    sampleRate = 16000

    // Determines how many audio samples to send at once (3200 / 16000 = 200 ms).
    framesPerBuffer = 3200
    )

    rec, err := newRecorder(sampleRate, framesPerBuffer)
    if err != nil {
    log.Fatal(err)
    }

    if err := rec.Start(); err != nil {
    log.Fatal(err)
    }
    Audio data format

    The recorder formats the audio data for you. If you want to stream data from elsewhere, make sure that your audio data is in the following format:

    • Single channel
    • 16-bit signed integer PCM or mu-law encoding
  4. 4

    Read data from the microphone stream, and send it to AssemblyAI for transcription using client.Send().

    for {
    select {
    default:
    // Read audio samples from the microphone.
    b, err := rec.Read()
    if err != nil {
    logger.Fatal(err)
    }

    // Send partial audio samples.
    if err := client.Send(ctx, b); err != nil {
    logger.Fatal(err)
    }
    }
    }

Step 5: Disconnect the transcriber

In this step, you'll clean up resources by stopping the recorder and disconnecting the transcriber.

To disconnect the transcriber on Ctrl+C, use client.Disconnect(). Disconnect() accepts a boolean parameter that allows you to wait for any remaining transcriptions before closing the connection.

sigs := make(chan os.Signal, 1)

signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

for {
select {
case <-sigs:
// Stop recording.
if err := rec.Stop(); err != nil {
log.Fatal(err)
}

// Disconnect the transcriber.
if err := client.Disconnect(ctx, true); err != nil {
log.Fatal(err)
}

os.Exit(0)
default:
// Read audio samples from the microphone.
b, err := rec.Read()
if err != nil {
logger.Fatal(err)
}

// Send partial audio samples.
if err := client.Send(ctx, b); err != nil {
logger.Fatal(err)
}
}
}
}

Run your application to start transcribing. Your OS may require you to allow your app to access your microphone. If prompted, click Allow.

You can also find the source code for this tutorial on GitHub.

Need some help?

If you get stuck, or have any other questions, we'd love to help you out. Ask our support team in our Discord server.