How to do Speech-To-Text with Golang

This article shows how Speech Recognition can be integrated into your Golang application in only a few lines of code.

Do you want to add Automatic Speech Recognition (ASR) to a Golang application and wonder how to do it?

This article shows the different available options and how Speech Recognition can be integrated into your Golang application in 60 seconds.

Speech-to-Text APIs For Golang

Since open source options in Golang are still limited, the best option is to use a Speech-to-Text API, which can be accessed with a simple HTTP client in every programming language.

Choosing the best Speech-to-Text API for your project can be challenging. One of the easiest APIs to integrate is AssemblyAI, so let's learn how to add it to a Golang application with only a few lines of code.

Speech Recognition in Golang With AssemblyAI

We only have to send three API requests with an HTTP client, but first, we need to get a working API key. You can get one here and get started for free:

Get a free API Key

Step 1: Upload a File

You can skip this step if your file is already hosted at a URL the API can reach. Otherwise, we load a local file and send a POST request to the upload endpoint.

We can use the built-in net/http package to set up an HTTP client, specify the header data and send the file. Then we can decode the returned JSON data and use the obtained upload_url:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"
)

func main() {
    const API_KEY = "YOUR_API_KEY"
    const UPLOAD_URL = "https://api.assemblyai.com/v2/upload"

    // Load file (os.ReadFile replaces the deprecated ioutil.ReadFile since Go 1.16)
    data, err := os.ReadFile("your_file.m4a")
    if err != nil {
        log.Fatalln(err)
    }

    // Setup HTTP client and set header
    client := &http.Client{}
    req, err := http.NewRequest("POST", UPLOAD_URL, bytes.NewBuffer(data))
    if err != nil {
        log.Fatalln(err)
    }
    req.Header.Set("authorization", API_KEY)
    res, err := client.Do(req)
    if err != nil {
        log.Fatalln(err)
    }

    // Close the response body only after the request has succeeded
    defer res.Body.Close()

    // Decode json and store it in a map
    var result map[string]interface{}
    if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
        log.Fatalln(err)
    }

    // Print the upload_url
    fmt.Println(result["upload_url"])
}

https://cdn.assemblyai.com/upload/38174ca3-58b0

Step 2: Start the transcription

Next, we send a POST request to the transcript endpoint. This time we send JSON data that uses the key audio_url together with the upload_url from the previous step:

{"audio_url": AUDIO_URL}

After submitting the request, we will get a response with JSON data that includes an id key:

// This snippet belongs inside main(), with API_KEY defined as in Step 1
const AUDIO_URL = "https://cdn.assemblyai.com/upload/38174ca3-58b0"
const TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"

// Prepare json data
values := map[string]string{"audio_url": AUDIO_URL}
jsonData, err := json.Marshal(values)
if err != nil {
    log.Fatalln(err)
}

// Setup HTTP client and set header
client := &http.Client{}
req, err := http.NewRequest("POST", TRANSCRIPT_URL, bytes.NewBuffer(jsonData))
if err != nil {
    log.Fatalln(err)
}
req.Header.Set("content-type", "application/json")
req.Header.Set("authorization", API_KEY)
res, err := client.Do(req)
if err != nil {
    log.Fatalln(err)
}

defer res.Body.Close()

// Decode json and store it in a map
var result map[string]interface{}
if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
    log.Fatalln(err)
}

// Print the id of the transcribed audio
fmt.Println(result["id"])

og6cthhbf0-9195-4b81-8002-bdad76b7bd35

Step 3: Get the transcribed text

The final task is to get the transcribed text. The steps are similar to before. We compose a new API endpoint from the base URL and the newly received transcript id.

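Then we send a GET request to this endpoint and decode the JSON data again. If the status is completed, we can obtain the transcribed text. Transcription runs asynchronously, so the status may not be completed on the first request; in that case, repeat the request after a short wait.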

// This snippet belongs inside main(), with API_KEY and TRANSCRIPT_URL defined as before
// New endpoint
const POLLING_URL = TRANSCRIPT_URL + "/" + "og6cthhbf0-9195-4b81-8002-bdad76b7bd35"

// Send GET request
client := &http.Client{}
req, err := http.NewRequest("GET", POLLING_URL, nil)
if err != nil {
    log.Fatalln(err)
}
req.Header.Set("content-type", "application/json")
req.Header.Set("authorization", API_KEY)
res, err := client.Do(req)
if err != nil {
    log.Fatalln(err)
}

defer res.Body.Close()

var result map[string]interface{}
if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
    log.Fatalln(err)
}

// Check status and print the transcribed text
if result["status"] == "completed" {
    fmt.Println(result["text"])
}

This is your transcribed text of the audio file...

And that's it! You can find the whole code in our GitHub repository.

Open Source Speech Recognition Libraries in Golang

Unfortunately, the options in Golang are still limited. The go-to programming language with the most open source libraries for offline Speech Recognition is Python.

If you still want to use an open source library, you can try PocketSphinx for Golang. It's a lightweight Speech Recognition engine, specifically tuned for handheld and mobile devices. However, this project hasn't been updated in over four years.