Do you want to add Automatic Speech Recognition (ASR) to a Golang application and are wondering how to do it?
This article walks through the available options and shows how Speech Recognition can be integrated into your Golang application in 60 seconds.
Speech-to-Text APIs For Golang
Since open source options in Golang are still limited, the best option is to use a Speech-to-Text API, which can be accessed with a simple HTTP client from any programming language.
Choosing the best Speech-to-Text API for your project can be challenging. One of the easiest APIs to integrate is AssemblyAI. So let's learn how it can be integrated into a Golang application with only a few lines of code.
Speech Recognition in Golang With AssemblyAI
We only have to send three API requests with an HTTP client, but first, we need to get a working API key. You can get one here and get started for free:
Get a free API Key
Step 1: Upload a File
You can skip this step if you already have the file hosted somewhere. Otherwise, we load a local file and send a POST request to the upload endpoint.
We can use the built-in net/http package to set up an HTTP client, specify the header data, and send the file. Then we can decode the returned JSON data and use the obtained upload_url:
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"
)

func main() {
    const API_KEY = "YOUR_API_KEY"
    const UPLOAD_URL = "https://api.assemblyai.com/v2/upload"

    // Load file
    data, err := os.ReadFile("your_file.m4a")
    if err != nil {
        log.Fatalln(err)
    }

    // Set up the HTTP client and set the header
    client := &http.Client{}
    req, err := http.NewRequest("POST", UPLOAD_URL, bytes.NewBuffer(data))
    if err != nil {
        log.Fatalln(err)
    }
    req.Header.Set("authorization", API_KEY)

    res, err := client.Do(req)
    if err != nil {
        log.Fatalln(err)
    }
    // Close the response body once the request has succeeded
    defer res.Body.Close()

    // Decode JSON and store it in a map
    var result map[string]interface{}
    if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
        log.Fatalln(err)
    }

    // Print the upload_url
    fmt.Println(result["upload_url"])
}
https://cdn.assemblyai.com/upload/38174ca3-58b0
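Decoding into a map works, but a missing upload_url would silently print <nil>. As a minimal sketch, the response could instead be decoded into a small struct that reports a missing field as an error. The helper name parseUploadURL and the struct uploadResponse are hypothetical and only for illustration, and the sample body in main is made up to match the output shown above, not a real API reply.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// uploadResponse holds the single field this article reads from the upload reply.
type uploadResponse struct {
	UploadURL string `json:"upload_url"`
}

// parseUploadURL (hypothetical helper) extracts upload_url from a raw JSON body
// and returns an error if the body is invalid or the field is missing.
func parseUploadURL(body []byte) (string, error) {
	var r uploadResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", fmt.Errorf("decode upload response: %w", err)
	}
	if r.UploadURL == "" {
		return "", errors.New("upload response has no upload_url")
	}
	return r.UploadURL, nil
}

func main() {
	// Sample body shaped like the output shown above; not a real API reply.
	body := []byte(`{"upload_url": "https://cdn.assemblyai.com/upload/38174ca3-58b0"}`)

	url, err := parseUploadURL(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```

In the upload program, the bytes read from res.Body would take the place of the sample body.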
Step 2: Start the transcription
Next, we send a POST request to the transcript endpoint. This time we send JSON data that uses the key audio_url together with the upload_url from the previous step:
{"audio_url": AUDIO_URL}
After submitting the request, we will get a response with JSON data that includes an id key:
const AUDIO_URL = "https://cdn.assemblyai.com/upload/38174ca3-58b0"
const TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"

// Prepare JSON data
values := map[string]string{"audio_url": AUDIO_URL}
jsonData, err := json.Marshal(values)
if err != nil {
    log.Fatalln(err)
}

// Set up the HTTP client and set the headers
client := &http.Client{}
req, err := http.NewRequest("POST", TRANSCRIPT_URL, bytes.NewBuffer(jsonData))
if err != nil {
    log.Fatalln(err)
}
req.Header.Set("content-type", "application/json")
req.Header.Set("authorization", API_KEY)

res, err := client.Do(req)
if err != nil {
    log.Fatalln(err)
}
defer res.Body.Close()

// Decode JSON and store it in a map
var result map[string]interface{}
if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
    log.Fatalln(err)
}

// Print the id of the transcript
fmt.Println(result["id"])
og6cthhbf0-9195-4b81-8002-bdad76b7bd35
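The request and response in this step can also be expressed with typed structs rather than maps. The following is a rough sketch under that assumption: the names transcriptRequest, transcriptCreated, buildTranscriptBody, and parseTranscriptID are hypothetical, and the reply in main is a hand-written sample that only contains the id field discussed above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// transcriptRequest is the JSON body sent to the transcript endpoint.
type transcriptRequest struct {
	AudioURL string `json:"audio_url"`
}

// transcriptCreated holds the id field read from the reply.
type transcriptCreated struct {
	ID string `json:"id"`
}

// buildTranscriptBody (hypothetical helper) encodes {"audio_url": ...}.
func buildTranscriptBody(audioURL string) ([]byte, error) {
	if audioURL == "" {
		return nil, errors.New("audio URL is empty")
	}
	return json.Marshal(transcriptRequest{AudioURL: audioURL})
}

// parseTranscriptID (hypothetical helper) extracts the id from a raw JSON reply.
func parseTranscriptID(body []byte) (string, error) {
	var r transcriptCreated
	if err := json.Unmarshal(body, &r); err != nil {
		return "", fmt.Errorf("decode transcript response: %w", err)
	}
	if r.ID == "" {
		return "", errors.New("transcript response has no id")
	}
	return r.ID, nil
}

func main() {
	body, err := buildTranscriptBody("https://cdn.assemblyai.com/upload/38174ca3-58b0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))

	// Hand-written sample reply; a real reply contains more fields.
	id, err := parseTranscriptID([]byte(`{"id": "og6cthhbf0-9195-4b81-8002-bdad76b7bd35"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

The bytes returned by buildTranscriptBody can be passed to bytes.NewBuffer in place of jsonData in the snippet above.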
Step 3: Get the transcribed text
The final task is to get the transcribed text. The steps are similar to before. We compose a new API endpoint from the base URL and the newly received transcript id.
Then we send a GET request to this endpoint and again decode the JSON data. If the status is completed, we can obtain the transcribed text. If it is not completed yet, the same request can be repeated after a short wait.
// New endpoint
const POLLING_URL = TRANSCRIPT_URL + "/" + "og6cthhbf0-9195-4b81-8002-bdad76b7bd35"

// Send GET request
client := &http.Client{}
req, err := http.NewRequest("GET", POLLING_URL, nil)
if err != nil {
    log.Fatalln(err)
}
req.Header.Set("content-type", "application/json")
req.Header.Set("authorization", API_KEY)

res, err := client.Do(req)
if err != nil {
    log.Fatalln(err)
}
defer res.Body.Close()

// Decode JSON and store it in a map
var result map[string]interface{}
if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
    log.Fatalln(err)
}

// Check status and print the transcribed text
if result["status"] == "completed" {
    fmt.Println(result["text"])
}
This is your transcribed text of the audio file...
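The snippet above checks the status once, so it prints nothing if the transcript is still being processed. One possible approach is to repeat the GET request in a loop until the job finishes. The sketch below shows only the loop logic: waitForTranscript and the transcript struct are hypothetical names, the fetch function stands in for the GET request from this step, and the "processing" and "error" status values as well as the error field are assumptions about the API that are not covered in this article.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// transcript holds the fields read while polling.
// The Error field and its JSON name are assumptions.
type transcript struct {
	Status string `json:"status"`
	Text   string `json:"text"`
	Error  string `json:"error"`
}

// waitForTranscript (hypothetical helper) calls fetch until the status is
// "completed" or "error", sleeping for interval after each unfinished attempt
// and giving up after maxAttempts.
func waitForTranscript(fetch func() (transcript, error), interval time.Duration, maxAttempts int) (string, error) {
	for i := 0; i < maxAttempts; i++ {
		t, err := fetch()
		if err != nil {
			return "", err
		}
		switch t.Status {
		case "completed":
			return t.Text, nil
		case "error": // assumed status value for a failed job
			return "", fmt.Errorf("transcription failed: %s", t.Error)
		}
		time.Sleep(interval)
	}
	return "", errors.New("transcript not ready after max attempts")
}

func main() {
	// Stand-in for the GET request: reports "processing" twice, then "completed".
	calls := 0
	fake := func() (transcript, error) {
		calls++
		if calls < 3 {
			return transcript{Status: "processing"}, nil
		}
		return transcript{Status: "completed", Text: "This is your transcribed text of the audio file..."}, nil
	}

	text, err := waitForTranscript(fake, 10*time.Millisecond, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

In a real program, fetch would send the GET request to POLLING_URL and decode the reply into the struct, and the interval would likely be a few seconds rather than milliseconds.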
And that's it! You can find the whole code in our GitHub repository.
Open Source Speech Recognition Libraries in Golang
Unfortunately, the options in Golang are still limited. The go-to programming language with the most open source libraries for offline Speech Recognition is Python.
If you still want to use an open source library, you can try PocketSphinx for Golang. It's a lightweight Speech Recognition engine, specifically tuned for handheld and mobile devices. However, this project hasn't been updated in over four years.