Detect scam calls using Go with LeMUR and Twilio

Learn how to detect scam attempts in phone calls, using LeMUR.

As technology advances, scam attempts are becoming more sophisticated and harder to identify. Speech AI combines recent advances in speech recognition and generative AI to understand spoken language, and can give us the upper hand against scammers.

In this tutorial, you’ll build an application that uses Speech AI to transcribe a phone call and identify scam attempts.

By the end of this tutorial, you’ll be able to:

  • Transcribe a phone call in real-time using AssemblyAI and Twilio.
  • Summarize and assess a phone call transcript using LeMUR.

Before you get started

To complete this tutorial, you'll need:

Step 1: Set up ngrok

Twilio will need to access your server through a publicly available URL. In this tutorial, you'll use ngrok to create a publicly available URL for an application running on your local computer.

💡
If you already have a preferred way to expose your application publicly, you may skip this step.
  1. Sign up for an ngrok account.

  2. Install ngrok for your platform.

  3. Authenticate your ngrok agent using your authtoken.

    ngrok config add-authtoken <YOUR_TOKEN>
    
  4. Open an ngrok tunnel for port 8080. ngrok will only tunnel connections while the following command is running.

    ngrok http 8080
    

You'll see something similar to the output below, where the URL next to Forwarding is the publicly available URL that forwards to your local 8080 port (https://84c5df474.ngrok-free.dev in the example output).

ngrok                                                                   (Ctrl+C to quit)

Session Status                online
Account                       inconshreveable (Plan: Free)
Version                       3.0.0
Region                        United States (us)
Latency                       78ms
Web Interface                 http://127.0.0.1:4040
Forwarding                    https://84c5df474.ngrok-free.dev -> http://localhost:8080

Connections                   ttl     opn     rt1     rt5     p50     p90
                              0       0       0.00    0.00    0.00    0.00

Sample output for ngrok http 8080

Copy the Forwarding URL in your terminal output and save it for the next step.

Step 2: Set up Twilio

You'll need to register a phone number with Twilio and configure it to call your server application whenever someone calls that number. The steps below use the Twilio CLI. Alternatively, you can update the voice URL for your phone number in the Twilio console.

  1. Sign up for a Twilio account.

  2. Download Twilio CLI.

  3. In a new terminal, log in using Twilio CLI. You'll be asked to enter an identifier for your new profile, for example dev.

    twilio login
    
  4. Select the profile you created.

    twilio profiles:use <YOUR_PROFILE_ID>
    
  5. Update the voice URL for your phone number.

    twilio phone-number:update <YOUR_TWILIO_NUMBER> --voice-url <YOUR_NGROK_URL>
    
💡
You'll find the Account SID, Auth Token, and phone number under Account info on your Twilio console.

Now, when someone calls your phone number, Twilio sends a request to your ngrok URL, which forwards it to port 8080 on your local computer. Because you don't have to deploy every change to a cloud instance, you can iterate faster during development.
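Before building the full server, it can help to see what the voice URL is expected to return. The sketch below is a trimmed-down version of the webhook handler used in the next step (the Say element is omitted), exercised with Go's httptest package so you can check the TwiML without placing a call. The names voiceWebhook, streamURL, and simulateCall are hypothetical and only used for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// twiMLTemplate tells Twilio to open a MediaStream to the given WebSocket URL.
const twiMLTemplate = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
   <Connect>
      <Stream url='%s' />
   </Connect>
</Response>
`

// streamURL builds the WebSocket URL from the host the request arrived on,
// so it follows your public hostname (for example, the ngrok one).
func streamURL(host string) string {
	return "wss://" + host + "/media"
}

// voiceWebhook answers Twilio's POST request with TwiML.
func voiceWebhook(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/xml")
	fmt.Fprintf(w, twiMLTemplate, streamURL(r.Host))
}

// simulateCall sends a fake webhook request to the handler and returns the
// status code and body. No network traffic is involved.
func simulateCall(method, host string) (int, string) {
	req := httptest.NewRequest(method, "https://"+host+"/", nil)
	rec := httptest.NewRecorder()
	voiceWebhook(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := simulateCall(http.MethodPost, "84c5df474.ngrok-free.dev")
	fmt.Println(code)
	fmt.Print(body)
}
```

Running it prints the status code followed by the TwiML document, with the Stream URL pointing at the /media path on the same host.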

Next up, you'll build the server application to handle the phone call.

Step 3: Transcribe phone calls in real-time

Before we can identify whether a caller is a scammer, we first need to identify what they’re saying. In this step, you’ll build a Go server application that transcribes Twilio phone calls in real-time.

This step uses the application we built in Transcribe phone calls in real-time in Go with Twilio and AssemblyAI. If you want to understand the following code in more detail, refer to the original post.

  1. Create and navigate into a new project directory.

    mkdir scam-screener-go
    cd scam-screener-go
    
  2. Initialize your Go module.

    go mod init scam-screener-go
    
  3. Install the AssemblyAI Go SDK.

    go get github.com/AssemblyAI/assemblyai-go-sdk
    
  4. Install the nhooyr.io/websocket module. You'll need this to handle the incoming Twilio MediaStream connection.

    go get nhooyr.io/websocket
    
  5. Create a new file called main.go with the following code. The starter code uses the finished example from Transcribe phone calls in real-time in Go with Twilio and AssemblyAI to transcribe a Twilio phone call.

    package main
    
    import (
        "context"
        "fmt"
        "log"
        "net/http"
        "os"
    
        "nhooyr.io/websocket"
        "nhooyr.io/websocket/wsjson"
    
        aai "github.com/AssemblyAI/assemblyai-go-sdk"
    )
    
    type realtimeTranscriber struct {
    }
    
    func (t *realtimeTranscriber) SessionBegins(ev aai.SessionBegins) {
        log.Println("session begins")
    }
    
    func (t *realtimeTranscriber) SessionTerminated(ev aai.SessionTerminated) {
        log.Println("session terminated")
    }
    
    func (t *realtimeTranscriber) FinalTranscript(transcript aai.FinalTranscript) {
        fmt.Printf("%s\r\n", transcript.Text)
    }
    
    func (t *realtimeTranscriber) PartialTranscript(transcript aai.PartialTranscript) {
        // Ignore silence.
        if transcript.Text == "" {
            return
        }
    
        fmt.Printf("%s\r", transcript.Text)
    }
    
    func (t *realtimeTranscriber) Error(err error) {
        log.Println("something bad happened:", err)
    }
    
    var apiKey = os.Getenv("ASSEMBLYAI_API_KEY")
    
    func main() {
        http.HandleFunc("/", twilio)
        http.HandleFunc("/media", media)
    
        log.Println("Server is running on port 8080")
    
        if err := http.ListenAndServe(":8080", nil); err != nil {
            log.Fatal(err)
        }
    }
    
    func twilio(w http.ResponseWriter, r *http.Request) {
        if r.Method != "POST" {
            w.WriteHeader(http.StatusMethodNotAllowed)
            return
        }
    
        twiML := `<?xml version="1.0" encoding="UTF-8"?>
    <Response>
       <Say>
          Speak to see your audio transcribed in the console.
       </Say>
       <Connect>
          <Stream url='%s' />
       </Connect>
    </Response>`
    
        w.Header().Add("Content-Type", "application/xml")
        fmt.Fprintln(w, fmt.Sprintf(twiML, "wss://"+r.Host+"/media"))
    }
    
    type TwilioMessage struct {
        Event string `json:"event"`
        Media struct {
            // Contains audio samples.
            Payload []byte `json:"payload"`
        } `json:"media"`
    }
    
    func media(w http.ResponseWriter, r *http.Request) {
        // Upgrade HTTP request to WebSocket.
        c, err := websocket.Accept(w, r, nil)
        if err != nil {
            log.Println("unable to upgrade connection to websocket:", err)
            w.WriteHeader(http.StatusInternalServerError)
            return
        }
        defer c.CloseNow()
    
        ctx, cancel := context.WithCancel(r.Context())
        defer cancel()
    
        handler := new(realtimeTranscriber)
    
        client := aai.NewRealTimeClientWithOptions(
            aai.WithRealTimeAPIKey(apiKey),
    
            // Twilio MediaStream sends audio in mu-law format.
            aai.WithRealTimeEncoding(aai.RealTimeEncodingPCMMulaw),
    
            // Twilio MediaStream sends audio at 8000 samples per second.
            aai.WithRealTimeSampleRate(8000),
    
            aai.WithHandler(handler),
        )
    
        if err := client.Connect(ctx); err != nil {
            log.Println("unable to connect to real-time transcription:", err)
            c.Close(websocket.StatusInternalError, err.Error())
            return
        }
    
        log.Println("connected to real-time transcription")
    
        for {
            var message TwilioMessage
            err = wsjson.Read(ctx, c, &message)
            if err != nil {
                log.Println("unable to read twilio message:", err)
                c.Close(websocket.StatusInternalError, err.Error())
                return
            }
    
            switch message.Event {
            case "connected":
                log.Println("twilio mediastream connected")
            case "start":
                log.Println("twilio mediastream started")
            case "media":
                if err := client.Send(ctx, message.Media.Payload); err != nil {
                    log.Println("unable to send audio for real-time transcription:", err)
                    c.Close(websocket.StatusInternalError, err.Error())
                    return
                }
            case "stop":
                log.Println("twilio mediastream stopped")
    
                if err := client.Disconnect(ctx, true); err != nil {
                    log.Println("unable to disconnect from real-time transcription:", err)
                    c.Close(websocket.StatusInternalError, err.Error())
                    return
                }
    
                log.Println("disconnected from real-time transcription")
                c.Close(websocket.StatusNormalClosure, "")
    
                return
            }
        }
    }
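One detail in the code above is easy to miss: Twilio sends the audio as a base64-encoded string, and because Payload is declared as []byte, encoding/json decodes it into raw bytes during unmarshalling. The following standalone sketch illustrates this with a shortened, made-up message. Real MediaStream messages carry additional fields, which are ignored here, and decodeMessage is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TwilioMessage matches the struct used in main.go.
type TwilioMessage struct {
	Event string `json:"event"`
	Media struct {
		// encoding/json decodes a base64 JSON string into a []byte field.
		Payload []byte `json:"payload"`
	} `json:"media"`
}

// decodeMessage parses a single MediaStream message.
func decodeMessage(raw string) (TwilioMessage, error) {
	var m TwilioMessage
	err := json.Unmarshal([]byte(raw), &m)
	return m, err
}

func main() {
	// "AAEC" is base64 for the three bytes 0x00 0x01 0x02.
	m, err := decodeMessage(`{"event":"media","media":{"payload":"AAEC"}}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.Event, m.Media.Payload) // prints "media [0 1 2]"
}
```

This is why the media case can pass message.Media.Payload straight to client.Send without an explicit base64 decoding step.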
    

Next, you'll inspect the full transcript from the phone call to determine whether it's a scam call or not.

Step 4: Store the full transcript

Real-Time Transcription returns a final transcript once it detects the end of an utterance. Utterances are continuous pieces of speech that are separated by silence. Since a phone call may contain multiple utterances, you need to store all final transcripts for later.

  1. Add a field to the realtimeTranscriber that will store all final transcripts during the phone call.

    type realtimeTranscriber struct {
        finalTranscripts []string
    }
    
  2. In the FinalTranscript method of the realtimeTranscriber, append each transcript to the field:

    func (t *realtimeTranscriber) FinalTranscript(transcript aai.FinalTranscript) {
        fmt.Printf("%s\r\n", transcript.Text)
    
        t.finalTranscripts = append(t.finalTranscripts, transcript.Text)
    }
    

Once the phone call ends, finalTranscripts will contain the full transcript for the phone call.
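The slice above is only read after the call has ended. If you later want to read the transcript while the call is still in progress, the handler callbacks and your reader may run on different goroutines (an assumption about how you would structure that code), so a mutex is a reasonable precaution. The sketch below shows one way to do this. transcriptLog is a hypothetical helper and not part of the AssemblyAI SDK.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// transcriptLog collects final transcripts and can be read safely while
// other goroutines are still appending to it.
type transcriptLog struct {
	mu     sync.Mutex
	finals []string
}

// Add stores one final transcript. Blank utterances are skipped.
func (l *transcriptLog) Add(text string) {
	if strings.TrimSpace(text) == "" {
		return
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	l.finals = append(l.finals, text)
}

// Full returns every utterance so far, one per line.
func (l *transcriptLog) Full() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return strings.Join(l.finals, "\n")
}

func main() {
	var l transcriptLog
	l.Add("Hi. My name is John.")
	l.Add("")
	l.Add("Call me back.")
	fmt.Println(l.Full())
}
```

In the tutorial's code, you would call Add from the FinalTranscript method and Full wherever the joined transcript is needed.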

Step 5: Summarize the full transcript using LeMUR

LeMUR is a framework by AssemblyAI that makes it easy to use the power of LLMs with voice data. In this step, you’ll use LeMUR to create a summary of the phone call along with an assessment of whether it’s a scam call or not. 

  1. Create a new const called prompt and write the prompt.

    const prompt = `Provide a summary of the call, followed by a brief assessment of whether it's a scam call.
    
    <context>
    You're a personal assistant who's helping an elderly person protect themselves from scammers.
    </context>
    
    <answer_format>
    One-sentence summary in second person. Do not provide a preamble.
    </answer_format>`
    
  2. Create a new function called summarize with the following content. The function uses LeMUR to create a custom summary using an LLM.

    func summarize(ctx context.Context, transcript string) (string, error) {
        c := aai.NewClient(apiKey)
    
        // LeMUR generates a summary and assessment of the transcript using the
        // provided prompt.
        resp, err := c.LeMUR.Task(ctx, aai.LeMURTaskParams{
            LeMURBaseParams: aai.LeMURBaseParams{
                InputText: aai.String(transcript),
            },
            Prompt: aai.String(prompt),
        })
        if err != nil {
            return "", err
        }
    
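        // Guard against a missing response before dereferencing the pointer.
        if resp.Response == nil {
            return "", fmt.Errorf("LeMUR returned an empty response")
        }
    
        return *resp.Response, nil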
    }
    
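  3. Add "strings" to the import block in main.go. Then, in the media function, update the "stop" case to summarize the transcript after the call has ended.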

    case "stop":
        log.Println("twilio mediastream stopped")
    
        if err := client.Disconnect(ctx, true); err != nil {
            log.Println("unable to disconnect from real-time transcription:", err)
            c.Close(websocket.StatusInternalError, err.Error())
            return
        }
    
        log.Println("disconnected from real-time transcription")
        c.Close(websocket.StatusNormalClosure, "")
    
        summary, err := summarize(ctx, strings.Join(handler.finalTranscripts, "\n"))
        if err != nil {
            log.Println("unable to summarize call:", err)
            return
        }
    
        log.Println("Summary:", summary)
    
        return
    
  4. Start the server to accept calls.

    go run main.go
    

Call your Twilio phone number, leave a voicemail, and hang up. In your terminal, you'll see output similar to the following:

Server is running on port 8080
session begins
connected to real-time transcription
twilio mediastream connected
twilio mediastream started
Hi. My name is John. I'm excited to share with you an exclusive offer that will slice your electric bill in half. Call me back.
twilio mediastream stopped
session terminated
disconnected from real-time transcription
Summary: 
 <text>
You received a call from someone named John promising to cut your electric bill in half if you call him back. This is likely a scam.
</text>

Terminal output
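In the sample output, the summary is wrapped in <text> tags and preceded by whitespace. Whether the model adds such a wrapper may vary between responses, so if you plan to show the summary to a user, a small clean-up step can help. The sketch below assumes the wrapper looks exactly like the one above. cleanSummary is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanSummary removes an optional <text>...</text> wrapper and any
// surrounding whitespace from a model response.
func cleanSummary(resp string) string {
	s := strings.TrimSpace(resp)
	s = strings.TrimPrefix(s, "<text>")
	s = strings.TrimSuffix(s, "</text>")
	return strings.TrimSpace(s)
}

func main() {
	raw := "\n <text>\nYou received a call from someone named John. This is likely a scam.\n</text>"
	fmt.Println(cleanSummary(raw))
}
```

Responses without the wrapper pass through unchanged apart from trimmed whitespace.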

Learn more

In this tutorial, you’ve learned how to transcribe a phone call and use an LLM to identify whether it’s a scam attempt. Try calling again and see if you can avoid detection. If you can, think about how you might improve the detection accuracy. Check out Improving your prompt for more inspiration.

You can also experiment with using different LLMs by setting FinalModel on the LeMURBaseParams. For example, by using the faster and cheaper `basic` model, you could prompt LeMUR incrementally after each utterance, to provide a live assessment during the call. For more information on supported LLMs, refer to Change the model type in our docs.
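As a rough sketch of the incremental idea, the code below re-assesses the transcript after every final utterance. To keep it runnable without an API key, the LLM call is replaced by a keyword-based stand-in. In the real application you would pass a function with the same signature that calls LeMUR, such as summarize. liveAssessor, keywordAssess, and assessAll are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// assessFunc has the same shape as the summarize function in this tutorial.
type assessFunc func(ctx context.Context, transcript string) (string, error)

// liveAssessor re-assesses the call each time a final transcript arrives.
type liveAssessor struct {
	assess assessFunc
	finals []string
}

// OnFinal records the utterance and assesses everything heard so far.
func (a *liveAssessor) OnFinal(ctx context.Context, text string) (string, error) {
	a.finals = append(a.finals, text)
	return a.assess(ctx, strings.Join(a.finals, "\n"))
}

// keywordAssess is a stand-in for an LLM call. It only looks for one phrase.
func keywordAssess(ctx context.Context, transcript string) (string, error) {
	if strings.Contains(strings.ToLower(transcript), "exclusive offer") {
		return "possible scam", nil
	}
	return "no concerns yet", nil
}

// assessAll feeds utterances through a liveAssessor and collects the
// assessment produced after each one.
func assessAll(utterances ...string) ([]string, error) {
	a := &liveAssessor{assess: keywordAssess}
	var verdicts []string
	for _, u := range utterances {
		v, err := a.OnFinal(context.Background(), u)
		if err != nil {
			return nil, err
		}
		verdicts = append(verdicts, v)
	}
	return verdicts, nil
}

func main() {
	verdicts, err := assessAll("Hi. My name is John.", "I have an exclusive offer for you.")
	if err != nil {
		fmt.Println("assessment failed:", err)
		return
	}
	for _, v := range verdicts {
		fmt.Println(v)
	}
}
```

Because each assessment would involve a network round trip in practice, you may want to run it in a separate goroutine so that it doesn't delay the handling of incoming transcripts.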

To learn more, check out the following resources:

To keep up with more content like this, make sure you subscribe to our newsletter and join our Discord server.

Complete code example

package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"
	"strings"

	"nhooyr.io/websocket"
	"nhooyr.io/websocket/wsjson"

	aai "github.com/AssemblyAI/assemblyai-go-sdk"
)

type realtimeTranscriber struct {
	finalTranscripts []string
}

func (t *realtimeTranscriber) SessionBegins(ev aai.SessionBegins) {
	log.Println("session begins")
}

func (t *realtimeTranscriber) SessionTerminated(ev aai.SessionTerminated) {
	log.Println("session terminated")
}

func (t *realtimeTranscriber) FinalTranscript(transcript aai.FinalTranscript) {
	fmt.Printf("%s\r\n", transcript.Text)

	t.finalTranscripts = append(t.finalTranscripts, transcript.Text)
}

func (t *realtimeTranscriber) PartialTranscript(transcript aai.PartialTranscript) {
	// Ignore silence.
	if transcript.Text == "" {
		return
	}

	fmt.Printf("%s\r", transcript.Text)
}

func (t *realtimeTranscriber) Error(err error) {
	log.Println("something bad happened:", err)
}

var apiKey = os.Getenv("ASSEMBLYAI_API_KEY")

func main() {
	http.HandleFunc("/", twilio)
	http.HandleFunc("/media", media)

	log.Println("Server is running on port 8080")

	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Fatal(err)
	}
}

func twilio(w http.ResponseWriter, r *http.Request) {
	if r.Method != "POST" {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}

	twiML := `<?xml version="1.0" encoding="UTF-8"?>
   <Response>
      <Say>
         Speak to see your audio transcribed in the console.
      </Say>
      <Connect>
         <Stream url='%s' />
      </Connect>
   </Response>`

	w.Header().Add("Content-Type", "application/xml")
	fmt.Fprintln(w, fmt.Sprintf(twiML, "wss://"+r.Host+"/media"))
}

type TwilioMessage struct {
	Event string `json:"event"`
	Media struct {
		// Contains audio samples.
		Payload []byte `json:"payload"`
	} `json:"media"`
}

func media(w http.ResponseWriter, r *http.Request) {
	// Upgrade HTTP request to WebSocket.
	c, err := websocket.Accept(w, r, nil)
	if err != nil {
		log.Println("unable to upgrade connection to websocket:", err)
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	defer c.CloseNow()

	ctx, cancel := context.WithCancel(r.Context())
	defer cancel()

	handler := new(realtimeTranscriber)

	client := aai.NewRealTimeClientWithOptions(
		aai.WithRealTimeAPIKey(apiKey),

		// Twilio MediaStream sends audio in mu-law format.
		aai.WithRealTimeEncoding(aai.RealTimeEncodingPCMMulaw),

		// Twilio MediaStream sends audio at 8000 samples per second.
		aai.WithRealTimeSampleRate(8000),

		aai.WithHandler(handler),
	)

	if err := client.Connect(ctx); err != nil {
		log.Println("unable to connect to real-time transcription:", err)
		c.Close(websocket.StatusInternalError, err.Error())
		return
	}

	log.Println("connected to real-time transcription")

	for {
		var message TwilioMessage
		err = wsjson.Read(ctx, c, &message)
		if err != nil {
			log.Println("unable to read twilio message:", err)
			c.Close(websocket.StatusInternalError, err.Error())
			return
		}

		switch message.Event {
		case "connected":
			log.Println("twilio mediastream connected")
		case "start":
			log.Println("twilio mediastream started")
		case "media":
			if err := client.Send(ctx, message.Media.Payload); err != nil {
				log.Println("unable to send audio for real-time transcription:", err)
				c.Close(websocket.StatusInternalError, err.Error())
				return
			}
		case "stop":
			log.Println("twilio mediastream stopped")

			if err := client.Disconnect(ctx, true); err != nil {
				log.Println("unable to disconnect from real-time transcription:", err)
				c.Close(websocket.StatusInternalError, err.Error())
				return
			}

			log.Println("disconnected from real-time transcription")
			c.Close(websocket.StatusNormalClosure, "")

			summary, err := summarize(ctx, strings.Join(handler.finalTranscripts, "\n"))
			if err != nil {
				log.Println("unable to summarize call:", err)
				return
			}

			log.Println("Summary:", summary)

			return
		}
	}
}

const prompt = `Provide a summary of the call, followed by a brief assessment of whether it's a scam call.

<context>
You're a personal assistant who's helping an elderly person protect themselves from scammers.
</context>

<answer_format>
One-sentence summary in second person. Do not provide a preamble.
</answer_format>`

func summarize(ctx context.Context, transcript string) (string, error) {
	c := aai.NewClient(apiKey)

	// LeMUR generates a summary and assessment of the transcript using the
	// provided prompt.
	resp, err := c.LeMUR.Task(ctx, aai.LeMURTaskParams{
		LeMURBaseParams: aai.LeMURBaseParams{
			InputText: aai.String(transcript),
		},
		Prompt: aai.String(prompt),
	})
	if err != nil {
		return "", err
	}

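	// Guard against a missing response before dereferencing the pointer.
	if resp.Response == nil {
		return "", fmt.Errorf("LeMUR returned an empty response")
	}

	return *resp.Response, nil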
}