Custom Formatting | AssemblyAI

Supported languages

Language	Code
Global English	en
Australian English	en_au
British English	en_uk
US English	en_us
Spanish	es
French	fr
German	de
Italian	it
Portuguese	pt
Dutch	nl
Hindi	hi
Japanese	ja
Chinese	zh
Finnish	fi
Korean	ko
Polish	pl
Russian	ru
Turkish	tr
Ukrainian	uk
Vietnamese	vi
Afrikaans	af
Albanian	sq
Amharic	am
Arabic	ar
Armenian	hy
Assamese	as
Azerbaijani	az
Bashkir	ba
Basque	eu
Belarusian	be
Bengali	bn
Bosnian	bs
Breton	br
Bulgarian	bg
Catalan	ca
Croatian	hr
Czech	cs
Danish	da
Estonian	et
Faroese	fo
Galician	gl
Georgian	ka
Greek	el
Gujarati	gu
Haitian	ht
Hausa	ha
Hawaiian	haw
Hebrew	he
Hungarian	hu
Icelandic	is
Indonesian	id
Javanese	jw
Kannada	kn
Kazakh	kk
Lao	lo
Latin	la
Latvian	lv
Lingala	ln
Lithuanian	lt
Luxembourgish	lb
Macedonian	mk
Malagasy	mg
Malay	ms
Malayalam	ml
Maltese	mt
Maori	mi
Marathi	mr
Mongolian	mn
Nepali	ne
Norwegian	no
Norwegian Nynorsk	nn
Occitan	oc
Panjabi	pa
Pashto	ps
Persian	fa
Romanian	ro
Sanskrit	sa
Serbian	sr
Shona	sn
Sindhi	sd
Sinhala	si
Slovak	sk
Slovenian	sl
Somali	so
Sundanese	su
Swahili	sw
Swedish	sv
Tagalog	tl
Tajik	tg
Tamil	ta
Tatar	tt
Telugu	te
Turkmen	tk
Urdu	ur
Uzbek	uz
Welsh	cy
Yiddish	yi
Yoruba	yo

Supported models

Model	Identifier
Universal-3-Pro	universal-3-pro
Universal-2	universal-2

Supported regions

US only

Overview

The Custom Formatting feature automatically standardizes and formats specific types of information in your transcripts, ensuring consistency across dates, phone numbers, emails, and other data types. This eliminates the need for post-processing and provides clean, formatted output ready for your application.

Key capabilities:

Format dates in your preferred style (US, European, ISO, etc.)
Standardize phone number formats with custom patterns
Control currency and decimal precision
Convert spelled-out text into formatted patterns
Format URLs as hyperlinks
Apply multiple formatting rules simultaneously

Common use cases:

Standardizing contact information in customer service transcripts
Formatting financial data in earnings calls
Preparing transcripts for CRM systems with specific format requirements
Creating consistent documentation from meetings
Processing legal or medical transcripts with strict formatting standards

Quickstart

There are two ways to use Custom Formatting:

Transcribe and format in one request - Best when you’re starting a new transcription and want to automatically format the transcript text as part of that process
Transcribe and format in separate requests - Best when you already have text that you would like to format or for more complicated workflows where you want to separate the transcription and formatting tasks

Method 1: Transcribe and format in one request

This method is ideal when you’re starting fresh and want both transcription and formatting in a single workflow.

Python

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
  "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/phone-msg.m4a"

# Configure transcription with custom formatting
data = {
  "audio_url": audio_url,
  "speech_models": ["universal-3-pro", "universal-2"],
  "language_detection": True,
  "speaker_labels": True,
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": True
      }
    }
  }
}

# Submit transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
  transcript = requests.get(polling_endpoint, headers=headers).json()

  if transcript["status"] == "completed":
    break

  elif transcript["status"] == "error":
    raise RuntimeError(f"Transcription failed: {transcript['error']}")

  else:
    time.sleep(3)

# Access and display results
print("\n--- Formatting Details ---")
mapping = transcript['speech_understanding']['response']['custom_formatting']['mapping']
for original, formatted in mapping.items():
  print(f"Original: {original}")
  print(f"Formatted: {formatted}\n")
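The polling loop above runs until the transcript completes or fails, with no upper bound on how long it waits. A bounded variant could look like the sketch below. The helper name `wait_for_transcript` and its `fetch`, `interval_s`, `timeout_s`, and `sleep` parameters are illustrative assumptions, not part of the AssemblyAI API; `fetch` stands in for whatever call returns the transcript JSON.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch: Callable[[], dict],
    interval_s: float = 3.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the transcript completes, fails, or the deadline passes.

    `fetch` is a hypothetical callable that returns the transcript JSON as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcript not ready after {timeout_s} seconds")
        sleep(interval_s)
```

With the variables from the example above, this would be called as `wait_for_transcript(lambda: requests.get(polling_endpoint, headers=headers).json())`. Injecting `fetch` and `sleep` keeps the helper easy to exercise without network access.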

Method 2: Transcribe and format in separate requests

This method is useful when you already have text that you would like to format or for more complicated workflows where you want to separate the transcription and formatting tasks.

Python

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
  "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/phone-msg.m4a"

# Submit transcription request (without formatting)
data = {
  "audio_url": audio_url,
  "speech_models": ["universal-3-pro", "universal-2"],
  "language_detection": True,
  "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription completion
while True:
  transcript = requests.get(polling_endpoint, headers=headers).json()

  if transcript["status"] == "completed":
    print("Transcription completed!")
    break

  elif transcript["status"] == "error":
    raise RuntimeError(f"Transcription failed: {transcript['error']}")

  else:
    time.sleep(3)

# Add custom formatting configuration to the completed transcript
understanding_body = {
  "transcript_id": transcript_id,
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": True
      }
    }
  }
}

# Send to Speech Understanding API for formatting
result = requests.post(
  "https://llm-gateway.assemblyai.com/v1/understanding",
  headers=headers,
  json=understanding_body
).json()

print("Formatting completed!")

# Access and display results
print("\n--- Formatting Details ---")
mapping = result['speech_understanding']['response']['custom_formatting']['mapping']
for original, formatted in mapping.items():
  print(f"Original: {original}")
  print(f"Formatted: {formatted}\n")

Expected output:

--- Formatting Details ---
Original: Yes, I would appreciate it if you could call me back. My phone number is 555-679-3466. Also, my cell phone number is 555-679-8244. Once again, if you could call me back, I'd appreciate it. My phone number is 555-679-3466. Thanks.
Formatted: Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466. Also, my cell phone number is (555)679-8244. Once again, if you could call me back, I'd appreciate it. My phone number is (555)679-3466. Thanks.

Output format

Data from the Custom Formatting API is returned in the custom_formatting object, which is contained in the response object inside speech_understanding. The formatted_text key contains a formatted version of the transcript text.

If the request uses Speaker Diarization and enables format_utterances, a formatted_utterances key is also returned, containing the formatted utterances with their timestamps preserved.

Example response structure:

{
  "id": "2accd7f2-445b-4d08-b10b-1bafdd5906ed",
  "status": "completed",
  "text": "Yes, I would appreciate it if you could call me back. My phone number is 555-679-3466...",
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": true
      }
    },
    "response": {
      "custom_formatting": {
        "formatted_text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
        "formatted_utterances": [
          {
            "confidence": 0.9920061471354167,
            "end": 26000,
            "speaker": "A",
            "start": 1920,
            "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
            "words": [
              {
                "speaker": "A",
                "start": 1920,
                "end": 2160,
                "text": "Yes,",
                "confidence": 0.808349609375
              }
              // ... more words
            ]
          }
        ],
        "mapping": {
          "555-679-3466": "(555)679-3466",
          "555-679-8244": "(555)679-8244"
        },
        "status": "success"
      }
    }
  }
}

Key features of the output:

Formatted text: The formatted transcript is returned in the formatted_text key
Formatted utterances: When format_utterances is enabled, speaker-separated segments in the formatted_utterances key include formatted text
Preserved timestamps: All word-level timestamps in formatted_utterances remain intact after formatting, allowing you to maintain temporal alignment with the audio
Mapping object: Shows exactly what transformations were applied (original → formatted)
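As a rough illustration of reading these fields, the sketch below pulls them out of a response dictionary shaped like the example response structure. The helper name `summarize_formatting` and the trimmed `sample` data are assumptions made for this example; only the key names come from the response shown above.

```python
def summarize_formatting(transcript: dict) -> dict:
    """Collect the custom formatting results from a transcript response dict."""
    result = (
        transcript.get("speech_understanding", {})
        .get("response", {})
        .get("custom_formatting", {})
    )
    return {
        "status": result.get("status"),
        "formatted_text": result.get("formatted_text"),
        "mapping": result.get("mapping", {}),
        # formatted_utterances may be absent, so fall back to an empty list.
        "utterances": [
            (u["speaker"], u["start"], u["end"], u["text"])
            for u in result.get("formatted_utterances", [])
        ],
    }


# Trimmed, hand-written sample in the shape of the example response.
sample = {
    "status": "completed",
    "speech_understanding": {
        "response": {
            "custom_formatting": {
                "formatted_text": "My phone number is (555)679-3466.",
                "formatted_utterances": [
                    {
                        "speaker": "A",
                        "start": 1920,
                        "end": 26000,
                        "text": "My phone number is (555)679-3466.",
                    }
                ],
                "mapping": {"555-679-3466": "(555)679-3466"},
                "status": "success",
            }
        }
    },
}

summary = summarize_formatting(sample)
for speaker, start, end, text in summary["utterances"]:
    print(f"[{start}-{end} ms] Speaker {speaker}: {text}")
```

Using `.get` with defaults means the helper returns empty values instead of raising `KeyError` when a response carries no formatting results.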

Understanding the `custom_formatting` parameter

The custom_formatting parameter accepts an object with specific formatting rules for different data types in your transcript. Each property in the object defines how a particular type of information should be formatted.

Available formatting options

Parameter	Type	Description	Example Values
`date`	string	Specifies the format pattern for dates in the transcript	`"mm/dd/yyyy"`, `"dd/mm/yyyy"`, `"yyyy-mm-dd"`
`phone_number`	string	Specifies the format pattern for phone numbers	`"(xxx)xxx-xxxx"`, `"xxx-xxx-xxxx"`, `"xxx.xxx.xxxx"`
`email`	string	Specifies the format pattern for email addresses	`"username@domain.com"`, `"firstname.lastname@domain.com"`
`format_utterances`	boolean	When true, applies formatting to utterances in addition to the main text field. Preserves all word-level timestamps.	`true`, `false` (default: `false`)

Example configuration:

{
  "custom_formatting": {
    "date": "mm/dd/yyyy",
    "phone_number": "(xxx)xxx-xxxx",
    "email": "username@domain.com",
    "format_utterances": true
  }
}

When you include this configuration in your transcription request, the API will automatically detect and format dates, phone numbers, and emails in your transcript according to the specified patterns. With format_utterances enabled, the formatting is applied to both the main transcript text and individual speaker utterances while preserving all timing information.
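Because every formatting rule is optional, a request only needs the rules you care about. The sketch below shows one way to assemble the nested `speech_understanding` payload from Python; the helper name `build_custom_formatting` is a hypothetical convenience, not something the API or an SDK provides.

```python
from typing import Optional


def build_custom_formatting(
    date: Optional[str] = None,
    phone_number: Optional[str] = None,
    email: Optional[str] = None,
    format_utterances: bool = False,
) -> dict:
    """Build a speech_understanding payload containing only the rules that were set."""
    rules = {"date": date, "phone_number": phone_number, "email": email}
    # Drop rules that were not provided so they are omitted from the request.
    config = {key: value for key, value in rules.items() if value is not None}
    if format_utterances:
        config["format_utterances"] = True
    return {"speech_understanding": {"request": {"custom_formatting": config}}}


payload = build_custom_formatting(date="yyyy-mm-dd", format_utterances=True)
print(payload)
```

The returned dictionary can be merged into a transcription request body, for example `{"audio_url": audio_url, **payload}`.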

Common formatting patterns

Date formats

Pattern	Example Output	Description
`mm/dd/yyyy`	09/19/1991	US format (month/day/year)
`dd/mm/yyyy`	19/09/1991	European format (day/month/year)
`yyyy-mm-dd`	1991-09-19	ISO 8601 format
`mm-dd-yyyy`	09-19-1991	US format with dashes
`dd.mm.yyyy`	19.09.1991	European format with dots
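To preview what a date pattern produces before sending audio through the API, the patterns in the table can be mimicked locally. This is only a local approximation that maps the `yyyy`, `mm`, and `dd` tokens onto Python's `strftime` codes; it is an assumption about how the tokens read, not a description of how the API formats dates internally.

```python
from datetime import date

# Hypothetical mapping from the pattern tokens in the table to strftime codes.
_TOKENS = (("yyyy", "%Y"), ("mm", "%m"), ("dd", "%d"))


def render_date(pattern: str, value: date) -> str:
    """Render a date with a pattern such as 'mm/dd/yyyy' (local preview only)."""
    fmt = pattern
    for token, code in _TOKENS:
        fmt = fmt.replace(token, code)
    return value.strftime(fmt)


sample_date = date(1991, 9, 19)
for pattern in ("mm/dd/yyyy", "dd/mm/yyyy", "yyyy-mm-dd", "mm-dd-yyyy", "dd.mm.yyyy"):
    print(pattern, "->", render_date(pattern, sample_date))
```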

Phone number formats

Pattern	Example Output	Description
`(xxx)xxx-xxxx`	(555)679-3466	Parentheses and dash
`xxx-xxx-xxxx`	555-679-3466	Dashes only
`xxx.xxx.xxxx`	555.679.3466	Dots separator
`+x(xxx)xxx-xxxx`	+1(555)679-3466	International format
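The phone patterns read as templates in which each `x` stands for one digit. A small local sketch of that reading follows; the function `render_phone` is hypothetical and useful only for previewing a pattern, and the API's own handling of numbers whose length does not match the pattern is not specified here.

```python
def render_phone(pattern: str, number: str) -> str:
    """Fill each 'x' in the pattern with the next digit of the number."""
    digits = [ch for ch in number if ch.isdigit()]
    if pattern.count("x") != len(digits):
        raise ValueError(
            f"pattern expects {pattern.count('x')} digits, got {len(digits)}"
        )
    remaining = iter(digits)
    # Non-'x' characters (parentheses, dashes, dots, '+') are copied as-is.
    return "".join(next(remaining) if ch == "x" else ch for ch in pattern)


print(render_phone("(xxx)xxx-xxxx", "555-679-3466"))
print(render_phone("+x(xxx)xxx-xxxx", "1 555 679 3466"))
```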

Email formats

Pattern	Example Output
`username@domain.com`	john.doe@example.com
`firstname.lastname@domain.com`	john.doe@company.com

Best practices

Choose appropriate formats: Select formatting patterns that match your application’s requirements and regional standards.
Combine formatting rules: You can apply multiple formatting rules simultaneously for comprehensive text standardization.
Test with sample data: Verify your formatting patterns work correctly with representative audio samples before processing large batches.
Review the mapping: Check the mapping object in the response to see exactly what was changed and verify the results.
Consider regional differences: Be mindful of date and phone number format differences when processing international content.
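Reviewing the mapping can be partly automated. The sketch below assumes you have kept the unformatted transcript text (for example, from the transcription step when formatting in a separate request) alongside the formatted text, and checks that every mapping entry appears where expected. The function name `check_mapping` is an illustrative assumption.

```python
def check_mapping(original_text: str, formatted_text: str, mapping: dict) -> list:
    """Return a list of human-readable problems found in a formatting mapping."""
    problems = []
    for original, formatted in mapping.items():
        if original not in original_text:
            problems.append(f"original segment not found: {original!r}")
        if formatted not in formatted_text:
            problems.append(f"formatted segment not found: {formatted!r}")
    return problems


original_text = "My phone number is 555-679-3466. My cell is 555-679-8244."
formatted_text = "My phone number is (555)679-3466. My cell is (555)679-8244."
mapping = {
    "555-679-3466": "(555)679-3466",
    "555-679-8244": "(555)679-8244",
}

problems = check_mapping(original_text, formatted_text, mapping)
print(problems or "mapping is consistent with both texts")
```

An empty list means each original segment was found in the source text and each formatted segment was found in the output; anything else is worth inspecting by hand.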

API reference

Request

Method 1: Transcribe and format in one request

When creating a new transcription, include the speech_understanding parameter directly in your transcription request:

curl -X POST \
  "https://api.assemblyai.com/v2/transcript" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/phone-msg.m4a",
    "speaker_labels": true,
    "speech_understanding": {
      "request": {
        "custom_formatting": {
          "date": "mm/dd/yyyy",
          "phone_number": "(xxx)xxx-xxxx",
          "email": "username@domain.com",
          "format_utterances": true
        }
      }
    }
  }'

Method 2: Add formatting to existing transcripts

For existing transcripts, confirm that the transcript has completed, then send its ID to the Speech Understanding API:

# Step 1: Get the completed transcript
transcript=$(curl -s -X GET \
  "https://api.assemblyai.com/v2/transcript/YOUR_TRANSCRIPT_ID" \
  -H "Authorization: YOUR_API_KEY")

# Step 2: Add custom formatting and send to Speech Understanding API
curl -X POST \
  "https://llm-gateway.assemblyai.com/v1/understanding" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "YOUR_TRANSCRIPT_ID",
    "speech_understanding": {
      "request": {
        "custom_formatting": {
          "date": "mm/dd/yyyy",
          "phone_number": "(xxx)xxx-xxxx",
          "email": "username@domain.com",
          "format_utterances": true
        }
      }
    }
  }'

Key	Type	Required?	Description
`speech_understanding`	object	Yes	Container for speech understanding requests.
`speech_understanding.request`	object	Yes	The understanding request configuration.
`speech_understanding.request.custom_formatting`	object	Yes	Custom formatting configuration.
`custom_formatting.date`	string	No	Date format pattern. Common patterns: `mm/dd/yyyy` (US), `dd/mm/yyyy` (European), `yyyy-mm-dd` (ISO).
`custom_formatting.phone_number`	string	No	Phone number format pattern. Examples: `(xxx)xxx-xxxx`, `xxx-xxx-xxxx`, `xxx.xxx.xxxx`.
`custom_formatting.email`	string	No	Email format pattern. Example: `username@domain.com`.
`custom_formatting.format_utterances`	boolean	No	When `true`, applies formatting to speaker utterances in addition to the main text. Preserves word-level timestamps. Default: `false`.

Response

The Custom Formatting API returns your original transcript response with formatting applied to the text field and additional formatting details in the speech_understanding object. When format_utterances is enabled, formatted utterances with preserved timestamps are also included.

{
  "id": "2accd7f2-445b-4d08-b10b-1bafdd5906ed",
  "status": "completed",
  "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": true
      }
    },
    "response": {
      "custom_formatting": {
        "formatted_text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
        "formatted_utterances": [
          {
            "confidence": 0.9920061471354167,
            "end": 26000,
            "speaker": "A",
            "start": 1920,
            "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
            "words": [
              {
                "speaker": "A",
                "start": 1920,
                "end": 2160,
                "text": "Yes,",
                "confidence": 0.808349609375
              }
              // ... more words
            ]
          }
        ],
        "mapping": {
          "555-679-3466": "(555)679-3466",
          "555-679-8244": "(555)679-8244"
        },
        "status": "success"
      }
    }
  }
}

Key	Type	Description
`text`	string	The transcript text with custom formatting applied.
`speech_understanding`	object	Container for speech understanding request and response information.
`speech_understanding.request`	object	The original custom formatting request configuration that was submitted.
`speech_understanding.request.custom_formatting`	object	The formatting parameters that were used.
`speech_understanding.response`	object	The response information from the formatting process.
`speech_understanding.response.custom_formatting`	object	Details about the formatting operation.
`speech_understanding.response.custom_formatting.formatted_text`	string	The complete transcript with custom formatting applied. Identical to the `text` field.
`speech_understanding.response.custom_formatting.formatted_utterances`	array	Array of speaker utterances with formatting applied. Only present when `format_utterances` is `true`. Each utterance includes speaker label, timestamps, confidence scores, formatted text, and word-level details with preserved timestamps.
`speech_understanding.response.custom_formatting.mapping`	object	An object showing the original text segments and their formatted versions. Keys are original text, values are formatted text.
`speech_understanding.response.custom_formatting.status`	string	The status of the formatting operation. Will be `"success"` when formatting completes successfully.

Understanding formatted_utterances

When format_utterances is enabled, each object in the formatted_utterances array contains:

Field	Type	Description
`speaker`	string	Speaker identifier (e.g., “A”, “B”)
`start`	integer	Start time of the utterance in milliseconds
`end`	integer	End time of the utterance in milliseconds
`text`	string	The utterance text with custom formatting applied
`confidence`	number	Confidence score for the utterance (0-1)
`words`	array	Array of word objects with individual timestamps, text (formatted), confidence scores, and speaker labels

Important: All timestamps in both utterances and words are preserved exactly as they appear in the original transcription, ensuring perfect temporal alignment with the audio even after formatting is applied.
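If you want to confirm this alignment in your own pipeline, one option is to compare the timing of each original utterance with its formatted counterpart. The sketch below assumes the two arrays are the same length and in the same order, which is an assumption of this example; the function name `timing_mismatches` is likewise hypothetical.

```python
def timing_mismatches(utterances: list, formatted_utterances: list) -> list:
    """Return indexes of utterances whose start or end changed after formatting."""
    if len(utterances) != len(formatted_utterances):
        raise ValueError("utterance counts differ; cannot compare one-to-one")
    return [
        index
        for index, (before, after) in enumerate(zip(utterances, formatted_utterances))
        if (before["start"], before["end"]) != (after["start"], after["end"])
    ]


# Hand-written sample data in the shape of the fields described above.
utterances = [
    {"speaker": "A", "start": 1920, "end": 26000, "text": "Call 555-679-3466."},
]
formatted_utterances = [
    {"speaker": "A", "start": 1920, "end": 26000, "text": "Call (555)679-3466."},
]

print(timing_mismatches(utterances, formatted_utterances))
```

An empty result indicates that the start and end times match for every utterance pair.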

Key differences from standard transcription

Field	Standard Transcription	With Custom Formatting	With format_utterances=true
`text`	Transcribed text with default formatting	Transcribed text with your custom formatting rules applied	Same as custom formatting
`speech_understanding`	Not present	Object containing formatting request, response, and mapping	Same, plus `formatted_text` and `formatted_utterances`
`utterances`	Speaker-separated segments with original text	Unchanged	Unchanged (original utterances remain)
Word timestamps	Original timestamps	Preserved exactly	Preserved exactly in `formatted_utterances`

All other fields from the original transcript (words, utterances, confidence, etc.) remain unchanged. The formatted_utterances field provides an additional view of the data with formatting applied while maintaining complete timestamp fidelity.
