AI Research Review - Spelling and ASR

This week’s AI Research Review covers the paper Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems.

What’s Exciting About this Paper

This paper proposes a general ASR biasing solution that is domain-insensitive and can be applied across different scenarios.

The autoregressive (AR) model achieves a 51% relative Word Error Rate (WER) reduction on rare words and proper nouns.
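For readers who want the arithmetic behind a "relative" WER reduction, here is a minimal Python sketch. WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and the relative reduction is the drop in WER divided by the baseline WER. The function names and example sentences are illustrative and do not come from the paper.

```python
def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    # dp[j] = edit distance between ref[:i] and hyp[:j] for the row i being filled
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            above = dp[j]  # distance for ref[:i-1] vs hyp[:j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[j] = min(above + 1,         # delete a reference word
                        dp[j - 1] + 1,     # insert a hypothesis word
                        prev_diag + cost)  # substitute, or match for free
            prev_diag = above
    return dp[-1]


def corpus_wer(refs: list[str], hyps: list[str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors removed, e.g. 0.51 for a 51% relative reduction."""
    return (baseline_wer - new_wer) / baseline_wer


print(corpus_wer(["call cathy lin now"], ["call kathy lynn now"]))  # 0.5
print(relative_reduction(0.20, 0.098))  # about 0.51
```

A 51% relative reduction therefore means that roughly half of the baseline's errors on those words disappear; a hypothetical baseline of 20% WER, for instance, would fall to about 9.8%.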

The authors also propose a novel approach for handling large context phrase lists during both training and inference, improving efficiency and scalability.
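The review does not spell out how large context lists are handled. One simple idea, assumed here and not necessarily the paper's method, is to prune the list at inference time, keeping only the phrases whose character trigrams overlap most with the ASR hypothesis, so the correction model attends to a handful of candidates instead of thousands. `filter_context` and its scoring rule are hypothetical, and the sketch does not cover the training-time side.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of the lowercased text, padded with spaces at word edges."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def filter_context(hypothesis: str, phrases: list[str], k: int = 2) -> list[str]:
    """Hypothetical pruning step: keep the k phrases whose trigrams best overlap the hypothesis."""
    hyp_grams = char_ngrams(hypothesis)
    scored: list[tuple[float, str]] = []
    for phrase in phrases:
        grams = char_ngrams(phrase)
        if not grams:  # skip empty phrases, which have no trigrams to compare
            continue
        # Fraction of the phrase's trigrams that also appear in the hypothesis
        overlap = len(grams & hyp_grams) / len(grams)
        scored.append((overlap, phrase))
    # Stable sort: ties keep the original list order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for score, phrase in scored[:k] if score > 0]


print(filter_context("send the report to kathy lynn",
                     ["Cathy Lin", "Project Apollo", "Rajesh Kumar"]))  # ['Cathy Lin']
```

Character-level overlap is used here because misrecognized proper nouns often share spelling fragments with the intended phrase even when no whole word matches.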

Key Findings

The proposed model is a seq2seq model that corrects the spelling of rare words and proper nouns by attending to both the ASR hypotheses and external context words or phrases.
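To make "attending to both" concrete, here is a toy, pure-Python sketch of one decoder step that runs scaled dot-product attention separately over encodings of the ASR hypothesis and of the context phrases, then concatenates the two summaries. Using the same vectors as keys and values, and fusing the two sources by concatenation, are simplifying assumptions; the paper's actual architecture may combine them differently.

```python
import math

Vector = list[float]


def attend(query: Vector, memory: list[Vector]) -> Vector:
    """Scaled dot-product attention; memory rows serve as both keys and values."""
    if not memory:
        raise ValueError("memory must be non-empty")
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, row)) / math.sqrt(d) for row in memory]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the memory rows, one output dimension at a time
    return [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(d)]


def dual_source_context(query: Vector, hyp_encodings: list[Vector],
                        context_encodings: list[Vector]) -> Vector:
    """One decoder step: summarize the ASR hypothesis and the context phrases separately,
    then concatenate the two summaries (an assumed fusion, for illustration only)."""
    return attend(query, hyp_encodings) + attend(query, context_encodings)
```

Keeping the two attention passes separate lets the decoder weigh "what the recognizer heard" against "which known phrases are plausible" at every output step.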

The best WER reduction was achieved by combining shallow fusion with contextual spelling correction. The model performs especially well on the test set with a high out-of-vocabulary (OOV) rate, which suggests that it learns error patterns at the subword level rather than the word level.
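Shallow fusion, as usually defined, adds a weighted external model score to the end-to-end model's log-probability at each decoding step: score = log P_ASR(token) + λ · log P_bias(token). The sketch below is a toy under several assumptions: word-level tokens, per-step ASR posteriors fixed in advance, greedy search in place of beam search, and a hypothetical one-phrase bias model (`make_phrase_bias`). It is not the paper's biasing setup.

```python
import math
from typing import Callable

Dist = dict[str, float]  # token -> probability


def make_phrase_bias(phrase: list[str], boost: float = 0.9) -> Callable[[list[str]], Dist]:
    """Hypothetical bias model: favour the next word of `phrase` given the words emitted so far."""
    def bias_lm(prefix: list[str]) -> Dist:
        # Try the longest suffix of the prefix first; for a non-empty phrase,
        # the empty suffix always matches and boosts the phrase's first word.
        for start in range(len(prefix) + 1):
            tail = prefix[start:]
            if len(tail) < len(phrase) and tail == phrase[:len(tail)]:
                return {phrase[len(tail)]: boost}
        return {}
    return bias_lm


def shallow_fusion_greedy(asr_steps: list[Dist], bias_lm: Callable[[list[str]], Dist],
                          lam: float = 0.5, floor: float = 1e-3) -> list[str]:
    """Greedy decoding with score = log P_asr + lam * log P_bias at every step."""
    output: list[str] = []
    for asr_dist in asr_steps:
        bias_dist = bias_lm(output)

        def fused(token: str) -> float:
            # Tokens the bias model says nothing about get a small floor probability
            return math.log(asr_dist[token]) + lam * math.log(bias_dist.get(token, floor))

        output.append(max(asr_dist, key=fused))
    return output


steps = [{"call": 0.9, "cole": 0.1}, {"kathy": 0.6, "cathy": 0.4}, {"lynn": 0.55, "lin": 0.45}]
bias = make_phrase_bias(["cathy", "lin"])
print(shallow_fusion_greedy(steps, bias, lam=0.0))  # ['call', 'kathy', 'lynn']
print(shallow_fusion_greedy(steps, bias, lam=0.5))  # ['call', 'cathy', 'lin']
```

Because contextual spelling correction runs as post-processing, combining the two presumably means the correction model sees output that fusion has already nudged toward the context phrases; this sketch covers only the fusion half.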

Our Takeaways

Applying ASR biasing as a post-processing step is a reasonable alternative to biasing the ASR encoder (e.g., Contextual RNN-T, CLAS) for improving proper noun recognition in end-to-end ASR.

Both the autoregressive (AR) and non-autoregressive (NAR) variants substantially reduce proper noun WER, but the NAR model runs inference 2.1 times faster.
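The speed gap is generally attributed to how the two decoders generate output. An autoregressive decoder needs one sequential pass per output token, because each token is conditioned on the ones before it, while a non-autoregressive decoder predicts all positions in a single parallel pass. The toy below only counts decoder calls: `step` and `predict_all` are hypothetical stand-ins for a trained model, and the NAR output length is simply assumed to be known (real NAR models must predict or otherwise fix it). It does not reproduce the paper's 2.1x measurement.

```python
from typing import Callable

Token = str


def decode_autoregressive(step: Callable[[list[Token]], Token], max_len: int,
                          eos: Token = "</s>") -> tuple[list[Token], int]:
    """One sequential decoder call per output token; each call sees the tokens produced so far."""
    out: list[Token] = []
    calls = 0
    while len(out) < max_len:
        tok = step(out)
        calls += 1
        if tok == eos:
            break
        out.append(tok)
    return out, calls


def decode_non_autoregressive(predict_all: Callable[[int], list[Token]],
                              length: int) -> tuple[list[Token], int]:
    """A single decoder call predicts every position at once (output length assumed known)."""
    return predict_all(length), 1


target = ["cathy", "lin"]
print(decode_autoregressive(lambda out: target[len(out)] if len(out) < len(target) else "</s>", 10))
# (['cathy', 'lin'], 3)
print(decode_non_autoregressive(lambda n: target[:n], 2))
# (['cathy', 'lin'], 1)
```

In a real model each decoder call is a full network forward pass, so replacing many dependent calls with one parallel call is where the latency savings come from.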
