This week’s AI Research Review covers “Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems.”
What’s Exciting About this Paper
This paper proposed a general ASR biasing solution that is domain-insensitive and can be adopted in different scenarios.
The autoregressive (AR) model achieved a 51% relative Word Error Rate (WER) reduction on rare words and proper nouns.
The authors proposed a novel approach to handle large context phrase lists both during training and inference to improve efficiency and scalability.
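To make the "relative WER reduction" figure above concrete, here is a minimal sketch of how WER and a relative reduction are usually computed. The function names and the sample numbers are illustrative assumptions, not taken from the paper, and the paper's own scoring setup may differ in details such as text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that the new system removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical utterance: the baseline misrecognizes a proper noun.
    print(word_error_rate("call joaquin phoenix now", "call walking phoenix now"))
    # Hypothetical WER values chosen only to illustrate a 51% relative reduction.
    print(relative_wer_reduction(0.20, 0.098))
```

A relative reduction is measured against the baseline error rate, so a 51% relative reduction means roughly half of the baseline errors on those words are removed, not that WER drops by 51 absolute points.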
This is a seq2seq model that corrects spelling of rare words or proper nouns by paying attention to both ASR hypotheses and external context words/phrases.
The best WER reduction was achieved by combining Shallow Fusion and Contextual Spelling Correction. The model also performs well on the test set with a high out-of-vocabulary (OOV) rate, which indicates that it learns error patterns at the subword level rather than the word level.
Applying ASR biasing as a post-processing step is a reasonable way to improve proper noun detection in end-to-end ASR, compared with biasing the ASR encoder (e.g., Contextual RNN-T, CLAS).
Although both the AR and non-autoregressive (NAR) solutions substantially reduce proper noun WER, the NAR model speeds up inference by 2.1 times.