A Survey on End-To-End Speech Recognition Architectures in 2021
As we continue our core research and development efforts to push the boundaries of state-of-the-art accuracy, we have begun exploring end-to-end speech recognition architectures that are gaining popularity not only in research but also in production settings in industry. We are especially interested in architectures that have shown production-level accuracy matching or surpassing that of conventional hybrid DNN-HMM speech recognition systems. With that in mind, we survey two architectures: Listen, Attend and Spell (LAS) and the Recurrent Neural Network Transducer (RNN-T).
