DASR: Denoising for Automatic Speech Recognition in Noisy Environments
Abstract
This paper addresses speech recognition in noisy environments. We introduce DASR (Denoising for Automatic Speech Recognition), an approach that adapts modern neural speech recognition models to noisy conditions by adding separate denoising neural networks. These networks act as denoisers that modify the audio signal before it is passed to the speech recognition system. Notably, our approach does not require changing the weights of the recognition models it is applied to, so no fine-tuning is needed to adapt those models to new domains. This significantly reduces the computational resources required for transcription. We explore the training process of these denoising models and their impact on recognition quality. Our experiments, conducted on open datasets such as CommonVoice, demonstrate that the proposed approach achieves recognition quality comparable to traditional fine-tuning methods. It also exhibits several interesting zero-shot properties. The code is available at https://github.com/Petilia/dasr.
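To make the central idea concrete, the toy sketch below trains a small denoiser placed in front of a recognizer whose weights stay fixed. It is not the authors' implementation: the linear "recognizer", the gain-and-bias "denoiser", the synthetic noise model, and the loss (matching the recognizer's output on clean input) are all assumptions chosen for illustration, and every name in it is hypothetical. The actual models and training procedure are in the linked repository.

```python
import random

# Frozen "recognizer": a fixed linear scorer standing in for a pretrained
# speech recognition model. Its weights are never updated (assumption: toy model).
RECOGNIZER_W = (0.8, -0.5, 0.3)
RECOGNIZER_B = 0.1


def recognize(features):
    """Score a feature vector with the frozen recognizer."""
    return sum(w * x for w, x in zip(RECOGNIZER_W, features)) + RECOGNIZER_B


def corrupt(clean, rng):
    """Hypothetical noise model: attenuation, constant offset, additive noise."""
    return [0.5 * x + 1.0 + rng.gauss(0.0, 0.05) for x in clean]


def denoise(noisy, gain, bias):
    """Trainable denoiser: a per-feature gain and bias applied to the input."""
    return [g * x + b for g, x, b in zip(gain, noisy, bias)]


def train_denoiser(steps=2000, lr=0.05, seed=0):
    """Train only the denoiser so the frozen recognizer's output on denoised
    noisy input matches its output on the clean input."""
    rng = random.Random(seed)
    gain = [1.0, 1.0, 1.0]  # start as the identity mapping
    bias = [0.0, 0.0, 0.0]
    losses = []
    for _ in range(steps):
        clean = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        noisy = corrupt(clean, rng)
        target = recognize(clean)
        err = recognize(denoise(noisy, gain, bias)) - target
        losses.append(err * err)
        # Gradients of the squared error flow through the frozen recognizer
        # into the denoiser parameters; the recognizer itself is untouched.
        for i in range(3):
            grad_out = 2.0 * err * RECOGNIZER_W[i]
            gain[i] -= lr * grad_out * noisy[i]
            bias[i] -= lr * grad_out
    return gain, bias, losses


if __name__ == "__main__":
    gain, bias, losses = train_denoiser()
    print("mean loss, first 100 steps:", sum(losses[:100]) / 100)
    print("mean loss, last 100 steps:", sum(losses[-100:]) / 100)
```

Only `gain` and `bias` change during training, which mirrors the claim that the recognition model needs no fine-tuning. A real system would replace both toy components with neural networks operating on audio.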