DOI: 10.3390/app14083385 ISSN: 2076-3417

DIFFBAS: An Advanced Binaural Audio Synthesis Model Focusing on Binaural Differences Recovery

Yusen Li, Ying Shen, Dongqing Wang
  • Fluid Flow and Transfer Processes
  • Computer Science Applications
  • Process Chemistry and Technology
  • General Engineering
  • Instrumentation
  • General Materials Science

Binaural audio synthesis (BAS) aims to restore binaural audio from mono signals obtained from the environment to enhance users’ immersive experiences. It plays an essential role in building Augmented Reality and Virtual Reality environments. Existing deep neural network (DNN)-based BAS systems synthesize binaural audio by modeling the overall sound propagation processes from the source to the left and right ears, which encompass early decay, room reverberation, and head/ear-related filtering. However, this end-to-end modeling approach brings in the overfitting problem for BAS models when they are trained using a small and homogeneous data set. Moreover, existing losses cannot well supervise the training process. As a consequence, the accuracy of synthesized binaural audio is far from satisfactory on binaural differences. In this work, we propose a novel DNN-based BAS method, namely DIFFBAS, to improve the accuracy of synthesized binaural audio from the perspective of the interaural phase difference. Specifically, DIFFBAS is trained using the average signals of the left and right channels. To make the model learn the binaural differences, we propose a new loss named Interaural Phase Difference (IPD) loss to supervise the model training. Extensive experiments have been performed and the results demonstrate the effectiveness of the DIFFBAS model and the IPD loss.

More from our Archive