A Multimodal Data Fusion Attention-Empowered Generative Adversarial Network for Real Time 3D Underwater Sound Speed Field Construction
Abstract
Sound speed profiles (SSPs) are crucial underwater parameters that determine the propagation patterns of acoustic signals, directly influencing the energy efficiency of underwater communication and the accuracy of positioning systems. Conventional techniques for obtaining SSPs, such as matched field processing (MFP), compressive sensing (CS), and deep learning (DL), typically depend on on-site sonar measurements, which impose stringent requirements on the deployment of underwater observation systems. To overcome this limitation and enable high-precision sound speed field reconstruction without on-site underwater data collection, we propose a novel multimodal data-fusion generative adversarial network enhanced with residual attention blocks (MDF-RAGAN). The architecture uses attention mechanisms to capture global spatial feature correlations, while residual modules extract the subtle perturbations in the deep-ocean sound speed distribution caused by sea surface temperature (SST) variations. Experimental results on a public real-world dataset show that the proposed model outperforms other state-of-the-art methods, achieving an estimation error of less than 0.3 m/s. Specifically, MDF-RAGAN reduces the root mean square error (RMSE) by nearly half compared with convolutional neural network (CNN) and spatial interpolation (SITP) methods, and attains a 65.8% RMSE reduction relative to the mean profile method. These results highlight the effectiveness of multi-source fusion and cross-modal attention in enhancing the accuracy and robustness of sound speed profile reconstruction.
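The RMSE figures quoted above compare each method's estimated profile with a measured reference profile. The following is a minimal sketch of how such an RMSE and a relative RMSE reduction can be computed, not the evaluation code used in this work. The function names `rmse` and `relative_reduction` and all sound speed values are hypothetical and chosen only for illustration.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sound speed profiles (m/s)."""
    if len(predicted) != len(reference):
        raise ValueError("profiles must have the same number of depth samples")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical sound speeds (m/s) at four depths; these are not values from the paper.
reference = [1540.0, 1520.0, 1500.0, 1490.0]
baseline = [1541.0, 1519.0, 1501.0, 1489.0]  # stand-in for a baseline estimate
model = [1540.5, 1519.5, 1500.5, 1489.5]     # stand-in for a model estimate

base_err = rmse(baseline, reference)
model_err = rmse(model, reference)
print(base_err, model_err, relative_reduction(base_err, model_err))
```

With these illustrative inputs the baseline RMSE is 1.0 m/s, the model RMSE is 0.5 m/s, and the relative reduction is 50%. A reported reduction such as the one relative to the mean profile method would be obtained the same way from the corresponding pair of RMSE values.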