Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning
Abstract
We present a self-supervised machine learning framework for detecting and mapping the severity and speciation of harmful algal blooms (HABs) using multi-sensor satellite data. By fusing reflectance data from operational polar-orbiting satellite instruments (VIIRS, MODIS, OLCI, and OCI) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning and hierarchical deep clustering to segment phytoplankton cell abundance and species into interpretable classes, and its outputs are validated against in-situ data from the Gulf of Mexico and Southern California (2018–2025). Results show strong agreement with in-situ measurements of total phytoplankton, Karenia brevis, and Pseudo-nitzschia spp. This work advances scalable HAB monitoring in environments where ground-truth observations are limited and enables exploratory analysis via hierarchical embeddings, a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.