{"ID":2845329,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04351","arxiv_id":"2511.04351","title":"RCMCL: A Unified Contrastive Learning Framework for Robust Multi-Modal (RGB-D, Skeleton, Point Cloud) Action Understanding","abstract":"Human action recognition (HAR) with multi-modal inputs (RGB-D, skeleton, point cloud) can achieve high accuracy but typically relies on large labeled datasets and degrades sharply when sensors fail or are noisy. We present Robust Cross-Modal Contrastive Learning (RCMCL), a self-supervised framework that learns modality-invariant representations and remains reliable under modality dropout and corruption. RCMCL jointly optimizes (i) a cross-modal contrastive objective that aligns heterogeneous streams, (ii) an intra-modal self-distillation objective that improves view-invariance and reduces redundancy, and (iii) a degradation simulation objective that explicitly trains models to recover from masked or corrupted inputs. At inference, an Adaptive Modality Gating (AMG) network assigns data-driven reliability weights to each modality for robust fusion. On NTU RGB+D 120 (CS/CV) and UWA3D-II, RCMCL attains state-of-the-art accuracy in standard settings and exhibits markedly better robustness: under severe dual-modality dropout it shows only an 11.5% degradation, significantly outperforming strong supervised fusion baselines. These results indicate that self-supervised cross-modal alignment, coupled with explicit degradation modeling and adaptive fusion, is key to deployable multi-modal HAR.","short_abstract":"Human action recognition (HAR) with multi-modal inputs (RGB-D, skeleton, point cloud) can achieve high accuracy but typically relies on large labeled datasets and degrades sharply when sensors fail or are noisy. We present Robust Cross-Modal Contrastive Learning (RCMCL), a self-supervised framework that learns modality...","url_abs":"https://arxiv.org/abs/2511.04351","url_pdf":"https://arxiv.org/pdf/2511.04351v3","authors":"[\"Hasan Akgul\",\"Mari Eplik\",\"Javier Rojas\",\"Akira Yamamoto\",\"Rajesh Kumar\",\"Maya Singh\"]","published":"2025-11-06T13:32:50Z","proceeding":"eess.SP","tasks":"[\"eess.SP\"]","methods":"[]","has_code":false}
