Physic-HM: Restoring Physical Generative Logic in Multimodal Anomaly Detection via Hierarchical Modulation

cs.LG arXiv:2512.21650
View PDF arXiv JSON

Abstract

Multimodal Unsupervised Anomaly Detection (UAD) is critical for quality assurance in smart manufacturing, particularly in complex processes like robotic welding. However, existing methods often suffer from process-logic blindness, treating process modalities (e.g., real-time video, audio, and sensors) and result modalities (e.g., post-weld images) as symmetric feature sources, thereby ignoring the inherent unidirectional physical generative logic. Furthermore, the heterogeneity gap between high-dimensional visual data and low-dimensional sensor signals frequently leads to critical process context being drowned out. In this paper, we propose Physic-HM, a multimodal UAD framework that explicitly incorporates physical inductive bias to model the process-to-result dependency. Specifically, our framework incorporates two key innovations: a Sensor-Guided PHM Modulation mechanism that utilizes low-dimensional sensor signals as context to guide high-dimensional audio-visual feature extraction, and a Physic-Hierarchical architecture that enforces a unidirectional generative mapping to identify anomalies that violate physical consistency. Extensive experiments on Weld-4M benchmark demonstrate that Physic-HM achieves a SOTA I-AUROC of 90.7%. The source code of Physic-HM will be released after the paper is accepted.

PDF Viewer