Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition

cs.CV arXiv:2509.23009
View PDF arXiv JSON

Abstract

Action recognition models rely excessively on static cues rather than dynamic human motion, which is known as static bias. This bias leads to poor performance in real-world applications and zero-shot action recognition. In this paper, we propose a method to reduce static bias by separating temporal dynamic information from static scene information. Our approach uses a statistical independence loss between biased and unbiased streams, combined with a scene prediction loss. Our experiments demonstrate that this method effectively reduces static bias and confirm the importance of scene prediction loss.

PDF Viewer