Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets

cs.CV arXiv:2511.13944

Abstract

We propose a cluster-based frame selection strategy to mitigate information leakage in video-derived frame datasets. By grouping visually similar frames before splitting the data into training, validation, and test sets, the method produces more representative, balanced, and reliable dataset partitions.
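
The grouping-then-splitting idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the feature extractor or the clustering algorithm, so the greedy threshold clustering, the toy 2-D feature vectors, the helper names `greedy_cluster` and `split_by_cluster`, and the 70/15/15 ratios are all assumptions made for illustration. The only property the sketch aims to show is that every cluster of similar frames is assigned to exactly one partition.

```python
import random
from math import dist

def greedy_cluster(features, threshold):
    """Hypothetical stand-in for a real clustering step: a frame joins the
    first cluster whose representative lies within `threshold`, otherwise
    it opens a new cluster. Returns one cluster id per frame."""
    representatives = []
    labels = []
    for feature in features:
        for cluster_id, rep in enumerate(representatives):
            if dist(feature, rep) <= threshold:
                labels.append(cluster_id)
                break
        else:
            representatives.append(feature)
            labels.append(len(representatives) - 1)
    return labels

def split_by_cluster(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Assign whole clusters to train/val/test so that similar frames
    never end up on both sides of a split. Ratios are assumed values."""
    clusters = {}
    for frame_idx, cluster_id in enumerate(labels):
        clusters.setdefault(cluster_id, []).append(frame_idx)

    order = sorted(clusters)
    random.Random(seed).shuffle(order)

    names = ["train", "val", "test"]
    targets = [r * len(labels) for r in ratios]
    splits = {name: [] for name in names}
    for cluster_id in order:
        # Send the whole cluster to the split furthest below its target size.
        deficits = [targets[i] - len(splits[names[i]]) for i in range(3)]
        best = max(range(3), key=lambda i: deficits[i])
        splits[names[best]].extend(clusters[cluster_id])
    return splits

# Toy example: six frames forming three tight groups in feature space.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
labels = greedy_cluster(features, threshold=0.5)
splits = split_by_cluster(labels)

# Leakage check: no cluster id may appear in more than one split.
cluster_sets = {name: {labels[i] for i in idx} for name, idx in splits.items()}
```

In practice the feature vectors would come from an image embedding and the clustering from an established algorithm. The point of the sketch is only the final assignment step, which keeps near-duplicate frames from the same video segment together in one partition.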
