{"ID":2854163,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.15725","arxiv_id":"2510.15725","title":"DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification","abstract":"Camera movement classification (CMC) models trained on contemporary, high-quality footage often degrade when applied to archival film, where noise, missing frames, and low contrast obscure motion cues. We bridge this gap by assembling a unified benchmark that consolidates two modern corpora into four canonical classes and restructures the HISTORIAN collection into five balanced categories. Building on this benchmark, we introduce DGME-T, a lightweight extension to the Video Swin Transformer that injects directional grid motion encoding, derived from optical flow, via a learnable and normalised late-fusion layer. DGME-T raises the backbone's top-1 accuracy from 81.78% to 86.14% and its macro F1 from 82.08% to 87.81% on modern clips, while still improving the demanding World-War-II footage from 83.43% to 84.62% accuracy and from 81.72% to 82.63% macro F1. A cross-domain study further shows that an intermediate fine-tuning stage on modern data increases historical performance by more than five percentage points. These results demonstrate that structured motion priors and transformer representations are complementary and that even a small, carefully calibrated motion head can substantially enhance robustness in degraded film analysis. Related resources are available at https://github.com/linty5/DGME-T.","short_abstract":"Camera movement classification (CMC) models trained on contemporary, high-quality footage often degrade when applied to archival film, where noise, missing frames, and low contrast obscure motion cues. We bridge this gap by assembling a unified benchmark that consolidates two modern corpora into four canonical classes...","url_abs":"https://arxiv.org/abs/2510.15725","url_pdf":"https://arxiv.org/pdf/2510.15725v1","authors":"[\"Tingyu Lin\",\"Armin Dadras\",\"Florian Kleber\",\"Robert Sablatnig\"]","published":"2025-10-17T15:14:11Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"eess.IV\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":608121,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2854163,"paper_url":"https://arxiv.org/abs/2510.15725","paper_title":"DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification","repo_url":"https://github.com/linty5/DGME-T","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
