{"ID":2841894,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11824","arxiv_id":"2511.11824","title":"SOTFormer: A Minimal Transformer for Unified Object Tracking and Trajectory Prediction","abstract":"Accurate single-object tracking and short-term motion forecasting remain challenging under occlusion, scale variation, and temporal drift, which disrupt the temporal coherence required for real-time perception. We introduce \\textbf{SOTFormer}, a minimal constant-memory temporal transformer that unifies object detection, tracking, and short-horizon trajectory prediction within a single end-to-end framework. Unlike prior models with recurrent or stacked temporal encoders, SOTFormer achieves stable identity propagation through a ground-truth-primed memory and a burn-in anchor loss that explicitly stabilizes initialization. A single lightweight temporal-attention layer refines embeddings across frames, enabling real-time inference with fixed GPU memory. On the Mini-LaSOT (20%) benchmark, SOTFormer attains 76.3 AUC and 53.7 FPS (AMP, 4.3 GB VRAM), outperforming transformer baselines such as TrackFormer and MOTRv2 under fast motion, scale change, and occlusion.","short_abstract":"Accurate single-object tracking and short-term motion forecasting remain challenging under occlusion, scale variation, and temporal drift, which disrupt the temporal coherence required for real-time perception. We introduce \\textbf{SOTFormer}, a minimal constant-memory temporal transformer that unifies object detection...","url_abs":"https://arxiv.org/abs/2511.11824","url_pdf":"https://arxiv.org/pdf/2511.11824v1","authors":"[\"Zhongping Dong\",\"Pengyang Yu\",\"Shuangjian Li\",\"Liming Chen\",\"Mohand Tahar Kechadi\"]","published":"2025-11-14T19:25:05Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false}