{"ID":2866482,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19936","arxiv_id":"2509.19936","title":"CapStARE: Capsule-based Spatiotemporal Architecture for Robust and Efficient Gaze Estimation","abstract":"We introduce CapStARE, a capsule-based spatio-temporal architecture for gaze estimation that integrates a ConvNeXt backbone, capsule formation with attention routing, and dual GRU decoders specialized for slow and rapid gaze dynamics. This modular design enables efficient part-whole reasoning and disentangled temporal modeling, achieving state-of-the-art performance on ETH-XGaze (3.36) and MPIIFaceGaze (2.65) while maintaining real-time inference (\u003c 10 ms). The model also generalizes well to unconstrained conditions in Gaze360 (9.06) and human-robot interaction scenarios in RT-GENE (4.76), outperforming or matching existing methods with fewer parameters and greater interpretability. These results demonstrate that CapStARE offers a practical and robust solution for real-time gaze estimation in interactive systems. The related code and results for this article can be found on: https://github.com/toukapy/capsStare","short_abstract":"We introduce CapStARE, a capsule-based spatio-temporal architecture for gaze estimation that integrates a ConvNeXt backbone, capsule formation with attention routing, and dual GRU decoders specialized for slow and rapid gaze dynamics. This modular design enables efficient part-whole reasoning and disentangled temporal...","url_abs":"https://arxiv.org/abs/2509.19936","url_pdf":"https://arxiv.org/pdf/2509.19936v1","authors":"[\"Miren Samaniego\",\"Igor Rodriguez\",\"Elena Lazkano\"]","published":"2025-09-24T09:43:34Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":609375,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2866482,"paper_url":"https://arxiv.org/abs/2509.19936","paper_title":"CapStARE: Capsule-based Spatiotemporal Architecture for Robust and Efficient Gaze Estimation","repo_url":"https://github.com/toukapy/capsStare","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
