{"ID":2891117,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17089","arxiv_id":"2507.17089","title":"IONext: Unlocking the Next Era of Inertial Odometry","abstract":"Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspired architectural designs into CNN can effectively expand the receptive field, thereby improving global motion perception. Motivated by these insights, we propose a novel CNN-based module called the Dual-wing Adaptive Dynamic Mixer (DADM), which adaptively captures both global motion patterns and local, fine-grained motion features from dynamic inputs. This module dynamically generates selective weights based on the input, enabling efficient multi-scale feature aggregation. To further improve temporal modeling, we introduce the Spatio-Temporal Gating Unit (STGU), which selectively extracts representative and task-relevant motion features in the temporal domain. This unit addresses the limitations of temporal modeling observed in existing CNN approaches. Built upon DADM and STGU, we present a new CNN-based inertial odometry backbone, named Next Era of Inertial Odometry (IONext). Extensive experiments on six public datasets demonstrate that IONext consistently outperforms state-of-the-art (SOTA) Transformer- and CNN-based methods. For instance, on the RNIN dataset, IONext reduces the average ATE by 10% and the average RTE by 12% compared to the representative model iMOT.","short_abstract":"Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studi...","url_abs":"https://arxiv.org/abs/2507.17089","url_pdf":"https://arxiv.org/pdf/2507.17089v2","authors":"[\"Shanshan Zhang\",\"Qi Zhang\",\"Siyue Wang\",\"Tianshui Wen\",\"Liqin Wu\",\"Ziheng Zhou\",\"Xuemin Hong\",\"Ao Peng\",\"Lingxiang Zheng\",\"Yu Yang\"]","published":"2025-07-23T00:09:36Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[\"Transformer\",\"Convolutional Neural Network\"]","has_code":false}
