{"ID":2843568,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.08549","arxiv_id":"2511.08549","title":"Vision Transformer Based User Equipment Positioning","abstract":"Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models are that: i) they pay equal attention to the entire input; ii) they are not well suited to non-sequential data, e.g., when only instantaneous Channel State Information (CSI) is available. In this context, we propose an attention-based Vision Transformer (ViT) architecture that focuses on the Angle Delay Profile (ADP) from the CSI matrix. Our approach, validated on the `DeepMIMO' and `ViWi' ray-tracing datasets, achieves a Root Mean Squared Error (RMSE) of 0.55 m indoors and 13.59 m outdoors in DeepMIMO, and 3.45 m in ViWi's outdoor blockage scenario. The proposed scheme outperforms state-of-the-art schemes by $\\sim$ 38\\%. It also performs substantially better than the other approaches that we have considered in terms of the distribution of error distance.","short_abstract":"Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models are that: i) they pay equal attention to the entire input; ii) they are not well suited to non-sequential data, e.g., when only instantaneous Channel State Information (CSI) i...","url_abs":"https://arxiv.org/abs/2511.08549","url_pdf":"https://arxiv.org/pdf/2511.08549v1","authors":"[\"Parshwa Shah\",\"Dhaval K. Patel\",\"Brijesh Soni\",\"Miguel López-Benítez\",\"Siddhartan Govindasamy\"]","published":"2025-11-11T18:31:29Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.NI\"]","methods":"[\"Vision Transformer\",\"Transformer\"]","has_code":false}