{"ID":2881540,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.12520","arxiv_id":"2508.12520","title":"An Initial Study of Bird's-Eye View Generation for Autonomous Vehicles using Cross-View Transformers","abstract":"Bird's-Eye View (BEV) maps provide a structured, top-down abstraction that is crucial for autonomous-driving perception. In this work, we employ Cross-View Transformers (CVT) to learn a mapping from camera images to three BEV channels - road, lane markings, and planned trajectory - using a realistic simulator for urban driving. Our study examines generalization to unseen towns, the effect of different camera layouts, and two loss formulations (focal and L1). Trained on data from only a single town, a four-camera CVT with the L1 loss delivers the most robust test performance when evaluated in a new town. Overall, our results underscore CVT's promise for mapping camera inputs to reasonably accurate BEV maps.","short_abstract":"Bird's-Eye View (BEV) maps provide a structured, top-down abstraction that is crucial for autonomous-driving perception. In this work, we employ Cross-View Transformers (CVT) to learn a mapping from camera images to three BEV channels - road, lane markings, and planned trajectory - using a realistic simulator for urban d...","url_abs":"https://arxiv.org/abs/2508.12520","url_pdf":"https://arxiv.org/pdf/2508.12520v1","authors":"[\"Felipe Carlos dos Santos\",\"Eric Aislan Antonelo\",\"Gustavo Claudio Karl Couto\"]","published":"2025-08-17T23:05:00Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false}