{"ID":2826213,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19270","arxiv_id":"2512.19270","title":"Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization","abstract":"Collecting large-scale naturalistic driving data is essential for training robust autonomous driving planners. However, real-world datasets often contain a substantial amount of repetitive and low-value samples, which lead to excessive storage costs and bring limited benefits to policy learning. To address this issue, we propose an information-theoretic data pruning method that effectively reduces the training data volume without compromising model performance. Our approach evaluates the trajectory distribution information entropy of driving data and iteratively selects high-value samples that preserve the statistical characteristics of the original dataset in a model-agnostic manner. From a theoretical perspective, we show that maximizing trajectory entropy effectively constrains the Kullback-Leibler divergence between the pruned subset and the original data distribution, thereby maintaining generalization ability. Comprehensive experiments on the NuPlan benchmark with a large-scale imitation learning framework demonstrate that the proposed method can reduce the dataset size by up to 40% while maintaining closed-loop performance. This work provides a lightweight and theoretically grounded approach for scalable data management and efficient policy learning in autonomous driving systems.","short_abstract":"Collecting large-scale naturalistic driving data is essential for training robust autonomous driving planners. However, real-world datasets often contain a substantial amount of repetitive and low-value samples, which lead to excessive storage costs and bring limited benefits to policy learning. To address this issue,...","url_abs":"https://arxiv.org/abs/2512.19270","url_pdf":"https://arxiv.org/pdf/2512.19270v1","authors":"[\"Zhaoyang Liu\",\"Weitao Zhou\",\"Junze Wen\",\"Cheng Jing\",\"Qian Cheng\",\"Kun Jiang\",\"Diange Yang\"]","published":"2025-12-22T11:07:18Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[]","has_code":false}