{"ID":2899761,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.00984","arxiv_id":"2507.00984","title":"Box Pose and Shape Estimation and Domain Adaptation for Large-Scale Warehouse Automation","abstract":"Modern warehouse automation systems rely on fleets of intelligent robots that generate vast amounts of data -- most of which remains unannotated. This paper develops a self-supervised domain adaptation pipeline that leverages real-world, unlabeled data to improve perception models without requiring manual annotations. Our work focuses specifically on estimating the pose and shape of boxes and presents a correct-and-certify pipeline for self-supervised box pose and shape estimation. We extensively evaluate our approach across a range of simulated and real industrial settings, including adaptation to a large-scale real-world dataset of 50,000 images. The self-supervised model significantly outperforms models trained solely in simulation and shows substantial improvements over a zero-shot 3D bounding box estimation baseline.","short_abstract":"Modern warehouse automation systems rely on fleets of intelligent robots that generate vast amounts of data -- most of which remains unannotated. This paper develops a self-supervised domain adaptation pipeline that leverages real-world, unlabeled data to improve perception models without requiring manual annotations....","url_abs":"https://arxiv.org/abs/2507.00984","url_pdf":"https://arxiv.org/pdf/2507.00984v1","authors":"[\"Xihang Yu\",\"Rajat Talak\",\"Jingnan Shi\",\"Ulrich Viereck\",\"Igor Gilitschenski\",\"Luca Carlone\"]","published":"2025-07-01T17:36:09Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\",\"cs.LG\"]","methods":"[]","has_code":false}