{"ID":2884479,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07112","arxiv_id":"2508.07112","title":"AugLift: Depth-Aware Input Reparameterization Improves Domain Generalization in 2D-to-3D Pose Lifting","abstract":"Lifting-based 3D human pose estimation infers 3D joints from 2D keypoints but generalizes poorly because $(x,y)$ coordinates alone are an ill-posed, sparse representation that discards geometric information modern foundation models can recover. We propose \\emph{AugLift}, which changes the representation format of lifting from 2D coordinates to a 6D geometric descriptor via two modules: (1) an \\emph{Uncertainty-Aware Depth Descriptor} (UADD) -- a compact tuple $(c, d, d_{\\min}, d_{\\max})$ extracted from a confidence-scaled neighborhood of an off-the-shelf monocular depth map -- and (2) a scale normalization component that handles train/test distance shifts. AugLift requires no new sensors, no new data collection, and no architectural changes beyond widening the input layer; because it operates at the representation level, it is composable with any lifting architecture or domain generalization technique. In the detection setting, AugLift reduces cross-dataset MPJPE by $10.1$% on average across four datasets and four lifting architectures while improving in-distribution accuracy by $4.0$%; post-hoc analysis shows gains concentrate on novel poses and occluded joints. In the ground-truth 2D setting, combining AugLift with PoseAug's differentiable domain generalization achieves state-of-the-art cross-dataset performance ($62.4$\\,mm on 3DHP, $92.6$\\,mm on 3DPW; $14.5$% and $22.2$% over PoseAug), demonstrating that foundation-model depth provides genuine geometric signal complementary to explicit 3D augmentation. Code will be made publicly available.","short_abstract":"Lifting-based 3D human pose estimation infers 3D joints from 2D keypoints but generalizes poorly because $(x,y)$ coordinates alone are an ill-posed, sparse representation that discards geometric information modern foundation models can recover. We propose \\emph{AugLift}, which changes the representation format of lifti...","url_abs":"https://arxiv.org/abs/2508.07112","url_pdf":"https://arxiv.org/pdf/2508.07112v4","authors":"[\"Nikolai Warner\",\"Wenjin Zhang\",\"Hamid Badiozamani\",\"Irfan Essa\",\"Apaar Sadhwani\"]","published":"2025-08-09T22:36:31Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[]","has_code":false}
