{"ID":2828263,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19743","arxiv_id":"2512.19743","title":"From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning","abstract":"Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-to-one correspondences, while Earth Mover Distance better reflects one-to-one transport at high computational cost. APML approximates transport with differentiable Sinkhorn iterations and an analytically derived temperature, but its dense formulation scales quadratically in memory. We present CUDA-APML, a sparse GPU implementation that thresholds negligible assignments and runs adaptive softmax, bidirectional symmetrization, and Sinkhorn normalization directly in COO form. This yields near-linear memory scaling and preserves gradients on the stored support, while pairwise distance evaluation remains quadratic in the current implementation. On ShapeNet and MM-Fi, CUDA-APML matches dense APML within a small tolerance while reducing peak GPU memory by 99.9%. Code available at: https://github.com/Multimodal-Sensing-Lab/apml","short_abstract":"Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-to-one correspondences, while Earth Mover Distance better reflects one-to-one transport at high computational cost. APML approxima...","url_abs":"https://arxiv.org/abs/2512.19743","url_pdf":"https://arxiv.org/pdf/2512.19743v1","authors":"[\"Sasan Sharifipour\",\"Constantino Álvarez Casado\",\"Manuel Lage Cañellas\",\"Miguel Bordallo López\"]","published":"2025-12-17T23:18:51Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CG\"]","methods":"[]","has_code":false,"code_links":[{"ID":605860,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2828263,"paper_url":"https://arxiv.org/abs/2512.19743","paper_title":"From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning","repo_url":"https://github.com/Multimodal-Sensing-Lab/apml","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
