{"ID":2849572,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23203","arxiv_id":"2510.23203","title":"DecoDINO: 3D Human-Scene Contact Prediction with Semantic Classification","abstract":"Accurate vertex-level contact prediction between humans and surrounding objects is a prerequisite for high fidelity human object interaction models used in robotics, AR/VR, and behavioral simulation. DECO was the first in the wild estimator for this task but is limited to binary contact maps and struggles with soft surfaces, occlusions, children, and false-positive foot contacts. We address these issues and introduce DecoDINO, a three-branch network based on DECO's framework. It uses two DINOv2 ViT-g/14 encoders, class-balanced loss weighting to reduce bias, and patch-level cross-attention for improved local reasoning. Vertex features are finally passed through a lightweight MLP with a softmax to assign semantic contact labels. We also tested a vision-language model (VLM) to integrate text features, but the simpler architecture performed better and was used instead. On the DAMON benchmark, DecoDINO (i) raises the binary-contact F1 score by 7$\\%$, (ii) halves the geodesic error, and (iii) augments predictions with object-level semantic labels. Ablation studies show that LoRA fine-tuning and the dual encoders are key to these improvements. DecoDINO outperformed the challenge baseline in both tasks of the DAMON Challenge. Our code is available at https://github.com/DavidePasero/deco/tree/main.","short_abstract":"Accurate vertex-level contact prediction between humans and surrounding objects is a prerequisite for high fidelity human object interaction models used in robotics, AR/VR, and behavioral simulation. DECO was the first in the wild estimator for this task but is limited to binary contact maps and struggles with soft sur...","url_abs":"https://arxiv.org/abs/2510.23203","url_pdf":"https://arxiv.org/pdf/2510.23203v1","authors":"[\"Lukas Bierling\",\"Davide Pasero\",\"Fleur Dolmans\",\"Helia Ghasemi\",\"Angelo Broere\"]","published":"2025-10-27T10:46:22Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":607721,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2849572,"paper_url":"https://arxiv.org/abs/2510.23203","paper_title":"DecoDINO: 3D Human-Scene Contact Prediction with Semantic Classification","repo_url":"https://github.com/DavidePasero/deco","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}