{"ID":2890175,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.19947","arxiv_id":"2507.19947","title":"Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations","abstract":"Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.","short_abstract":"Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. \nThis paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that gro...","url_abs":"https://arxiv.org/abs/2507.19947","url_pdf":"https://arxiv.org/pdf/2507.19947v2","authors":"[\"Supawich Sitdhipol\",\"Waritwong Sukprasongdee\",\"Ekapol Chuangsuwanich\",\"Rina Tse\"]","published":"2025-07-26T13:24:02Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CL\",\"cs.IT\",\"cs.LG\",\"eess.SY\"]","methods":"[]","has_code":false}
