{"ID":2863997,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25528","arxiv_id":"2509.25528","title":"LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models","abstract":"Referential grounding in outdoor driving scenes is challenging due to large scene variability, many visually similar objects, and dynamic elements that complicate resolving natural-language references (e.g., \"the black car on the right\"). We propose LLM-RG, a hybrid pipeline that combines off-the-shelf vision-language models for fine-grained attribute extraction with large language models for symbolic reasoning. LLM-RG processes an image and a free-form referring expression by using an LLM to extract relevant object types and attributes, detecting candidate regions, generating rich visual descriptors with a VLM, and then combining these descriptors with spatial metadata into natural-language prompts that are input to an LLM for chain-of-thought reasoning to identify the referent's bounding box. Evaluated on the Talk2Car benchmark, LLM-RG yields substantial gains over both LLM and VLM-based baselines. Additionally, our ablations show that adding 3D spatial cues further improves grounding. Our results demonstrate the complementary strengths of VLMs and LLMs, applied in a zero-shot manner, for robust outdoor referential grounding.","short_abstract":"Referential grounding in outdoor driving scenes is challenging due to large scene variability, many visually similar objects, and dynamic elements that complicate resolving natural-language references (e.g., \"the black car on the right\"). We propose LLM-RG, a hybrid pipeline that combines off-the-shelf vision-language...","url_abs":"https://arxiv.org/abs/2509.25528","url_pdf":"https://arxiv.org/pdf/2509.25528v2","authors":"[\"Pranav Saxena\",\"Avigyan Bhattacharya\",\"Ji Zhang\",\"Wenshan Wang\"]","published":"2025-09-29T21:32:54Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.RO\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
