{"ID":2843501,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.08409","arxiv_id":"2511.08409","title":"Faithful-First Reasoning, Planning, and Acting for Multimodal LLMs","abstract":"Multimodal Large Language Models (MLLMs) frequently suffer from unfaithfulness, generating reasoning chains that drift from visual evidence or contradict final predictions. We propose the Faithful-First Reasoning, Planning, and Acting (RPA) framework, in which FaithEvi provides step-wise and chain-level supervision by evaluating the faithfulness of intermediate reasoning, and FaithAct uses these signals to plan and execute faithfulness-aware actions during inference. Experiments across multiple multimodal reasoning benchmarks show that faithful-first RPA improves perceptual faithfulness by up to 24% over prompt-based and tool-augmented reasoning frameworks, without degrading task accuracy. Our analysis shows that treating faithfulness as a guiding principle yields perceptually faithful reasoning trajectories and mitigates hallucination behavior. This work thereby establishes a unified framework for both evaluating and enforcing faithfulness in multimodal reasoning. Code is at https://github.com/lijunxian111/Faithful-First-RPA.","short_abstract":"Multimodal Large Language Models (MLLMs) frequently suffer from unfaithfulness, generating reasoning chains that drift from visual evidence or contradict final predictions. 
We propose the Faithful-First Reasoning, Planning, and Acting (RPA) framework, in which FaithEvi provides step-wise and chain-level supervision by evalu...","url_abs":"https://arxiv.org/abs/2511.08409","url_pdf":"https://arxiv.org/pdf/2511.08409v4","authors":"[\"Junxian Li\",\"Xinyue Xu\",\"Sai Ma\",\"Di Zhang\",\"Sichao Li\"]","published":"2025-11-11T16:22:49Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":607221,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2843501,"paper_url":"https://arxiv.org/abs/2511.08409","paper_title":"Faithful-First Reasoning, Planning, and Acting for Multimodal LLMs","repo_url":"https://github.com/lijunxian111/Faithful-First-RPA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}