{"ID":2839240,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16672","arxiv_id":"2511.16672","title":"EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards","abstract":"Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or externally verified reward models, limiting their autonomy and scalability. In this work, we strive to improve LMM reasoning capabilities in a purely unsupervised fashion (without any annotated data or reward distillation). To this end, we propose a self-evolving framework, named EvoLMM, that instantiates two cooperative agents from a single backbone model: a Proposer, which generates diverse, image-grounded questions, and a Solver, which solves them through internal consistency, where learning proceeds through a continuous self-rewarding process. This dynamic feedback encourages both the generation of informative queries and the refinement of structured reasoning without relying on ground-truth or human judgments. When using the popular Qwen2.5-VL as the base model, our EvoLMM yields consistent gains up to $\\sim$3\\% on multimodal math-reasoning benchmarks, including ChartQA, MathVista, and MathVision, using only raw training images. We hope our simple yet effective approach will serve as a solid baseline easing future research in self-improving LMMs in a fully unsupervised fashion. Our code and models are available at https://github.com/mbzuai-oryx/EvoLMM.","short_abstract":"Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or externally verified reward models, limiting their autonomy and scalability. In this work, we strive to improve LMM reasoning capabiliti...","url_abs":"https://arxiv.org/abs/2511.16672","url_pdf":"https://arxiv.org/pdf/2511.16672v3","authors":"[\"Omkar Thawakar\",\"Shravan Venkatraman\",\"Ritesh Thawkar\",\"Abdelrahman Shaker\",\"Hisham Cholakkal\",\"Rao Muhammad Anwer\",\"Salman Khan\",\"Fahad Khan\"]","published":"2025-11-20T18:59:54Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":606860,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2839240,"paper_url":"https://arxiv.org/abs/2511.16672","paper_title":"EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards","repo_url":"https://github.com/mbzuai-oryx/EvoLMM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
