{"ID":3049977,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T14:55:45.850535373Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04986","arxiv_id":"2606.04986","title":"Food-R1: A Unified Multi-Task Food Vision-Language Model with Reinforcement Learning","abstract":"Recent studies have explored Vision-Language Models (VLMs) for food analysis. However, most existing methods rely primarily on supervised fine-tuning (SFT), which often limits reasoning and generalization capabilities. Moreover, high-quality large-scale nutritional annotations remain scarce. To address these issues, we introduce CalorieBench-80K, a large-scale benchmark with curated calorie labels and dietary advice annotations. To the best of our knowledge, it is the first food image benchmark to incorporate Chain-of-Thought (CoT) annotations for calorie reasoning. We also propose Food-R1, a unified food VLM trained in a multi-task learning paradigm to equip the model with broad capabilities. Food-R1 undergoes CoT-based cold-start instruction tuning, followed by reinforcement fine-tuning (RFT) using Group Relative Policy Optimization (GRPO) to improve reasoning and performance. Experiments on CalorieBench-80K and representative benchmarks show that Food-R1 consistently outperforms strong baselines across food-related tasks. The code, model weights, and benchmark annotations are available at the project repository.","short_abstract":"Recent studies have explored Vision-Language Models (VLMs) for food analysis. However, most existing methods rely primarily on supervised fine-tuning (SFT), which often limits reasoning and generalization capabilities. Moreover, high-quality large-scale nutritional annotations remain scarce. To address these issues, we...","url_abs":"https://arxiv.org/abs/2606.04986","url_pdf":"https://arxiv.org/pdf/2606.04986v1","authors":"[\"Yu Zhu\",\"Yongkang Li\",\"Wenjie Zhu\",\"Haoyi Jiang\",\"Wenyu Liu\",\"Wei Yang\",\"Bin Li\",\"Xinggang Wang\"]","published":"2026-06-03T15:07:12Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false}
