{"ID":2880099,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.14410","arxiv_id":"2508.14410","title":"ORThought: Benchmarking and Automating Logistics Optimization Modeling","abstract":"Optimization modeling stands as the engine of scientific decision-making in logistics and transportation, yet its adoption is hindered by a steep expertise threshold and the latency of manual workflows. Automating this process via Large Language Models (LLMs) offers a potential solution, but current approaches face critical bottlenecks: (i) a lack of high-quality, complex benchmarks; (ii) methodological inefficiencies in autonomous multi-agent frameworks, which often exhibit instability and redundant computation; and (iii) evaluations that lack diagnostic depth. In this work, we address these challenges from the following three aspects. First, we introduce LogiOR, a diverse logistics benchmark with rigorous annotations, and enrich existing datasets with the same annotation standard to support community utilization. Second, we propose ORThought, a structured dual-agent framework. By incorporating expert-level modeling principles via chain-of-thought reasoning, ORThought eliminates the redundancy of uncontrolled autonomous agents. Third, extensive empirical evaluations demonstrate that ORThought consistently outperforms state-of-the-art baselines by 9-17 percentage points, exhibiting distinct advantages in handling complex constraints while maintaining high token efficiency. Building on these results, we further conduct a multidimensional error analysis, which identifies key failure modes and success factors, providing actionable insights for future research. The dataset and code are available at https://huggingface.co/datasets/LabMem012/LogiOR and https://github.com/ZJU-TSELab/ORThought, respectively.","short_abstract":"Optimization modeling stands as the engine of scientific decision-making in logistics and transportation, yet its adoption is hindered by a steep expertise threshold and the latency of manual workflows. Automating this process via Large Language Models (LLMs) offers a potential solution, but current approaches face cri...","url_abs":"https://arxiv.org/abs/2508.14410","url_pdf":"https://arxiv.org/pdf/2508.14410v3","authors":"[\"Beinuo Yang\",\"Qishen Zhou\",\"Junyi Li\",\"Chenxing Su\",\"Panagiotis Angeloudis\",\"Simon Hu\"]","published":"2025-08-20T04:14:54Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":610652,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880099,"paper_url":"https://arxiv.org/abs/2508.14410","paper_title":"ORThought: Benchmarking and Automating Logistics Optimization Modeling","repo_url":"https://github.com/ZJU-TSELab/ORThought","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}