{"ID":2839897,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14210","arxiv_id":"2511.14210","title":"Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution","abstract":"We introduce Orion, a visual agent that integrates vision-based reasoning with tool-augmented execution to achieve powerful, precise, multi-step visual intelligence across images, video, and documents. Unlike traditional vision-language models that generate descriptive outputs, Orion orchestrates a suite of specialized computer vision tools, including object detection, keypoint localization, panoptic segmentation, Optical Character Recognition (OCR), and geometric analysis, to execute complex multi-step visual workflows. The system achieves competitive performance across MMMU, MMBench, DocVQA, and MMLongBench while extending monolithic VLM capabilities to production-grade visual intelligence. Through its agentic, tool-augmented approach, Orion enables autonomous visual reasoning that bridges neural perception with symbolic execution, marking the transition from passive visual understanding to active, tool-driven visual intelligence. Try Orion for free at: https://chat.vlm.run Learn more at: https://www.vlm.run/orion","short_abstract":"We introduce Orion, a visual agent that integrates vision-based reasoning with tool-augmented execution to achieve powerful, precise, multi-step visual intelligence across images, video, and documents. Unlike traditional vision-language models that generate descriptive outputs, Orion orchestrates a suite of specialized...","url_abs":"https://arxiv.org/abs/2511.14210","url_pdf":"https://arxiv.org/pdf/2511.14210v2","authors":"[\"N Dinesh Reddy\",\"Dylan Snyder\",\"Lona Kiragu\",\"Mirajul Mohin\",\"Shahrear Bin Amin\",\"Sudeep Pillai\"]","published":"2025-11-18T07:41:02Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Language Model\"]","project_urls":"[\"https://chat.vlm.run\",\"https://www.vlm.run/orion\"]","has_code":false,"code_links":[{"ID":606931,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2839897,"paper_url":"https://arxiv.org/abs/2511.14210","paper_title":"Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution","repo_url":"https://github.com/vlm-run/vlmrun-cookbook","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
