{"ID":2860045,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05213","arxiv_id":"2510.05213","title":"VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing","abstract":"Pretrained vision foundation models (VFMs) advance robotic learning via rich visual representations, yet individual VFMs typically excel only in specific domains, limiting generality across tasks. Distilling multiple VFMs into a unified representation for policy can mitigate this limitation but often yields inflexible task-specific feature selection and requires costly full re-training to incorporate robot-domain knowledge. We propose VER, a Vision Expert transformer for Robot learning. During pretraining, VER distills multiple VFMs into a vision expert library. It then fine-tunes only a lightweight routing network (fewer than 0.4% of parameters) to dynamically select task-relevant experts from the pretrained library for downstream robot tasks. We further introduce Patchwise Expert Routing with Curriculum Top-K Annealing to improve both flexibility and precision of dynamic expert selection. Moreover, VER supports parameter-efficient finetuning for scalable expert utilization and adaptive robot-domain knowledge integration. Across 17 diverse robotic tasks and multiple policy heads, VER achieves state-of-the-art performance. We find that VER reduces large-norm outliers in task-irrelevant regions (e.g., background) and concentrates on task-critical regions. Visualizations and codes can be found in https://yixiaowang7.github.io/ver_page/.","short_abstract":"Pretrained vision foundation models (VFMs) advance robotic learning via rich visual representations, yet individual VFMs typically excel only in specific domains, limiting generality across tasks. Distilling multiple VFMs into a unified representation for policy can mitigate this limitation but often yields inflexible...","url_abs":"https://arxiv.org/abs/2510.05213","url_pdf":"https://arxiv.org/pdf/2510.05213v2","authors":"[\"Yixiao Wang\",\"Mingxiao Huo\",\"Zhixuan Liang\",\"Yushi Du\",\"Lingfeng Sun\",\"Haotian Lin\",\"Jinghuan Shang\",\"Chensheng Peng\",\"Mohit Bansal\",\"Mingyu Ding\",\"Masayoshi Tomizuka\"]","published":"2025-10-06T18:00:43Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false}
