{"ID":3004779,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T11:43:53.432517148Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03724","arxiv_id":"2606.03724","title":"Same Weights, Different Robot: A Deployment Safety View of VLA Policies","abstract":"Vision-language-action (VLA) policies are often treated as checkpoint-defined objects: if the weights, prompt, and benchmark suite match, the deployment is assumed to be the same policy. Robot execution breaks this assumption because the same normalized model output can become a different physical action after action unnormalization and controller conventions are applied. This creates a deployment-safety gap: safety review can certify the checkpoint while missing the executable robot policy that reaches the controller. We formalize this gap as an executable policy specification problem: a VLA policy includes the learned model, action representation, metadata-selected unnormalizer, and controller-facing conventions. Under this view, identical checkpoints can be executable-inequivalent. For quantile-style action normalization, we derive a closed-form metadata mismatch transform and an ExecSpec certificate that measures action-space semantic drift without model inference or rollout. On LIBERO-Goal replay, substituting a plausible sibling metadata key yields mean drift 0.199 over six non-gripper action dimensions and reduces success from 28/28 to 2/28 under full substitution. On LIBERO-Spatial replay, the same substituted key reduces success from 26/26 to 0/26. The same full-substitution protocol gives 0/28 success for all four Object substitutions and 0/23 or 1/23 success on Long. Identity-key, replay-validity, no-op filtering, raw-vs-correct replay, mask/gripper, synthetic upper-bound, and OpenVLA-style unnormalizer interface checks rule out several simpler explanations. These results do not certify closed-loop or hardware safety. They support a narrower deployment-safety view: action-space metadata is part of the executable policy and should be checked before rollout.","short_abstract":"Vision-language-action (VLA) policies are often treated as checkpoint-defined objects: if the weights, prompt, and benchmark suite match, the deployment is assumed to be the same policy. Robot execution breaks this assumption because the same normalized model output can become a different physical action after action u...","url_abs":"https://arxiv.org/abs/2606.03724","url_pdf":"https://arxiv.org/pdf/2606.03724v1","authors":"[\"Jianwei Tai\"]","published":"2026-06-02T14:45:00Z","proceeding":"cs.CR","tasks":"[\"cs.CR\"]","methods":"[]","has_code":false}
