{"ID":2870891,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.11865","arxiv_id":"2509.11865","title":"Tenma: Robust Cross-Embodiment Robot Manipulation with Diffusion Transformer","abstract":"Scaling Transformer policies and diffusion models has advanced robotic manipulation, yet combining these techniques in lightweight, cross-embodiment learning settings remains challenging. We study design choices that most affect stability and performance for diffusion-transformer policies trained on heterogeneous, multimodal robot data, and introduce Tenma, a lightweight diffusion-transformer for bi-manual arm control. Tenma integrates multiview RGB, proprioception, and language via a cross-embodiment normalizer that maps disparate state/action spaces into a shared latent space; a Joint State-Time encoder for temporally aligned observation learning with inference speed boosts; and a diffusion action decoder optimized for training stability and learning capacity. Across benchmarks and under matched compute, Tenma achieves an average success rate of 88.95% in-distribution and maintains strong performance under object and scene shifts, substantially exceeding baseline policies whose best in-distribution average is 18.12%. Despite using moderate data scale, Tenma delivers robust manipulation and generalization, indicating the great potential for multimodal and cross-embodiment learning strategies for further augmenting the capacity of transformer-based imitation learning policies.","short_abstract":"Scaling Transformer policies and diffusion models has advanced robotic manipulation, yet combining these techniques in lightweight, cross-embodiment learning settings remains challenging. We study design choices that most affect stability and performance for diffusion-transformer policies trained on heterogeneous, mult...","url_abs":"https://arxiv.org/abs/2509.11865","url_pdf":"https://arxiv.org/pdf/2509.11865v1","authors":"[\"Travis Davies\",\"Yiqi Huang\",\"Yunxin Liu\",\"Xiang Chen\",\"Huxian Liu\",\"Luhui Hu\"]","published":"2025-09-15T12:39:15Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
