{"ID":2874388,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.05513","arxiv_id":"2509.05513","title":"OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation","abstract":"Egocentric human videos provide scalable demonstrations for imitation learning, but existing corpora often lack either fine-grained, temporally localized action descriptions or dexterous hand annotations. We introduce OpenEgo, a multimodal egocentric manipulation dataset with standardized hand-pose annotations and intention-aligned action primitives. OpenEgo totals 1107 hours across six public datasets, covering 290 manipulation tasks in 600+ environments. We unify hand-pose layouts and provide descriptive, timestamped action primitives. To validate its utility, we train language-conditioned imitation-learning policies to predict dexterous hand trajectories. OpenEgo is designed to lower the barrier to learning dexterous manipulation from egocentric video and to support reproducible research in vision-language-action learning. All resources and instructions will be released at www.openegocentric.com.","short_abstract":"Egocentric human videos provide scalable demonstrations for imitation learning, but existing corpora often lack either fine-grained, temporally localized action descriptions or dexterous hand annotations. We introduce OpenEgo, a multimodal egocentric manipulation dataset with standardized hand-pose annotations and inte...","url_abs":"https://arxiv.org/abs/2509.05513","url_pdf":"https://arxiv.org/pdf/2509.05513v1","authors":"[\"Ahad Jawaid\",\"Yu Xiang\"]","published":"2025-09-05T21:47:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.RO\"]","methods":"[]","has_code":false}