{"ID":2842868,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09388","arxiv_id":"2511.09388","title":"Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition","abstract":"Recognizing unseen skeleton action categories remains highly challenging due to the absence of corresponding skeletal priors. Existing approaches generally follow an ``align-then-classify'' paradigm but face two fundamental issues, \\textit{i.e.}, (i) fragile point-to-point alignment arising from imperfect semantics, and (ii) rigid classifiers restricted by static decision boundaries and coarse-grained anchors. To address these issues, we propose a novel method for zero-shot skeleton action recognition, termed \\texttt{\\textbf{Flora}}, which builds upon \\textbf{F}lexib\\textbf{L}e neighb\\textbf{O}r-aware semantic attunement and open-form dist\\textbf{R}ibution-aware flow cl\\textbf{A}ssifier. Specifically, we flexibly attune textual semantics by incorporating neighboring inter-class contextual cues to form direction-aware regional semantics, coupled with a cross-modal geometric consistency objective that ensures stable and robust point-to-region alignment. Furthermore, we employ noise-free flow matching to bridge the modality distribution gap between semantic and skeleton latent embeddings, while a condition-free contrastive regularization enhances discriminability, leading to a distribution-aware classifier with fine-grained decision boundaries achieved through token-level velocity predictions. Extensive experiments on three benchmark datasets validate the effectiveness of our method, showing particularly impressive performance even when trained with only 10% of the seen data. Code is available at https://github.com/cseeyangchen/Flora.","short_abstract":"Recognizing unseen skeleton action categories remains highly challenging due to the absence of corresponding skeletal priors. Existing approaches generally follow an ``align-then-classify'' paradigm but face two fundamental issues, \\textit{i.e.}, (i) fragile point-to-point alignment arising from imperfect semantics, an...","url_abs":"https://arxiv.org/abs/2511.09388","url_pdf":"https://arxiv.org/pdf/2511.09388v2","authors":"[\"Yang Chen\",\"Miaoge Li\",\"Zhijie Rao\",\"Deze Zeng\",\"Song Guo\",\"Jingcai Guo\"]","published":"2025-11-12T14:54:53Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"LoRA\"]","has_code":false,"code_links":[{"ID":607166,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2842868,"paper_url":"https://arxiv.org/abs/2511.09388","paper_title":"Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition","repo_url":"https://github.com/cseeyangchen/Flora","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
