{"ID":2831886,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.08094","arxiv_id":"2512.08094","title":"Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing","abstract":"The goal of this work is to develop a universal approach for aligning subtitles (i.e., spoken language text with corresponding timestamps) to continuous sign language videos. Prior approaches typically rely on end-to-end training tied to a specific language or dataset, which limits their generality. In contrast, our method Segment, Embed, and Align (SEA) provides a single framework that works across multiple languages and domains. SEA leverages two pretrained models: the first to segment a video frame sequence into individual signs and the second to embed the video clip of each sign into a shared latent space with text. Alignment is subsequently performed with a lightweight dynamic programming procedure that runs efficiently on CPUs within a minute, even for hour-long episodes. SEA is flexible and can adapt to a wide range of scenarios, utilizing resources from small lexicons to large continuous corpora. Experiments on four sign language datasets demonstrate state-of-the-art alignment performance, highlighting the potential of SEA to generate high-quality parallel data for advancing sign language processing. SEA's code and models are openly available.","short_abstract":"The goal of this work is to develop a universal approach for aligning subtitles (i.e., spoken language text with corresponding timestamps) to continuous sign language videos. Prior approaches typically rely on end-to-end training tied to a specific language or dataset, which limits their generality. In contrast, our me...","url_abs":"https://arxiv.org/abs/2512.08094","url_pdf":"https://arxiv.org/pdf/2512.08094v1","authors":"[\"Zifan Jiang\",\"Youngjoon Jang\",\"Liliane Momeni\",\"Gül Varol\",\"Sarah Ebling\",\"Andrew Zisserman\"]","published":"2025-12-08T23:07:48Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
