{"ID":2883637,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07781","arxiv_id":"2508.07781","title":"SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation","abstract":"This work proposes a grammar-based chunking strategy that segments input streams into semantically complete units by parsing dependency relations (e.g., noun phrase boundaries, verb-object structures) and punctuation features. The method ensures chunk coherence and minimizes semantic fragmentation. Building on this mechanism, we present SASST (Syntax-Aware Simultaneous Speech Translation), an end-to-end framework integrating frozen Whisper encoder and decoder-only LLM. The unified architecture dynamically outputs translation tokens or \u003cWAIT\u003e symbols to jointly optimize translation timing and content, with target-side reordering addressing word-order divergence. Experiments on CoVoST2 multilingual corpus En-{De, Zh, Ja} demonstrate significant translation quality improvements across languages and validate the effectiveness of syntactic structures in LLM-driven SimulST systems.","short_abstract":"This work proposes a grammar-based chunking strategy that segments input streams into semantically complete units by parsing dependency relations (e.g., noun phrase boundaries, verb-object structures) and punctuation features. The method ensures chunk coherence and minimizes semantic fragmentation. Building on this mec...","url_abs":"https://arxiv.org/abs/2508.07781","url_pdf":"https://arxiv.org/pdf/2508.07781v1","authors":"[\"Zeyu Yang\",\"Lai Wei\",\"Roman Koshkin\",\"Xi Chen\",\"Satoshi Nakamura\"]","published":"2025-08-11T09:13:35Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
