{"ID":2898852,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.02800","arxiv_id":"2507.02800","title":"Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding","abstract":"Speech neuroprostheses aim to restore communication for people with severe paralysis by decoding speech directly from neural activity. To accelerate algorithmic progress, a recent benchmark released intracranial recordings from a paralyzed participant attempting to speak, along with a baseline decoding algorithm. Prior work on the benchmark showed impressive accuracy gains. However, these gains increased computational costs and were not demonstrated in a real-time decoding setting. Here, we make three contributions that pave the way towards accurate, efficient, and real-time neural speech decoding. First, we incorporate large amounts of time-masking during training. On average, over $50\\%$ of each trial is masked. Second, we replace the gated recurrent unit (GRU) architecture used in the baseline algorithm with a compact Transformer. The Transformer architecture uses $83\\%$ fewer parameters, cuts peak GPU memory usage by $52\\%$, and is significantly faster to calibrate relative to the GRU. Third, we design a lightweight variant of an existing test-time adaptation method developed for decoding handwriting from neural activity. Our variant adapts the model using multiple time-masked augmentations of a single trial and requires only one gradient step per trial. Together, these contributions reduce word error rate by over $20\\%$ and effectively mitigate performance degradations across held-out days in a real-time decoding setting while substantially lowering computational costs.","short_abstract":"Speech neuroprostheses aim to restore communication for people with severe paralysis by decoding speech directly from neural activity. To accelerate algorithmic progress, a recent benchmark released intracranial recordings from a paralyzed participant attempting to speak, along with a baseline decoding algorithm. Prior...","url_abs":"https://arxiv.org/abs/2507.02800","url_pdf":"https://arxiv.org/pdf/2507.02800v2","authors":"[\"Ebrahim Feghhi\",\"Shreyas Kaasyap\",\"Nima Hadidi\",\"Jonathan C. Kao\"]","published":"2025-07-03T17:02:54Z","proceeding":"cs.HC","tasks":"[\"cs.HC\"]","methods":"[\"Transformer\"]","has_code":false}
