{"ID":2854931,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13068","arxiv_id":"2510.13068","title":"NeuroRVQ: Multi-Scale Biosignal Tokenization for Generative Foundation Models","abstract":"Biosignals such as electroencephalography (EEG), electrocardiography (ECG), and electromyography (EMG) encode physiological activity across multiple temporal and spectral scales, yielding representations that are rich but challenging for machine learning. Foundation models trained to predict masked signal tokens have shown promise in learning generalizable biosignal representations, yet their performance depends on the tokenizer's ability to preserve high-frequency dynamics and reconstruct signals with high fidelity. We introduce NeuroRVQ, a modality-adaptive biosignal tokenizer family designed for high-fidelity signal reconstruction. To capture the full frequency spectrum, NeuroRVQ decomposes biosignals into frequency-specific representations via multi-scale temporal convolutions, each encoded into hierarchical RVQ codebooks to preserve high-frequency detail, combined with a novel phase-aware training loss that respects the circular topology of Fourier phase. By tuning the temporal resolution, number and size of temporal kernels and RVQ depth, this design adapts to the spectro-temporal characteristics of each biosignal modality. To validate that tokenizer quality drives downstream performance, we train a simple masked-token foundation model for each modality (NeuroRVQ-FM) using the corresponding NeuroRVQ tokenizer. The NeuroRVQ-FM family achieves competitive or superior downstream performance compared to existing modality-specific foundation models, demonstrating that high-fidelity tokenization is a critical factor for effective biosignal modeling.","short_abstract":"Biosignals such as electroencephalography (EEG), electrocardiography (ECG), and electromyography (EMG) encode physiological activity across multiple temporal and spectral scales, yielding representations that are rich but challenging for machine learning. Foundation models trained to predict masked signal tokens have s...","url_abs":"https://arxiv.org/abs/2510.13068","url_pdf":"https://arxiv.org/pdf/2510.13068v4","authors":"[\"Konstantinos Barmpas\",\"Na Lee\",\"Dimitrios Chalatsis\",\"William Raftery\",\"Yannis Panagakis\",\"Dimitrios A. Adamos\",\"Nikolaos Laskaris\",\"Alexandros Koliousis\",\"Dario Farina\",\"Stefanos Zafeiriou\"]","published":"2025-10-15T01:26:52Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.HC\"]","methods":"[]","has_code":false}