{"ID":2849628,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24793","arxiv_id":"2510.24793","title":"SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications","abstract":"We present SwiftEmbed, a production-oriented serving system for static token embeddings that achieves 1.12 ms p50 latency for single-text requests while maintaining a 60.6 MTEB average score across 8 representative tasks. Built around the open-source Potion-base-8M distilled model from MinishLab and implemented in Rust, the system delivers 50,000 requests per second through static embedding lookup, mean pooling, and zero-copy IEEE754 binary serialization. Evaluation demonstrates exceptional duplicate detection performance (90.1% AP) and strong semantic similarity (76.1% Spearman correlation). Performance relative to Sentence-BERT is task-dependent: robust for deduplication and similarity workloads (89-100%), substantially lower for classification and complex retrieval tasks (75%). Domain-specific performance ranges from 75% to 131% of a GloVe-840B baseline. The system targets real-time embedding applications where sub-5 ms latency is operationally critical and where full transformer inference is not feasible.","short_abstract":"We present SwiftEmbed, a production-oriented serving system for static token embeddings that achieves 1.12 ms p50 latency for single-text requests while maintaining a 60.6 MTEB average score across 8 representative tasks. Built around the open-source Potion-base-8M distilled model from MinishLab and implemented in Rus...","url_abs":"https://arxiv.org/abs/2510.24793","url_pdf":"https://arxiv.org/pdf/2510.24793v3","authors":"[\"Edouard Lansiaux\",\"Antoine Simonet\",\"Eric Wiel\"]","published":"2025-10-27T13:40:26Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false}
