{"ID":2851764,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19644","arxiv_id":"2510.19644","title":"CoRoVA: Compressed Representations for Vector-Augmented Code Completion","abstract":"Retrieval-augmented generation has emerged as one of the most effective approaches for code completion enhancement, especially when repository-level context is important. However, adding this extra retrieved context significantly increases sequence length, raises prefill cost, and degrades time-to-first-token (TTFT), which slows down inference -- a critical limitation for interactive settings such as IDEs. In this work, we introduce CoRoVA, a framework that compresses context into compact, semantically rich representations that remain interpretable to code LLMs. This improves generation quality while reducing prompt augmentation to only a few compressed single-token vectors. Our approach requires training only a small projector module and introduces negligible additional latency, yet it significantly improves the prediction quality of code LLMs. Our experiments show that CoRoVA enables a 20-38% reduction in TTFT on completion tasks compared to uncompressed RAG.","short_abstract":"Retrieval-augmented generation has emerged as one of the most effective approaches for code completion enhancement, especially when repository-level context is important. However, adding this extra retrieved context significantly increases sequence length, raises prefill cost, and degrades time-to-first-token (TTFT), w...","url_abs":"https://arxiv.org/abs/2510.19644","url_pdf":"https://arxiv.org/pdf/2510.19644v2","authors":"[\"Daria Cherniuk\",\"Nikita Sukhorukov\",\"Danil Gusak\",\"Nikita Sushko\",\"Danil Sivtsov\",\"Elena Tutubalina\",\"Evgeny Frolov\"]","published":"2025-10-22T14:49:21Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"RAG\",\"Large Language Model\"]","has_code":false}
