{"ID":2894096,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.12558","arxiv_id":"2507.12558","title":"When Retriever Meets Generator: A Joint Model for Code Comment Generation","abstract":"Automatically generating concise, informative comments for source code can lighten documentation effort and accelerate program comprehension. Retrieval-augmented approaches first fetch code snippets with existing comments and then synthesize a new comment, yet retrieval and generation are typically optimized in isolation, allowing irrelevant neighbors topropagate noise downstream. To tackle the issue, we propose a novel approach named RAGSum with the aim of both effectiveness and efficiency in recommendations. RAGSum is built on top offuse retrieval and generation using a single CodeT5 backbone. We report preliminary results on a unified retrieval-generation framework built on CodeT5. A contrastive pre-training phase shapes code embeddings for nearest-neighbor search; these weights then seed end-to-end training with a composite loss that (i) rewards accurate top-k retrieval; and (ii) minimizes comment-generation error. More importantly, a lightweight self-refinement loop is deployed to polish the final output. We evaluated theframework on three cross-language benchmarks (Java, Python, C), and compared it with three well-established baselines. The results show that our approach substantially outperforms thebaselines with respect to BLEU, METEOR, and ROUTE-L. These findings indicate that tightly coupling retrieval and generationcan raise the ceiling for comment automation and motivateforthcoming replications and qualitative developer studies.","short_abstract":"Automatically generating concise, informative comments for source code can lighten documentation effort and accelerate program comprehension. Retrieval-augmented approaches first fetch code snippets with existing comments and then synthesize a new comment, yet retrieval and generation are typically optimized in isolati...","url_abs":"https://arxiv.org/abs/2507.12558","url_pdf":"https://arxiv.org/pdf/2507.12558v2","authors":"[\"Tien P. T. Le\",\"Anh M. T. Bui\",\"Huy N. D. Pham\",\"Alessio Bucaioni\",\"Phuong T. Nguyen\"]","published":"2025-07-16T18:12:27Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[]","has_code":false}
