{"ID":2867060,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18843","arxiv_id":"2509.18843","title":"Are Smaller Open-Weight LLMs Closing the Gap to Proprietary Models for Biomedical Question Answering?","abstract":"Open-weight versions of large language models (LLMs) are rapidly advancing, with state-of-the-art models like DeepSeek-V3 now performing comparably to proprietary LLMs. This progression raises the question of whether small open-weight LLMs are capable of effectively replacing larger closed-source models. We are particularly interested in the context of biomedical question-answering, a domain we explored by participating in Task 13B Phase B of the BioASQ challenge. In this work, we compare several open-weight models against top-performing systems such as GPT-4o, GPT-4.1, Claude 3.5 Sonnet, and Claude 3.7 Sonnet. To enhance question answering capabilities, we use various techniques including retrieving the most relevant snippets based on embedding distance, in-context learning, and structured outputs. For certain submissions, we utilize ensemble approaches to leverage the diverse outputs generated by different models for exact-answer questions. Our results demonstrate that open-weight LLMs are comparable to proprietary ones. In some instances, open-weight LLMs even surpassed their closed counterparts, particularly when ensembling strategies were applied. All code is publicly available at https://github.com/evidenceprime/BioASQ-13b.","short_abstract":"Open-weight versions of large language models (LLMs) are rapidly advancing, with state-of-the-art models like DeepSeek-V3 now performing comparably to proprietary LLMs. This progression raises the question of whether small open-weight LLMs are capable of effectively replacing larger closed-source models. We are particu...","url_abs":"https://arxiv.org/abs/2509.18843","url_pdf":"https://arxiv.org/pdf/2509.18843v1","authors":"[\"Damian Stachura\",\"Joanna Konieczna\",\"Artur Nowak\"]","published":"2025-09-23T09:27:57Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.IR\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609434,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867060,"paper_url":"https://arxiv.org/abs/2509.18843","paper_title":"Are Smaller Open-Weight LLMs Closing the Gap to Proprietary Models for Biomedical Question Answering?","repo_url":"https://github.com/evidenceprime/BioASQ-13b","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
