On the Use of Large Language Models for Qualitative Synthesis
Abstract
Large language models (LLMs) show promise for supporting systematic reviews (SRs), including complex tasks such as qualitative synthesis (QS). However, applying them to a stage that is unevenly reported and variably conducted carries important risks: misuse can amplify existing weaknesses and erode confidence in SR findings. To examine the challenges of using LLMs for QS, we conducted a collaborative autoethnography involving two trials. We evaluated each trial for methodological rigor and practical usefulness, and interpreted the results through a technical lens informed by how LLMs are built and their current limitations.