{"ID":3005093,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T07:50:16.0004273Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03354","arxiv_id":"2606.03354","title":"ImageAuditor: Membership Inference Attack against Image-based Retrieval-Augmented Generation","abstract":"Image-based Retrieval-Augmented Generation (IRAG) conditions a frozen generator on reference images retrieved from an external database, supporting both text-to-image (T2I) and question answering (Q\u0026A) tasks. Because these databases are opaque and web-scraped, copyright holders need ways to audit whether specific images appear in them. While prior work employs membership inference attacks (MIAs) to audit uni-modal, text-based RAG, they fail to transfer to IRAG due to two key challenges. First, cross-modal retrieval: text-RAG MIAs force retrieval of the target passage by injecting its content into the query, which is unavailable in IRAG since images cannot be embedded into text queries; even accurate image captions fail to bridge the modality gap. Second, discriminative signal extraction: text-RAG MIAs extract membership signals by prompting the generator to answer multiple questions over the target passage, whereas T2I generators in IRAG produce images rather than follow Q\u0026A commands. To fill this gap, we introduce the first MIA tailored to IRAG, ImageAuditor, which decomposes each attack query into a retrieval segment and an extraction segment, enabling dedicated optimization for each challenge. For retrieval, we propose Reward-Guided Policy Optimization (RGPO), which updates a stochastic policy from reward-ranked candidates to navigate the cross-modal embedding landscape and admits finite-sample optimality guarantees to balance exploration and exploitation. For extraction, we analyze the distribution of the MIA score to guide the co-design of the prompting strategy and scoring rule, and derive task-specific instantiations for T2I and Q\u0026A tasks. We aggregate signals across queries via K-means clustering for reliable membership decisions. Across various IRAG systems, ImageAuditor exceeds 80% AUROC with only four queries per audited image and remains robust across diverse settings.","short_abstract":"Image-based Retrieval-Augmented Generation (IRAG) conditions a frozen generator on reference images retrieved from an external database, supporting both text-to-image (T2I) and question answering (Q\u0026A) tasks. Because these databases are opaque and web-scraped, copyright holders need ways to audit whether specific image...","url_abs":"https://arxiv.org/abs/2606.03354","url_pdf":"https://arxiv.org/pdf/2606.03354v1","authors":"[\"Jinghuai Zhang\",\"Pengyue Yu\",\"Zhexiao Lin\",\"Kunlin Cai\",\"Fnu Suya\",\"Yuan Tian\"]","published":"2026-06-02T09:03:56Z","proceeding":"cs.CR","tasks":"[\"cs.CR\"]","methods":"[\"RAG\",\"LoRA\"]","has_code":false}