{"ID":2841177,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12002","arxiv_id":"2511.12002","title":"Selecting Fine-Tuning Examples by Quizzing VLMs","abstract":"A challenge in fine-tuning text-to-image diffusion models for specific topics is to select good examples. Fine-tuning from image sets of varying quality, such as Wikipedia Commons, will often produce poor output. However, training images that \\textit{do} exemplify the target concept (e.g., a \\textit{female Mountain Bluebird}) help ensure that the generated images are similarly representative (e.g., have the prototypical blue wings and gray chest). In this work, we propose QZLoRA, a framework to select images for low-rank adaptation (LoRA). The approach leverages QuizRank, a method to automatically rank images by treating them as an `educational intervention' and `quizzing' a VLM. We demonstrate that QZLoRA can produce better aligned, photorealistic images with fewer samples. We also show that these fine-tuned models can produce stylized images (i.e., illustrations) that are similarly representative. Our results highlight the promise of combining automated visual reasoning with parameter-efficient fine-tuning for topic-adaptive generative modeling.","short_abstract":"A challenge in fine-tuning text-to-image diffusion models for specific topics is to select good examples. Fine-tuning from image sets of varying quality, such as Wikipedia Commons, will often produce poor output. However, training images that \\textit{do} exemplify the target concept (e.g., a \\textit{female Mountain Blu...","url_abs":"https://arxiv.org/abs/2511.12002","url_pdf":"https://arxiv.org/pdf/2511.12002v1","authors":"[\"Tenghao Ji\",\"Eytan Adar\"]","published":"2025-11-15T02:48:48Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CV\"]","methods":"[\"Diffusion Model\",\"LoRA\"]","has_code":false}
