{"ID":2880992,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.12668","arxiv_id":"2508.12668","title":"WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art","abstract":"Wölfflin's five principles offer a structured approach to analyzing stylistic variations for formal analysis. However, no existing metric effectively predicts all five principles in visual art. Computationally evaluating the visual aspects of a painting requires a metric that can interpret key elements such as color, composition, and thematic choices. Recent advancements in vision-language models (VLMs) have demonstrated their ability to evaluate abstract image attributes, making them promising candidates for this task. In this work, we investigate whether CLIP, pre-trained on large-scale data, can understand and predict Wölfflin's principles. Our findings indicate that it does not inherently capture such nuanced stylistic elements. To address this, we fine-tune CLIP on annotated datasets of real art images to predict a score for each principle. We evaluate our model, WP-CLIP, on GAN-generated paintings and the Pandora-18K art dataset, demonstrating its ability to generalize across diverse artistic styles. Our results highlight the potential of VLMs for automated art analysis.","short_abstract":"Wölfflin's five principles offer a structured approach to analyzing stylistic variations for formal analysis. However, no existing metric effectively predicts all five principles in visual art. Computationally evaluating the visual aspects of a painting requires a metric that can interpret key elements such as color, c...","url_abs":"https://arxiv.org/abs/2508.12668","url_pdf":"https://arxiv.org/pdf/2508.12668v1","authors":"[\"Abhijay Ghildyal\",\"Li-Yun Wang\",\"Feng Liu\"]","published":"2025-08-18T07:00:52Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}
