{"ID":3005089,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T07:50:16.0004273Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03345","arxiv_id":"2606.03345","title":"Beyond Semantics: Modeling Factual and Affective Perceptual Experiences from Vision-Language Data","abstract":"We present P-Topics (Perception Topics) modeling, a novel problem for understanding how images are perceived affectively and across cultures. The goal is to (1) discover and model the different perception experiences in a dataset of images and captions, where each experience is defined by an objective factual and a subjective affective aspect, and (2) associate images to their relevant perception experiences. We introduce **PercepT** (**Percep**tion topic **T**ransformer), a two-stage architecture that tackles P-Topics modeling. In the formation stage, PercepT discovers *P-Topics* as visual-textual clusters using an unsupervised training objective, and dynamically selects the number of clusters to match the perceptual richness of the dataset. In the mapping stage, it learns *P-Topic mapping functions* via attention pooling to associate images to their respective clusters. On ArtELingo, PercepT achieves a silhouette score of **0.97** compared to **0.37** from the closest baseline, reflecting better perceptual clusters. PercepT also achieves an AUC score of **0.94** compared to **0.77**, showing better mapping to perceptual clusters. Human evaluation confirms that PercepT captures semantically meaningful perception experiences and significantly outperforms existing methods. Our implementation will be made public.","short_abstract":"We present P-Topics (Perception Topics) modeling, a novel problem for understanding how images are perceived affectively and across cultures. 
The goal is to (1) discover and model the different perception experiences in a dataset of images and captions, where each experience is defined by an objective factual and a sub...","url_abs":"https://arxiv.org/abs/2606.03345","url_pdf":"https://arxiv.org/pdf/2606.03345v1","authors":"[\"Youssef Mohamed\",\"Kenneth Ward Church\",\"Mohamed Elhoseiny\"]","published":"2026-06-02T08:54:59Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\",\"cs.CY\"]","methods":"[]","has_code":false}
