{"ID":2851713,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19559","arxiv_id":"2510.19559","title":"A Matter of Time: Revealing the Structure of Time in Vision-Language Models","abstract":"Large-scale vision-language models (VLMs) such as CLIP have gained popularity for their generalizable and expressive multimodal representations. By leveraging large-scale training data with diverse textual metadata, VLMs acquire open-vocabulary capabilities, solving tasks beyond their training scope. This paper investigates the temporal awareness of VLMs, assessing their ability to position visual content in time. We introduce TIME10k, a benchmark dataset of over 10,000 images with temporal ground truth, and evaluate the time-awareness of 37 VLMs by a novel methodology. Our investigation reveals that temporal information is structured along a low-dimensional, non-linear manifold in the VLM embedding space. Based on this insight, we propose methods to derive an explicit \"timeline\" representation from the embedding space. These representations model time and its chronological progression and thereby facilitate temporal reasoning tasks. Our timeline approaches achieve competitive to superior accuracy compared to a prompt-based baseline while being computationally efficient. All code and data are available at https://tekayanidham.github.io/timeline-page/.","short_abstract":"Large-scale vision-language models (VLMs) such as CLIP have gained popularity for their generalizable and expressive multimodal representations. By leveraging large-scale training data with diverse textual metadata, VLMs acquire open-vocabulary capabilities, solving tasks beyond their training scope. This paper investi...","url_abs":"https://arxiv.org/abs/2510.19559","url_pdf":"https://arxiv.org/pdf/2510.19559v1","authors":"[\"Nidham Tekaya\",\"Manuela Waldner\",\"Matthias Zeppelzauer\"]","published":"2025-10-22T13:14:02Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.IR\",\"cs.MM\"]","methods":"[\"Language Model\"]","has_code":false}
