{"ID":2896193,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.07860","arxiv_id":"2507.07860","title":"THUNDER: Tile-level Histopathology image UNDERstanding benchmark","abstract":"Progress in a research field can be hard to assess, in particular when many concurrent methods are proposed in a short period of time. This is the case in digital pathology, where many foundation models have been released recently to serve as feature extractors for tile-level images, being used in a variety of downstream tasks, both for tile- and slide-level problems. Benchmarking available methods then becomes paramount to get a clearer view of the research landscape. In particular, in critical domains such as healthcare, a benchmark should not only focus on evaluating downstream performance, but also provide insights about the main differences between methods, and importantly, further consider uncertainty and robustness to ensure a reliable usage of proposed models. For these reasons, we introduce THUNDER, a tile-level benchmark for digital pathology foundation models, allowing for efficient comparison of many models on diverse datasets with a series of downstream tasks, studying their feature spaces and assessing the robustness and uncertainty of predictions informed by their embeddings. THUNDER is a fast, easy-to-use, dynamic benchmark that can already support a large variety of state-of-the-art foundation, as well as local user-defined models for direct tile-based comparison. In this paper, we provide a comprehensive comparison of 23 foundation models on 16 different datasets covering diverse tasks, feature analysis, and robustness. The code for THUNDER is publicly available at https://github.com/MICS-Lab/thunder.","short_abstract":"Progress in a research field can be hard to assess, in particular when many concurrent methods are proposed in a short period of time. This is the case in digital pathology, where many foundation models have been released recently to serve as feature extractors for tile-level images, being used in a variety of downstre...","url_abs":"https://arxiv.org/abs/2507.07860","url_pdf":"https://arxiv.org/pdf/2507.07860v3","authors":"[\"Pierre Marza\",\"Leo Fillioux\",\"Sofiène Boutaj\",\"Kunal Mahatha\",\"Christian Desrosiers\",\"Pablo Piantanida\",\"Jose Dolz\",\"Stergios Christodoulidis\",\"Maria Vakalopoulou\"]","published":"2025-07-10T15:41:35Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":612249,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2896193,"paper_url":"https://arxiv.org/abs/2507.07860","paper_title":"THUNDER: Tile-level Histopathology image UNDERstanding benchmark","repo_url":"https://github.com/MICS-Lab/thunder","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}