{"ID":2839658,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15622","arxiv_id":"2511.15622","title":"The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification","abstract":"Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing datasets are limited in scale, constrained to a few species, or lack sufficient temporal and geographical diversity, leaving no suitable benchmark for training general-purpose MAT models applicable across wild animal populations. To address this, we introduce SA-FARI, the largest open-source MAT dataset for wild animals. It comprises 11,609 camera trap videos collected over approximately 10 years (2014-2024) from 741 locations across 4 continents, spanning 99 species categories. Each video is exhaustively annotated, culminating in ~46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels. Alongside the task-specific annotations, we publish anonymized camera trap locations for each video. Finally, we present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal prompts. We also compare against vision-only methods developed specifically for wildlife analysis. SA-FARI is the first large-scale dataset to combine high species diversity, multi-region coverage, and high-quality spatio-temporal annotations, offering a new foundation for advancing generalizable multi-animal tracking in the wild. The dataset is available at https://www.conservationxlabs.com/sa-fari.","short_abstract":"Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing datasets are limited in scale, constrained to a few species, or lack sufficient...","url_abs":"https://arxiv.org/abs/2511.15622","url_pdf":"https://arxiv.org/pdf/2511.15622v2","authors":"[\"Dante Francisco Wasmuht\",\"Otto Brookes\",\"Maximillian Schall\",\"Pablo Palencia\",\"Chris Beirne\",\"Tilo Burghardt\",\"Majid Mirmehdi\",\"Hjalmar Kühl\",\"Mimi Arandjelovic\",\"Sam Pottie\",\"Peter Bermant\",\"Brandon Asheim\",\"Yi Jin Toh\",\"Adam Elzinga\",\"Jason Holmberg\",\"Andrew Whitworth\",\"Eleanor Flatt\",\"Laura Gustafson\",\"Chaitanya Ryali\",\"Yuan-Ting Hu\",\"Baishan Guo\",\"Andrew Westbury\",\"Kate Saenko\",\"Didac Suris\"]","published":"2025-11-19T17:07:08Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Language Model\"]","project_urls":"[\"https://www.conservationxlabs.com/sa-fari\"]","has_code":false}
