{"ID":2846460,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.01169","arxiv_id":"2511.01169","title":"Web-Scale Collection of Video Data for 4D Animal Reconstruction","abstract":"Computer vision for animals holds great promise for wildlife research but often depends on large-scale data, while existing collection methods rely on controlled capture setups. Recent data-driven approaches show the potential of single-view, non-invasive analysis, yet current animal video datasets are limited--offering as few as 2.4K 15-frame clips and lacking key processing for animal-centric 3D/4D tasks. We introduce an automated pipeline that mines YouTube videos and processes them into object-centric clips, along with auxiliary annotations valuable for downstream tasks like pose estimation, tracking, and 3D/4D reconstruction. Using this pipeline, we amass 30K videos (2M frames)--an order of magnitude more than prior works. To demonstrate its utility, we focus on the 4D quadruped animal reconstruction task. To support this task, we present Animal-in-Motion (AiM), a benchmark of 230 manually filtered sequences with 11K frames showcasing clean, diverse animal motions. We evaluate state-of-the-art model-based and model-free methods on Animal-in-Motion, finding that 2D metrics favor the former despite unrealistic 3D shapes, while the latter yields more natural reconstructions but scores lower--revealing a gap in current evaluation. To address this, we enhance a recent model-free approach with sequence-level optimization, establishing the first 4D animal reconstruction baseline. Together, our pipeline, benchmark, and baseline aim to advance large-scale, markerless 4D animal reconstruction and related tasks from in-the-wild videos. Code and datasets are available at https://github.com/briannlongzhao/Animal-in-Motion.","short_abstract":"Computer vision for animals holds great promise for wildlife research but often depends on large-scale data, while existing collection methods rely on controlled capture setups. Recent data-driven approaches show the potential of single-view, non-invasive analysis, yet current animal video datasets are limited--offerin...","url_abs":"https://arxiv.org/abs/2511.01169","url_pdf":"https://arxiv.org/pdf/2511.01169v1","authors":"[\"Brian Nlong Zhao\",\"Jiajun Wu\",\"Shangzhe Wu\"]","published":"2025-11-03T02:40:06Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":607434,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2846460,"paper_url":"https://arxiv.org/abs/2511.01169","paper_title":"Web-Scale Collection of Video Data for 4D Animal Reconstruction","repo_url":"https://github.com/briannlongzhao/Animal-in-Motion","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
