{"ID":2852863,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.21798","arxiv_id":"2510.21798","title":"An Evaluation of Hybrid Annotation Workflows on High-Ambiguity Spatiotemporal Video Footage","abstract":"Manual annotation remains the gold standard for high-quality, dense temporal video datasets, yet it is inherently time-consuming. Vision-language models can aid human annotators and expedite this process. We report on the impact of automatic Pre-Annotations from a tuned encoder on a Human-in-the-Loop labeling workflow for video footage. Quantitative analysis in a study of a single-iteration test involving 18 volunteers demonstrates that our workflow reduced annotation time by 35% for the majority (72%) of the participants. Beyond efficiency, we provide a rigorous framework for benchmarking AI-assisted workflows that quantifies trade-offs between algorithmic speed and the integrity of human verification.","short_abstract":"Manual annotation remains the gold standard for high-quality, dense temporal video datasets, yet it is inherently time-consuming. Vision-language models can aid human annotators and expedite this process. We report on the impact of automatic Pre-Annotations from a tuned encoder on a Human-in-the-Loop labeling workflow...","url_abs":"https://arxiv.org/abs/2510.21798","url_pdf":"https://arxiv.org/pdf/2510.21798v2","authors":"[\"Juan Gutiérrez\",\"Victor Gutiérrez\",\"Ángel Mora\",\"Silvia Rodriguez\",\"José Luis Blanco\"]","published":"2025-10-20T16:10:11Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.HC\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false}