{"ID":2823859,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.00138","arxiv_id":"2601.00138","title":"Explicit Abstention Knobs for Predictable Reliability in Video Question Answering","abstract":"High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust under distribution shift. Using NExT-QA and Gemini 2.0 Flash, we establish two findings. First, confidence thresholding provides mechanistic control in-distribution. Sweeping threshold epsilon produces smooth risk-coverage tradeoffs, reducing error rates f","short_abstract":"High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust u...","url_abs":"https://arxiv.org/abs/2601.00138","url_pdf":"https://arxiv.org/pdf/2601.00138v2","authors":"[\"Jorge Ortiz\"]","published":"2025-12-31T23:27:32Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
