{"ID":2886556,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03868","arxiv_id":"2508.03868","title":"Prediction-Oriented Subsampling from Data Streams","abstract":"Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.","short_abstract":"Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method...","url_abs":"https://arxiv.org/abs/2508.03868","url_pdf":"https://arxiv.org/pdf/2508.03868v2","authors":"[\"Benedetta Lavinia Mussati\",\"Freddie Bickford Smith\",\"Tom Rainforth\",\"Stephen Roberts\"]","published":"2025-08-05T19:30:28Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
