{"ID":2923610,"CreatedAt":"2026-06-02T04:05:25.881865328Z","UpdatedAt":"2026-06-04T13:12:39.622923895Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02345","arxiv_id":"2606.02345","title":"Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization","abstract":"Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve estimation or optimization performance comparable to that obtained by using all pairs, by leveraging survey sampling techniques. A central finding, supported by both theory and experiments, is that such sampling plans must target pairs directly rather than individual observations. In particular, for pairwise losses between high-dimensional vectors such as embeddings in vision or graph learning, assigning higher inclusion probabilities to informative pairs using suitable auxiliary information yields performance close to full pairwise evaluation, providing a principled and theoretically grounded trade-off between accuracy and computational cost.","short_abstract":"Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve e...","url_abs":"https://arxiv.org/abs/2606.02345","url_pdf":"https://arxiv.org/pdf/2606.02345v1","authors":"[\"Louise Davy\",\"Stephan Clémençon\",\"Charlotte Laclau\"]","published":"2026-06-01T14:54:29Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.LG\"]","methods":"[]","has_code":false}