{"ID":2835899,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00105","arxiv_id":"2512.00105","title":"Efficiently Sampling Interval Patterns from Numerical Databases","abstract":"Pattern sampling has emerged as a promising approach for information discovery in large databases, allowing analysts to focus on a manageable subset of patterns. In this approach, patterns are randomly drawn based on an interestingness measure, such as frequency or hyper-volume. This paper presents the first sampling approach designed to handle interval patterns in numerical databases. This approach, named Fips, samples interval patterns proportionally to their frequency. It uses a multi-step sampling procedure and addresses a key challenge in numerical data: accurately determining the number of interval patterns that cover each object. We extend this work with HFips, which samples interval patterns proportionally to both their frequency and hyper-volume. These methods efficiently tackle the well-known long-tail phenomenon in pattern sampling. We formally prove that Fips and HFips sample interval patterns in proportion to their frequency and the product of hyper-volume and frequency, respectively. Through experiments on several databases, we demonstrate the quality of the obtained patterns and their robustness against the long-tail phenomenon.","short_abstract":"Pattern sampling has emerged as a promising approach for information discovery in large databases, allowing analysts to focus on a manageable subset of patterns. In this approach, patterns are randomly drawn based on an interestingness measure, such as frequency or hyper-volume. \nThis paper presents the first sampling a...","url_abs":"https://arxiv.org/abs/2512.00105","url_pdf":"https://arxiv.org/pdf/2512.00105v1","authors":"[\"Djawad Bekkoucha\",\"Lamine Diop\",\"Abdelkader Ouali\",\"Bruno Crémilleux\",\"Patrice Boizumault\"]","published":"2025-11-27T10:35:17Z","proceeding":"cs.DB","tasks":"[\"cs.DB\",\"cs.AI\"]","methods":"[]","has_code":false}
