{"ID":2921701,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-03T05:56:00.181519634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01190","arxiv_id":"2606.01190","title":"The anti-lexicographic SUS-anchor: a near-optimal k=1 sampling scheme","abstract":"In recent years, there has been a renewed interest in the search for low density minimizer schemes. These schemes take a window of $w$ consecutive $k$-mers, and sample one of them: the smallest under some specific order. Schemes such as the mod-minimizer provide a low density (fraction of sampled $k$-mers) when $k \\gg w$, while schemes such as the greedy minimizer work well for explicit small parameters roughly in the regime $k \\leq 2w$, for $k$ and $w$ up to $15$ or so. When $k \u003c \\log_σw$ is very small, minimizer schemes cannot do well, and more general sampling schemes are needed that can be richer than just comparing $k$-mers. Bidirectional-string anchors (bd-anchors) form one such scheme. Inspired by bd-anchors, we introduce the smallest unique substring or SUS-anchor: Given a window, this considers all suffixes that do not occur as a substring elsewhere in the window. It then samples the start position of the smallest suffix according to the new anti-lexicographic order that minimizes the first character and maximizes the remaining characters. We give a linear-time and $O(w)$ space streaming algorithm to compute all SUS-anchors of a string. For alphabet size $σ=4$ and $k=1$, the anti-lexicographic SUS-anchor empirically has density $\u003c1\\%$ away from the density lower bound, significantly improving over bd-anchors that are often $\u003e15\\%$ above it. For alphabet size $σ=2$, the density is at most $10\\%$ above the lower bound, which again improves over the $\u003e50\\%$ overhead of bd-anchors.","short_abstract":"In recent years, there has been a renewed interest in the search for low density minimizer schemes. These schemes take a window of $w$ consecutive $k$-mers, and sample one of them: the smallest under some specific order. Schemes such as the mod-minimizer provide a low density (fraction of sampled $k$-mers) when $k \\gg...","url_abs":"https://arxiv.org/abs/2606.01190","url_pdf":"https://arxiv.org/pdf/2606.01190v1","authors":"[\"Groot Koerkamp\",\"Ragnar\"]","published":"2026-05-31T12:11:49Z","proceeding":"cs.DS","tasks":"[\"cs.DS\"]","methods":"[]","has_code":false}
