{"ID":2841582,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11057","arxiv_id":"2511.11057","title":"R-enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies","abstract":"Nishimoto and Tabei [CPM, 2021] proposed r-enum, an algorithm to enumerate various characteristic substrings, including maximal repeats, in a string $T$ of length $n$ in $O(r)$ words of compressed working space, where $r \\le n$ is the number of runs in the Burrows-Wheeler transform (BWT) of $T$. Given the run-length encoded BWT (RLBWT) of $T$, r-enum runs in $O(n \\log \\log_{w} (n/r))$ time in addition to the time linear to the number of output strings, where $w = Θ(\\log n)$ is the word size. In this paper, we first improve the $O(n \\log \\log_{w} (n/r))$ term to $O(n)$. We next extend r-enum to compute other context-sensitive repeats such as near-supermaximal repeats (NSMRs) and supermaximal repeats, as well as the context diversity for every maximal repeat in the same complexities. Furthermore, we study net occurrences: An occurrence of a repeat is called a net occurrence if it is not covered by another repeat, and the net frequency of a repeat is the number of its net occurrences. With this terminology, an NSMR is a repeat with a positive net frequency. Given the RLBWT of $T$, we show how to compute the set $S^{nsmr}$ of all NSMRs in $T$ together with their net frequency/occurrences in $O(n)$ time and $O(r)$ space. We also show that an $O(r)$-space data structure can be built from the RLBWT to compute the net frequency/occurrences of any pattern in optimal time. The data structure is built in $O(r)$ space and in $O(n)$ time with high probability or deterministic $O(n + |S^{nsmr}| \\log \\log \\min(σ, |S^{nsmr}|))$ time, where $σ\\le r$ is the alphabet size of $T$. To achieve this, we prove that the total number of net occurrences is less than $2r$. With the duality between net occurrences and \\emph{minimal unique substrings (MUSs)}, we get a new upper bound $2r$ of the number of MUSs in $T$, which may be of independent interest.","short_abstract":"Nishimoto and Tabei [CPM, 2021] proposed r-enum, an algorithm to enumerate various characteristic substrings, including maximal repeats, in a string $T$ of length $n$ in $O(r)$ words of compressed working space, where $r \\le n$ is the number of runs in the Burrows-Wheeler transform (BWT) of $T$. Given the run-length en...","url_abs":"https://arxiv.org/abs/2511.11057","url_pdf":"https://arxiv.org/pdf/2511.11057v2","authors":"[\"Kotaro Kimura\",\"Tomohiro I\"]","published":"2025-11-14T08:15:25Z","proceeding":"cs.DS","tasks":"[\"cs.DS\"]","methods":"[]","has_code":false}
