{"ID":2875011,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03265","arxiv_id":"2509.03265","title":"Compressed Dictionary Matching on Run-Length Encoded Strings","abstract":"Given a set of pattern strings $\\mathcal{P}=\\{P_1, P_2,\\ldots P_k\\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed setting, where the pattern strings and the text string are compressed using run-length encoding, and the goal is to solve the problem without decompression and achieve efficient time and space in the size of the compressed strings. Let $m$ and $n$ be the total length of the patterns $\\mathcal{P}$ and the length of the text string $S$, respectively, and let $\\overline{m}$ and $\\overline{n}$ be the total number of runs in the run-length encoding of the patterns in $\\mathcal{P}$ and $S$, respectively. Our main result is an algorithm that achieves $O( (\\overline{m} + \\overline{n})\\log \\log m + \\mathrm{occ})$ expected time, and $O(\\overline{m})$ space, where $\\mathrm{occ}$ is the total number of occurrences of patterns in $S$. This is the first non-trivial solution to the problem. Since any solution must read the input, our time bound is optimal within an $\\log \\log m$ factor. We introduce several new techniques to achieve our bounds, including a new compressed representation of the classic Aho-Corasick automaton and a new efficient string index that supports fast queries in run-length encoded strings.","short_abstract":"Given a set of pattern strings $\\mathcal{P}=\\{P_1, P_2,\\ldots P_k\\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed setting, where the pattern strings and the text string are compressed using run-lengt...","url_abs":"https://arxiv.org/abs/2509.03265","url_pdf":"https://arxiv.org/pdf/2509.03265v1","authors":"[\"Philip Bille\",\"Inge Li Gørtz\",\"Simon J. Puglisi\",\"Simon R. Tarnow\"]","published":"2025-09-03T12:30:08Z","proceeding":"cs.DS","tasks":"[\"cs.DS\"]","methods":"[]","has_code":false}
