{"ID":2857538,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09311","arxiv_id":"2510.09311","title":"Improved Extended Regular Expression Matching","abstract":"An extended regular expression $R$ specifies a set of strings formed by characters from an alphabet combined with concatenation, union, intersection, complement, and star operators. Given an extended regular expression $R$ and a string $Q$, the extended regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$. Extended regular expression matching was introduced by Hopcroft and Ullman in the 1970s, who gave a simple dynamic programming solution using $O(n^3m)$ time and $O(n^2m)$ space, where $n$ is the length of $Q$ and $m$ is the length of $R$. The current state-of-the art solution, by Yamamoto and Miyazaki uses $O(\\frac{n^3k + n^2m}{w} + n + m)$ time and $O(\\frac{n^2k + nm}{w} + n + m)$ space, where $k$ is the number of negation and complement operators in $R$ and $w$ is the number of bits in a machine word. This roughly replaces the $m$ factor with $k$ in the dominant terms of both the space and time bounds of the classical Hopcroft and Ullman algorithm. In this paper, we present a new solution that solves extended regular expression matching in \\[ O\\left(n^ωk + \\frac{n^2m}{\\max(w/\\log w, \\log n)} + m\\right) \\] time and $O(\\frac{n^2 \\log k}{w} + n + m) = O(n^2 +m)$ space, where $ω\\approx 2.3716$ is the exponent of matrix multiplication. Essentially, this replaces the dominant $n^3k$ term with $n^ωk$ in the time bound, while simultaneously improving the $n^2k$ term in the space to $O(n^2)$. To achieve our result, we develop several new insights and techniques of independent interest, including a new compact representation to store and efficiently combine substring matches, a new clustering technique for parse trees of extended regular expressions, and a new efficient combination of finite automaton simulation with substring match representation to speed up the classic dynamic programming solution.","short_abstract":"An extended regular expression $R$ specifies a set of strings formed by characters from an alphabet combined with concatenation, union, intersection, complement, and star operators. Given an extended regular expression $R$ and a string $Q$, the extended regular expression matching problem is to decide if $Q$ matches an...","url_abs":"https://arxiv.org/abs/2510.09311","url_pdf":"https://arxiv.org/pdf/2510.09311v2","authors":"[\"Philip Bille\",\"Inge Li Gørtz\",\"Rikke Schjeldrup Jessen\"]","published":"2025-10-10T12:04:53Z","proceeding":"cs.DS","tasks":"[\"cs.DS\"]","methods":"[\"Large Language Model\"]","has_code":false}
