{"ID":2898899,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.03110","arxiv_id":"2507.03110","title":"SymMatika: Structure-Aware Symbolic Discovery","abstract":"Symbolic regression (SR) seeks to recover closed-form mathematical expressions that describe observed data. While existing methods have advanced the discovery of either explicit mappings (i.e., $y = f(\\mathbf{x})$) or implicit relations (i.e., $F(\\mathbf{x}, y)=0$), few modern and accessible frameworks support both. Moreover, most approaches treat each expression candidate in isolation, without reusing recurring structural patterns that could accelerate search. We introduce SymMatika, a hybrid SR algorithm that combines multi-island genetic programming (GP) with a reusable motif library inspired by biological sequence analysis. SymMatika identifies high-impact substructures in top-performing candidates and reintroduces them to guide future generations. Additionally, it incorporates a feedback-driven evolutionary engine and supports both explicit and implicit relation discovery using implicit-derivative metrics. SymMatika achieves state-of-the-art recovery rates on the Nguyen and Feynman benchmark suites, including a recovery rate of 61\\% on Nguyen-12 compared to 2\\% for the next-best method, and strong placement on the error-complexity Pareto fronts on the Feynman equations and on a subset of 57 SRBench Black-box problems. Our results demonstrate the power of structure-aware evolutionary search for scientific discovery. To support broader research in interpretable modeling and symbolic discovery, we have open-sourced the full SymMatika framework.","short_abstract":"Symbolic regression (SR) seeks to recover closed-form mathematical expressions that describe observed data. 
While existing methods have advanced the discovery of either explicit mappings (i.e., $y = f(\\mathbf{x})$) or implicit relations (i.e., $F(\\mathbf{x}, y)=0$), few modern and accessible frameworks supp...","url_abs":"https://arxiv.org/abs/2507.03110","url_pdf":"https://arxiv.org/pdf/2507.03110v2","authors":"[\"Michael Scherk\",\"Boyuan Chen\"]","published":"2025-07-03T18:32:03Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
