{"ID":2894185,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.10897","arxiv_id":"2507.10897","title":"LLMATCH: A Unified Schema Matching Framework with Large Language Models","abstract":"Schema matching is a foundational task in enterprise data integration, aiming to align disparate data sources. While traditional methods handle simple one-to-one table mappings, they often struggle with complex multi-table schema matching in real-world applications. We present LLMatch, a unified and modular schema matching framework. LLMatch decomposes schema matching into three distinct stages: schema preparation, table-candidate selection, and column-level alignment, enabling component-level evaluation and future-proof compatibility. It includes a novel two-stage optimization strategy: a Rollup module that consolidates semantically related columns into higher-order concepts, followed by a Drilldown module that re-expands these concepts for fine-grained column mapping. To address the scarcity of complex semantic matching benchmarks, we introduce SchemaNet, a benchmark derived from real-world schema pairs across three enterprise domains, designed to capture the challenges of multi-table schema alignment in practical settings. Experiments demonstrate that LLMatch significantly improves matching accuracy in complex schema matching settings and substantially boosts engineer productivity in real-world data integration.","short_abstract":"Schema matching is a foundational task in enterprise data integration, aiming to align disparate data sources. While traditional methods handle simple one-to-one table mappings, they often struggle with complex multi-table schema matching in real-world applications. \nWe present LLMatch, a unified and modular schema matc...","url_abs":"https://arxiv.org/abs/2507.10897","url_pdf":"https://arxiv.org/pdf/2507.10897v1","authors":"[\"Sha Wang\",\"Yuchen Li\",\"Hanhua Xiao\",\"Bing Tian Dai\",\"Roy Ka-Wei Lee\",\"Yanfei Dong\",\"Lambert Deng\"]","published":"2025-07-15T01:24:49Z","proceeding":"cs.DB","tasks":"[\"cs.DB\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
