{"ID":2899227,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.02020","arxiv_id":"2507.02020","title":"Template-Based Schema Matching of Multi-Layout Tenancy Schedules: A Comparative Study of a Template-Based Hybrid Matcher and the ALITE Full Disjunction Model","abstract":"The lack of standardized tabular formats for tenancy schedules across real estate firms creates significant inefficiencies in data integration. Existing automated integration methods, such as Full Disjunction (FD)-based models like ALITE, prioritize completeness but result in schema bloat, sparse attributes and limited business usability. We propose a novel hybrid, template-based schema matcher that aligns multi-layout tenancy schedules to a predefined target schema. The matcher combines schema (Jaccard, Levenshtein) and instance-based metrics (data types, distributions) with globally optimal assignments determined via the Hungarian Algorithm. Evaluation against a manually labeled ground truth demonstrates substantial improvements, with grid search optimization yielding a peak F1-score of 0.881 and an overall null percentage of 45.7%. On a separate ground truth of 20 semantically similar column sets, ALITE achieves an F1-score of 0.712 and 75.6% nulls. These results suggest that combining structured business knowledge with hybrid matching can yield more usable and business-aligned schema mappings. The approach assumes cleanly extracted tabular input; future work could explore extending the matcher to support complex, composite tables.","short_abstract":"The lack of standardized tabular formats for tenancy schedules across real estate firms creates significant inefficiencies in data integration. Existing automated integration methods, such as Full Disjunction (FD)-based models like ALITE, prioritize completeness but result in schema bloat, sparse attributes and limited...","url_abs":"https://arxiv.org/abs/2507.02020","url_pdf":"https://arxiv.org/pdf/2507.02020v1","authors":"[\"Tim Uilkema\",\"Yao Ma\",\"Seyed Sahand Mohammadi Ziabari\",\"Joep van Vliet\"]","published":"2025-07-02T14:37:31Z","proceeding":"cs.DB","tasks":"[\"cs.DB\"]","methods":"[]","has_code":false}
