{"ID":2876518,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.00550","arxiv_id":"2509.00550","title":"Integrated Multivariate Segmentation Tree for Heterogeneous Credit Data Analysis in Small- and Medium-Sized Enterprises","abstract":"Traditional decision tree models, which rely exclusively on numerical variables, often face challenges in handling high-dimensional data and are limited in their ability to incorporate textual information effectively. To address these limitations, we propose the integrated multivariate segmentation tree (IMST), a comprehensive framework designed to improve credit evaluation for small- and medium-sized enterprises (SMEs) by integrating financial data with textual sources. This method comprises three core stages: (1) transforming textual data into numerical matrices through matrix factorization, (2) selecting salient financial features using Lasso regression, and (3) constructing a multivariate segmentation tree based on either the Gini index or entropy, with weakest-link pruning applied to control model complexity. Experimental results based on a dataset of 1,428 Chinese SMEs demonstrated that IMST achieved an accuracy rate of 88.9%, surpassing both baseline decision trees (87.4%) and conventional models such as support vector machines and neural networks. Furthermore, the proposed model demonstrated superior interpretability and computational efficiency, featuring a more streamlined architecture and improved risk detection capabilities.","short_abstract":"Traditional decision tree models, which rely exclusively on numerical variables, often face challenges in handling high-dimensional data and are limited in their ability to incorporate textual information effectively. To address these limitations, we propose the integrated multivariate segmentation tree (IMST), a compr...","url_abs":"https://arxiv.org/abs/2509.00550","url_pdf":"https://arxiv.org/pdf/2509.00550v2","authors":"[\"Lu Han\",\"Xiuying Wang\"]","published":"2025-08-30T16:16:45Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CV\"]","methods":"[]","has_code":false}
