{"ID":2877717,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.20013","arxiv_id":"2508.20013","title":"Cross-Platform E-Commerce Product Categorization and Recategorization: A Multimodal Hierarchical Classification Approach","abstract":"This study addresses critical industrial challenges in e-commerce product categorization, namely platform heterogeneity and the structural limitations of existing taxonomies, by developing and deploying a multimodal hierarchical classification framework. Using a dataset of 271,700 products from 40 international fashion e-commerce platforms, we integrate textual features (RoBERTa), visual features (ViT), and joint vision-language representations (CLIP). We investigate fusion strategies, including early, late, and attention-based fusion within a hierarchical architecture enhanced by dynamic masking to ensure taxonomic consistency. Results show that CLIP embeddings combined via an MLP-based late-fusion strategy achieve the highest hierarchical F1 (98.59%), outperforming unimodal baselines. To address shallow or inconsistent categories, we further introduce a self-supervised \"product recategorization\" pipeline using SimCLR, UMAP, and cascade clustering, which discovered new, fine-grained categories (for example, subtypes of \"Shoes\") with cluster purities above 86%. Cross-platform experiments reveal a deployment-relevant trade-off: complex late-fusion methods maximize accuracy with diverse training data, while simpler early-fusion methods generalize more effectively to unseen platforms. Finally, we demonstrate the framework's industrial scalability through deployment in EURWEB's commercial transaction intelligence platform via a two-stage inference pipeline, combining a lightweight RoBERTa stage with a GPU-accelerated multimodal stage to balance cost and accuracy.","short_abstract":"This study addresses critical industrial challenges in e-commerce product categorization, namely platform heterogeneity and the structural limitations of existing taxonomies, by developing and deploying a multimodal hierarchical classification framework. Using a dataset of 271,700 products from 40 international fashion...","url_abs":"https://arxiv.org/abs/2508.20013","url_pdf":"https://arxiv.org/pdf/2508.20013v2","authors":"[\"Lotte Gross\",\"Rebecca Walter\",\"Nicole Zoppi\",\"Adrien Justus\",\"Alessandro Gambetti\",\"Qiwei Han\",\"Maximilian Kaiser\"]","published":"2025-08-27T16:16:12Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.IR\"]","methods":"[]","has_code":false}
