{"ID":2880712,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.13843","arxiv_id":"2508.13843","title":"UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion","abstract":"Current e-commerce multimodal retrieval systems face two key limitations: they optimize for specific tasks with fixed modality pairings, and lack comprehensive benchmarks for evaluating unified retrieval approaches. To address these challenges, we introduce UniECS, a unified multimodal e-commerce search framework that handles all retrieval scenarios across image, text, and their combinations. Our work makes three key contributions. First, we propose a flexible architecture with a novel gated multimodal encoder that uses adaptive fusion mechanisms. This encoder integrates different modality representations while handling missing modalities. Second, we develop a comprehensive training strategy to optimize learning. It combines cross-modal alignment loss (CMAL), cohesive local alignment loss (CLAL), intra-modal contrastive loss (IMCL), and adaptive loss weighting. Third, we create M-BEER, a carefully curated multimodal benchmark containing 50K product pairs for e-commerce search evaluation. Extensive experiments demonstrate that UniECS consistently outperforms existing methods across four e-commerce benchmarks with fine-tuning or zero-shot evaluation. On our M-BEER bench, UniECS achieves substantial improvements in cross-modal tasks (up to 28% gain in R@10 for text-to-image retrieval) while maintaining parameter efficiency (0.2B parameters) compared to larger models like GME-Qwen2VL (2B) and MM-Embed (8B). Furthermore, we deploy UniECS in the e-commerce search platform of Kuaishou Inc. across two search scenarios, achieving notable improvements in Click-Through Rate (+2.74%) and Revenue (+8.33%). 
The comprehensive evaluation demonstrates the effectiveness of our approach in both experimental and real-world settings. Corresponding codes, models and datasets will be made publicly available at https://github.com/qzp2018/UniECS.","short_abstract":"Current e-commerce multimodal retrieval systems face two key limitations: they optimize for specific tasks with fixed modality pairings, and lack comprehensive benchmarks for evaluating unified retrieval approaches. To address these challenges, we introduce UniECS, a unified multimodal e-commerce search framework that...","url_abs":"https://arxiv.org/abs/2508.13843","url_pdf":"https://arxiv.org/pdf/2508.13843v1","authors":"[\"Zihan Liang\",\"Yufei Ma\",\"ZhiPeng Qian\",\"Huangyu Dai\",\"Zihan Wang\",\"Ben Chen\",\"Chenyi Lei\",\"Yuqing Ding\",\"Han Li\"]","published":"2025-08-19T14:06:13Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":610703,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880712,"paper_url":"https://arxiv.org/abs/2508.13843","paper_title":"UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion","repo_url":"https://github.com/qzp2018/UniECS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}