{"ID":2853873,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.15286","arxiv_id":"2510.15286","title":"MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation","abstract":"Industrial recommender systems critically depend on high-quality ranking models. However, traditional pipelines still rely on manual feature engineering and scenario-specific architectures, which hinder cross-scenario transfer and large-scale deployment. To address these challenges, we propose \\textbf{MTmixAtt}, a unified Mixture-of-Experts (MoE) architecture with Multi-Mix Attention, designed for large-scale recommendation tasks. MTmixAtt integrates two key components. The \\textbf{AutoToken} module automatically clusters heterogeneous features into semantically coherent tokens, removing the need for human-defined feature groups. The \\textbf{MTmixAttBlock} module enables efficient token interaction via a learnable mixing matrix, shared dense experts, and scenario-aware sparse experts, capturing both global patterns and scenario-specific behaviors within a single framework. Extensive experiments on the industrial TRec dataset from Meituan demonstrate that MTmixAtt consistently outperforms state-of-the-art baselines including Transformer-based models, WuKong, HiFormer, MLP-Mixer, and RankMixer. At comparable parameter scales, MTmixAtt achieves superior CTR and CTCVR metrics; scaling to MTmixAtt-1B yields further monotonic gains. Large-scale online A/B tests validate the real-world impact: in the \\textit{Homepage} scenario, MTmixAtt increases Payment PV by \\textbf{+3.62\\%} and Actual Payment GTV by \\textbf{+2.54\\%}. Overall, MTmixAtt provides a unified and scalable solution for modeling arbitrary heterogeneous features across scenarios, significantly improving both user experience and commercial outcomes.","short_abstract":"Industrial recommender systems critically depend on high-quality ranking models. However, traditional pipelines still rely on manual feature engineering and scenario-specific architectures, which hinder cross-scenario transfer and large-scale deployment. To address these challenges, we propose \\textbf{MTmixAtt}, a unif...","url_abs":"https://arxiv.org/abs/2510.15286","url_pdf":"https://arxiv.org/pdf/2510.15286v1","authors":"[\"Xianyang Qi\",\"Yuan Tian\",\"Zhaoyu Hu\",\"Zhirui Kuai\",\"Chang Liu\",\"Hongxiang Lin\",\"Lei Wang\"]","published":"2025-10-17T03:50:09Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false}
