{"ID":2841235,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12081","arxiv_id":"2511.12081","title":"From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction","abstract":"Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns - a stark contrast to the smooth, predictable gains seen in large language models. We identify the root cause as a structural misalignment: Transformers assume sequential compositionality, while CTR data demand combinatorial reasoning over high-cardinality semantic fields. Unstructured attention spreads capacity indiscriminately, amplifying noise under extreme sparsity and breaking scalable learning. To restore alignment, we introduce the Field-Aware Transformer (FAT), which embeds field-based interaction priors into attention through decomposed content alignment and cross-field modulation. This design ensures model complexity scales with the number of fields F, not the total vocabulary size n \u003e\u003e F, leading to tighter generalization and, critically, observed power-law scaling in AUC as model width increases. We present the first formal scaling law for CTR models, grounded in Rademacher complexity, that explains and predicts this behavior. On large-scale benchmarks, FAT improves AUC by up to +0.51% over state-of-the-art methods. Deployed online, it delivers +2.33% CTR and +0.66% RPM. Our work establishes that effective scaling in recommendation arises not from size, but from structured expressivity-architectural coherence with data semantics.","short_abstract":"Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns - a stark contrast to the smooth, predictable gains seen in large language models. We identify the root cause as a structural misalignment: Transformers assume sequential compositionality,...","url_abs":"https://arxiv.org/abs/2511.12081","url_pdf":"https://arxiv.org/pdf/2511.12081v1","authors":"[\"Bencheng Yan\",\"Yuejie Lei\",\"Zhiyuan Zeng\",\"Di Wang\",\"Kaiyi Lin\",\"Pengjie Wang\",\"Jian Xu\",\"Bo Zheng\"]","published":"2025-11-15T07:55:50Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.LG\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}
