{"ID":2851582,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19338","arxiv_id":"2510.19338","title":"Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning","abstract":"In this technical report, we present the Ring-linear model series, specifically including Ring-mini-linear-2.0 and Ring-flash-linear-2.0. Ring-mini-linear-2.0 comprises 16B parameters and 957M activations, while Ring-flash-linear-2.0 contains 104B parameters and 6.1B activations. Both models adopt a hybrid architecture that effectively integrates linear attention and softmax attention, significantly reducing I/O and computational overhead in long-context inference scenarios. Compared to a 32 billion parameter dense model, this series reduces inference cost to 1/10, and compared to the original Ring series, the cost is also reduced by over 50%. Furthermore, through systematic exploration of the ratio between different attention mechanisms in the hybrid architecture, we have identified the currently optimal model structure. Additionally, by leveraging our self-developed high-performance FP8 operator library-linghe, overall training efficiency has been improved by 50%. Benefiting from the high alignment between the training and inference engine operators, the models can undergo long-term, stable, and highly efficient optimization during the reinforcement learning phase, consistently maintaining SOTA performance across multiple challenging complex reasoning benchmarks.","short_abstract":"In this technical report, we present the Ring-linear model series, specifically including Ring-mini-linear-2.0 and Ring-flash-linear-2.0. Ring-mini-linear-2.0 comprises 16B parameters and 957M activations, while Ring-flash-linear-2.0 contains 104B parameters and 6.1B activations. \nBoth models adopt a hybrid architecture...","url_abs":"https://arxiv.org/abs/2510.19338","url_pdf":"https://arxiv.org/pdf/2510.19338v2","authors":"[\"Ling Team\",\"Bin Han\",\"Caizhi Tang\",\"Chen Liang\",\"Donghao Zhang\",\"Fan Yuan\",\"Feng Zhu\",\"Jie Gao\",\"Jingyu Hu\",\"Longfei Li\",\"Meng Li\",\"Mingyang Zhang\",\"Peijie Jiang\",\"Peng Jiao\",\"Qian Zhao\",\"Qingyuan Yang\",\"Wenbo Shen\",\"Xinxing Yang\",\"Yalin Zhang\",\"Yankun Ren\",\"Yao Zhao\",\"Yibo Cao\",\"Yixuan Sun\",\"Yue Zhang\",\"Yuchen Fang\",\"Zibin Lin\",\"Zixuan Cheng\",\"Jun Zhou\"]","published":"2025-10-22T07:59:38Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}