{"ID":2845740,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.03341","arxiv_id":"2511.03341","title":"LaMoS: Enabling Efficient Large Number Modular Multiplication through SRAM-based CiM Acceleration","abstract":"Barrett's algorithm is one of the most widely used methods for performing modular multiplication, a critical nonlinear operation in modern privacy computing techniques such as homomorphic encryption (HE) and zero-knowledge proofs (ZKP). Since modular multiplication dominates the processing time in these applications, computational complexity and memory limitations significantly impact performance. Computing-in-Memory (CiM) is a promising approach to tackle this problem. However, existing schemes currently suffer from two main problems: 1) Most works focus on low bit-width modular multiplication, which is inadequate for mainstream cryptographic algorithms such as elliptic curve cryptography (ECC) and the RSA algorithm, both of which require high bit-width operations; 2) Recent efforts targeting large number modular multiplication rely on inefficient in-memory logic operations, resulting in high scaling costs for larger bit-widths and increased latency. To address these issues, we propose LaMoS, an efficient SRAM-based CiM design for large-number modular multiplication, offering high scalability and area efficiency. First, we analyze the Barrett's modular multiplication method and map the workload onto SRAM CiM macros for high bit-width cases. Additionally, we develop an efficient CiM architecture and dataflow to optimize large-number modular multiplication. Finally, we refine the mapping scheme for better scalability in high bit-width scenarios using workload grouping. Experimental results show that LaMoS achieves a $7.02\\times$ speedup and reduces high bit-width scaling costs compared to existing SRAM-based CiM designs.","short_abstract":"Barrett's algorithm is one of the most widely used methods for performing modular multiplication, a critical nonlinear operation in modern privacy computing techniques such as homomorphic encryption (HE) and zero-knowledge proofs (ZKP). Since modular multiplication dominates the processing time in these applications, c...","url_abs":"https://arxiv.org/abs/2511.03341","url_pdf":"https://arxiv.org/pdf/2511.03341v1","authors":"[\"Haomin Li\",\"Fangxin Liu\",\"Chenyang Guan\",\"Zongwu Wang\",\"Li Jiang\",\"Haibing Guan\"]","published":"2025-11-05T10:20:26Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AR\"]","methods":"[]","has_code":false}
