{"ID":2872622,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.08608","arxiv_id":"2509.08608","title":"A 410GFLOP/s, 64 RISC-V Cores, 204.8GBps Shared-Memory Cluster in 12nm FinFET with Systolic Execution Support for Efficient B5G/6G AI-Enhanced O-RAN","abstract":"We present HeartStream, a 64-RV-core shared-L1-memory cluster (410 GFLOP/s peak performance and 204.8 GBps L1 bandwidth) for energy-efficient AI-enhanced O-RAN. The cores and cluster architecture are customized for baseband processing, supporting complex (16-bit real\u0026imaginary) instructions: multiply\u0026accumulate, division\u0026square-root, SIMD instructions, and hardware-managed systolic queues, improving the energy efficiency of key baseband kernels by up to 1.89x. At 800MHz@0.8V, HeartStream delivers up to 243GFLOP/s on complex-valued wireless workloads. Furthermore, the cores also support efficient AI processing on received data at up to 72 GOP/s. HeartStream is fully compatible with base station power and processing latency limits: it achieves leading-edge software-defined PUSCH efficiency (49.6GFLOP/s/W) and consumes just 0.68W (645MHz@0.65V), within the 4 ms end-to-end constraint for B5G/6G uplink.","short_abstract":"We present HeartStream, a 64-RV-core shared-L1-memory cluster (410 GFLOP/s peak performance and 204.8 GBps L1 bandwidth) for energy-efficient AI-enhanced O-RAN. The cores and cluster architecture are customized for baseband processing, supporting complex (16-bit real\u0026imaginary) instructions: multiply\u0026accumulate, divisi...","url_abs":"https://arxiv.org/abs/2509.08608","url_pdf":"https://arxiv.org/pdf/2509.08608v1","authors":"[\"Yichao Zhang\",\"Marco Bertuletti\",\"Sergio Mazzola\",\"Samuel Riedel\",\"Luca Benini\"]","published":"2025-09-10T14:05:43Z","proceeding":"cs.DC","tasks":"[\"cs.DC\"]","methods":"[]","has_code":false}