Efficient Quantization-Aware Neural Receivers: Beyond Post-Training Quantization
Abstract
As wireless communication systems advance toward Sixth Generation (6G) Radio Access Networks (RAN), Deep Learning (DL)-based neural receivers are emerging as transformative solutions for Physical Layer (PHY) processing, delivering superior Block Error Rate (BLER) performance compared to traditional model-based approaches. Practical deployment on resource-constrained hardware, however, requires efficient quantization to reduce latency, energy, and memory without sacrificing reliability. In this paper, we extend Post-Training Quantization (PTQ) by focusing on Quantization-Aware Training (QAT), which incorporates low-precision simulation during training for robustness at ultra-low bitwidths. In particular, we develop a QAT methodology for a neural receiver architecture and benchmark it against a PTQ approach across diverse 3GPP Clustered Delay Line (CDL) channel profiles under both Line-of-Sight (LoS) and Non-LoS (NLoS) conditions, with user velocities up to 40 m/s. Results show that 4-bit and 8-bit QAT models achieve BLERs comparable to FP32 models at a 10% target BLER. Moreover, QAT models succeed in NLoS scenarios where PTQ models fail to reach the 10% BLER target, while also yielding an 8x compression. These results with respect to full-precision demonstrate that QAT is a key enabler of low-complexity and latency-constrained inference at the PHY layer, facilitating real-time processing in 6G edge devices.