{"ID":2882630,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.09500","arxiv_id":"2508.09500","title":"MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI","abstract":"Quantized Neural Networks (QNN) with extremely low-bitwidth data have proven promising in efficient storage and computation on edge devices. To further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision quantization (MPQ) becomes a popular solution. However, existing algorithms for exploring MPQ schemes are limited in flexibility and efficiency. Comprehending the complex impacts of different MPQ schemes on post-training quantization and quantization-aware training results is a challenge for conventional methods. Furthermore, an end-to-end framework for the optimization and deployment of MPQ models is missing in existing work. In this paper, we propose the MiCo framework, a holistic MPQ exploration and deployment framework for edge AI applications. The framework adopts a novel optimization algorithm to search for optimal quantization schemes with the highest accuracies while meeting latency constraints. Hardware-aware latency models are built for different hardware targets to enable fast explorations. After the exploration, the framework enables direct deployment from PyTorch MPQ models to bare-metal C codes, leading to end-to-end speedup with minimal accuracy drops.","short_abstract":"Quantized Neural Networks (QNN) with extremely low-bitwidth data have proven promising in efficient storage and computation on edge devices. To further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision quantization (MPQ) becomes a popular solution. However, existing algorithms for exploring...","url_abs":"https://arxiv.org/abs/2508.09500","url_pdf":"https://arxiv.org/pdf/2508.09500v1","authors":"[\"Zijun Jiang\",\"Yangdi Lyu\"]","published":"2025-08-13T05:18:21Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AR\"]","methods":"[\"LoRA\"]","has_code":false}
