{"ID":2895223,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.09616","arxiv_id":"2507.09616","title":"MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression","abstract":"Deploying transformer-based neural networks on resource-constrained edge devices presents a significant challenge. This challenge is often addressed through various techniques, such as low-rank approximation and mixed-precision quantization. In this work, we introduce Mixed Low-Rank and Quantization (MLoRQ), a novel method that integrates both techniques. MLoRQ employs a two-stage optimization process to determine optimal bit-width and rank assignments for each layer, adhering to predefined memory constraints. This process includes: (i) an intra-layer optimization that identifies potentially optimal compression solutions out of all low-rank and quantization combinations; (ii) an inter-layer optimization that assigns bit-width precision and rank to each layer while ensuring the memory constraint is met. An optional final step applies a sequential optimization process using a modified adaptive rounding technique to mitigate compression-induced errors in joint low-rank approximation and quantization. The method is compatible and can be seamlessly integrated with most existing quantization algorithms. MLoRQ shows state-of-the-art results with up to 15% performance improvement, evaluated on Vision Transformers for image classification, object detection, and instance segmentation tasks.","short_abstract":"Deploying transformer-based neural networks on resource-constrained edge devices presents a significant challenge. This challenge is often addressed through various techniques, such as low-rank approximation and mixed-precision quantization. In this work, we introduce Mixed Low-Rank and Quantization (MLoRQ), a novel me...","url_abs":"https://arxiv.org/abs/2507.09616","url_pdf":"https://arxiv.org/pdf/2507.09616v1","authors":"[\"Ofir Gordon\",\"Ariel Lapid\",\"Elad Cohen\",\"Yarden Yagil\",\"Arnon Netzer\",\"Hai Victor Habi\"]","published":"2025-07-13T12:48:46Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CV\"]","methods":"[\"Vision Transformer\",\"Transformer\"]","has_code":false}
