{"ID":2854719,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14823","arxiv_id":"2510.14823","title":"FraQAT: Quantization Aware Training with Fractional bits","abstract":"State-of-the-art (SOTA) generative models have demonstrated impressive capabilities in image synthesis or text generation, often with a large capacity model. However, these large models cannot be deployed on smartphones due to the limited availability of on-board memory and computations. Quantization methods lower the precision of the model parameters, allowing for efficient computations, e.g., in INT8. Although aggressive quantization addresses efficiency and memory constraints, preserving the quality of the model remains a challenge. To retain quality in previous aggressive quantization, we propose a new fractional bits quantization (FraQAT) approach. The novelty is a simple yet effective idea: we progressively reduce the model's precision from 32 to 4 bits per parameter, and exploit the fractional bits during optimization to maintain high generation quality. We show that FraQAT yields improved quality on a variety of diffusion models, including SD3.5-Medium, Sana, PixArt, and FLUX.1-schnell, while achieving 4-7% lower FID than standard QAT. Finally, we deploy and run Sana on a Samsung S25U, which runs on the Qualcomm SM8750-AB Snapdragon 8 Elite Hexagon Tensor Processor (HTP).","short_abstract":"State-of-the-art (SOTA) generative models have demonstrated impressive capabilities in image synthesis or text generation, often with a large capacity model. However, these large models cannot be deployed on smartphones due to the limited availability of on-board memory and computations. Quantization methods lower the...","url_abs":"https://arxiv.org/abs/2510.14823","url_pdf":"https://arxiv.org/pdf/2510.14823v1","authors":"[\"Luca Morreale\",\"Alberto Gil C. P. Ramos\",\"Malcolm Chadwick\",\"Mehid Noroozi\",\"Ruchika Chavhan\",\"Abhinav Mehrotra\",\"Sourav Bhattacharya\"]","published":"2025-10-16T16:01:08Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
