{"ID":2866252,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05109","arxiv_id":"2510.05109","title":"Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices","abstract":"Large Multimodal Models (LMMs) are inherently modular, comprising vision and audio encoders, a projector, and a language backbone. Yet existing systems execute them monolithically, underutilizing the heterogeneous accelerators (NPUs, GPUs, DSPs) on modern SoCs and inflating end-to-end latency. We present Nanomind, a hardware-software co-design inference framework that decomposes each LMM into modular \"bricks\"--vision, projector, language, and audio--and maps each brick to its best-suited compute units. A Token-Aware Buffer Manager (TABM) enables zero-copy embedding transfer across accelerators on unified-memory SoCs, bypassing CPU bottlenecks. Combined with customized hardware, a battery-aware scheduler, and fused low-bit GEMM kernels, Nanomind runs entirely on a compact, battery-powered prototype that operates fully offline. Nanomind reduces end-to-end energy by 42.3% against mainstream edge frameworks and devkits; in its on-demand low-power mode, the prototype runs LLaVA-OneVision-Qwen2-0.5B with a camera for nearly 18.8 hours on a single 2,000 mAh battery.","short_abstract":"Large Multimodal Models (LMMs) are inherently modular, comprising vision and audio encoders, a projector, and a language backbone. Yet existing systems execute them monolithically, underutilizing the heterogeneous accelerators (NPUs, GPUs, DSPs) on modern SoCs and inflating end-to-end latency. We present Nanomind, a ha...","url_abs":"https://arxiv.org/abs/2510.05109","url_pdf":"https://arxiv.org/pdf/2510.05109v6","authors":"[\"Yilong Li\",\"Shuai Zhang\",\"Yijing Zeng\",\"Hao Zhang\",\"Xinmiao Xiong\",\"Jingyu Liu\",\"Pan Hu\",\"Suman Banerjee\"]","published":"2025-09-25T22:28:44Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.AI\",\"cs.CL\",\"eess.SP\"]","methods":"[]","has_code":false}
