{"ID":2827789,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.17108","arxiv_id":"2512.17108","title":"Atom: Efficient On-Device Video-Language Pipelines Through Modular Reuse","abstract":"Recent advances in video-language models have enabled powerful applications like video retrieval, captioning, and assembly. However, executing such multi-stage pipelines efficiently on mobile devices remains challenging due to redundant model loads and fragmented execution. We introduce Atom, an on-device system that restructures video-language pipelines for fast and efficient execution. Atom decomposes a billion-parameter model into reusable modules, such as the visual encoder and language decoder, and reuses them across subtasks like captioning, reasoning, and indexing. This reuse-centric design eliminates repeated model loading and enables parallel execution, reducing end-to-end latency without sacrificing performance. On commodity smartphones, Atom achieves 27--33% faster execution compared to non-reuse baselines, with only marginal performance drop ($\\leq$ 2.3 Recall@1 in retrieval, $\\leq$ 1.5 CIDEr in captioning). These results position Atom as a practical, scalable approach for efficient video-language understanding on edge devices.","short_abstract":"Recent advances in video-language models have enabled powerful applications like video retrieval, captioning, and assembly. However, executing such multi-stage pipelines efficiently on mobile devices remains challenging due to redundant model loads and fragmented execution. We introduce Atom, an on-device system that r...","url_abs":"https://arxiv.org/abs/2512.17108","url_pdf":"https://arxiv.org/pdf/2512.17108v1","authors":"[\"Kunjal Panchal\",\"Saayan Mitra\",\"Somdeb Sarkhel\",\"Haoliang Wang\",\"Ishita Dasgupta\",\"Gang Wu\",\"Hui Guan\"]","published":"2025-12-18T22:29:18Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.MM\"]","methods":"[\"Language Model\"]","has_code":false}
