{"ID":2839828,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14111","arxiv_id":"2511.14111","title":"CascadedViT: Cascaded Chunk-FeedForward and Cascaded Group Attention Vision Transformer","abstract":"Vision Transformers (ViTs) have demonstrated remarkable performance across a range of computer vision tasks; however, their high computational, memory, and energy demands hinder deployment on resource-constrained platforms. In this paper, we propose \\emph{Cascaded-ViT (CViT)}, a lightweight and compute-efficient vision transformer architecture featuring a novel feedforward network design called \\emph{Cascaded-Chunk Feed Forward Network (CCFFN)}. By splitting input features, CCFFN improves parameter and FLOP efficiency without sacrificing accuracy. Experiments on ImageNet-1K show that our \\emph{CViT-XL} model achieves 75.5\\% Top-1 accuracy while reducing FLOPs by 15\\% and energy consumption by 3.3\\% compared to EfficientViT-M5. Across various model sizes, the CViT family consistently exhibits the lowest energy consumption, making it suitable for deployment on battery-constrained devices such as mobile phones and drones. Furthermore, when evaluated using a new metric called \\emph{Accuracy-Per-FLOP (APF)}, which quantifies compute efficiency relative to accuracy, CViT models consistently achieve top-ranking efficiency. Particularly, CViT-L is 2.2\\% more accurate than EfficientViT-M2 while having comparable APF scores.","short_abstract":"Vision Transformers (ViTs) have demonstrated remarkable performance across a range of computer vision tasks; however, their high computational, memory, and energy demands hinder deployment on resource-constrained platforms. \nIn this paper, we propose \\emph{Cascaded-ViT (CViT)}, a lightweight and compute-efficient vision...","url_abs":"https://arxiv.org/abs/2511.14111","url_pdf":"https://arxiv.org/pdf/2511.14111v2","authors":"[\"Srivathsan Sivakumar\",\"Faisal Z. Qureshi\"]","published":"2025-11-18T03:51:15Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Vision Transformer\",\"Transformer\"]","has_code":false}
