{"ID":2837266,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.18691","arxiv_id":"2511.18691","title":"EVCC: Enhanced Vision Transformer-ConvNeXt-CoAtNet Fusion for Classification","abstract":"Hybrid vision architectures combining Transformers and CNNs have significantly advanced image classification, but they usually do so at significant computational cost. We introduce EVCC (Enhanced Vision Transformer-ConvNeXt-CoAtNet), a novel multi-branch architecture integrating the Vision Transformer, lightweight ConvNeXt, and CoAtNet through key innovations: (1) adaptive token pruning with information preservation, (2) gated bidirectional cross-attention for enhanced feature refinement, (3) auxiliary classification heads for multi-task learning, and (4) a dynamic router gate employing context-aware confidence-driven weighting. Experiments across the CIFAR-100, Tobacco3482, CelebA, and Brain Cancer datasets demonstrate EVCC's superiority over powerful models like DeiT-Base, MaxViT-Base, and CrossViT-Base by consistently achieving state-of-the-art accuracy with improvements of up to 2 percentage points, while reducing FLOPs by 25 to 35%. Our adaptive architecture adjusts computational demands to deployment needs by dynamically reducing token count, efficiently balancing the accuracy-efficiency trade-off while combining global context, local details, and hierarchical features for real-world applications. The source code of our implementation is available at https://anonymous.4open.science/r/EVCC.","short_abstract":"Hybrid vision architectures combining Transformers and CNNs have significantly advanced image classification, but they usually do so at significant computational cost. We introduce EVCC (Enhanced Vision Transformer-ConvNeXt-CoAtNet), a novel multi-branch architecture integrating the Vision Transformer, lightweight Conv...","url_abs":"https://arxiv.org/abs/2511.18691","url_pdf":"https://arxiv.org/pdf/2511.18691v1","authors":"[\"Kazi Reyazul Hasan\",\"Md Nafiu Rahman\",\"Wasif Jalal\",\"Sadif Ahmed\",\"Shahriar Raj\",\"Mubasshira Musarrat\",\"Muhammad Abdullah Adnan\"]","published":"2025-11-24T02:11:19Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Convolutional Neural Network\"]","project_urls":"[\"https://anonymous.4open.science/r/EVCC\"]","has_code":false}
