{"ID":2825135,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21512","arxiv_id":"2512.21512","title":"Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art","abstract":"AI image generators create both photorealistic images and stylized art, necessitating robust detectors that maintain performance under common post-processing transformations (JPEG compression, blur, downscaling). Existing methods optimize single metrics without addressing deployment-critical factors such as operating point selection and fixed-threshold robustness. This work addresses misleading robustness estimates by introducing a fixed-threshold evaluation protocol that holds decision thresholds, selected once on clean validation data, fixed across all post-processing transformations. Traditional methods retune thresholds per condition, artificially inflating robustness estimates and masking deployment failures. We report deployment-relevant performance at three operating points (Low-FPR, ROC-optimal, Best-F1) under systematic degradation testing using a lightweight CNN-ViT hybrid with gated fusion and optional frequency enhancement. Our evaluation exposes a statistically validated forensic-semantic spectrum: frequency-aided CNNs excel on pristine photos but collapse under compression (93.33% to 61.49%), whereas ViTs degrade minimally (92.86% to 88.36%) through robust semantic pattern recognition. Multi-seed experiments demonstrate that all architectures achieve 15% higher AUROC on artistic content (0.901-0.907) versus photorealistic images (0.747-0.759), confirming that semantic patterns provide fundamentally more reliable detection cues than forensic artifacts. Our hybrid approach achieves balanced cross-domain performance: 91.4% accuracy on tiny-genimage photos, 89.7% on AiArtData art/graphics, and 98.3% (competitive) on CIFAKE. Fixed-threshold evaluation eliminates retuning inflation, reveals genuine robustness gaps, and yields actionable deployment guidance: prefer CNNs for clean photo verification, ViTs for compressed content, and hybrids for art/graphics screening.","short_abstract":"AI image generators create both photorealistic images and stylized art, necessitating robust detectors that maintain performance under common post-processing transformations (JPEG compression, blur, downscaling). Existing methods optimize single metrics without addressing deployment-critical factors such as operating p...","url_abs":"https://arxiv.org/abs/2512.21512","url_pdf":"https://arxiv.org/pdf/2512.21512v1","authors":"[\"Md Ashik Khan\",\"Arafat Alam Jion\"]","published":"2025-12-25T05:19:09Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Convolutional Neural Network\"]","has_code":false}
