{"ID":2880765,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.13957","arxiv_id":"2508.13957","title":"ViT-FIQA: Assessing Face Image Quality using Vision Transformers","abstract":"Face Image Quality Assessment (FIQA) aims to predict the utility of a face image for face recognition (FR) systems. State-of-the-art FIQA methods mainly rely on convolutional neural networks (CNNs), leaving the potential of Vision Transformer (ViT) architectures underexplored. This work proposes ViT-FIQA, a novel approach that extends standard ViT backbones, originally optimized for FR, through a learnable quality token designed to predict a scalar utility score for any given face image. The learnable quality token is concatenated with the standard image patch tokens, and the whole sequence is processed via global self-attention by the ViT encoders to aggregate contextual information across all patches. At the output of the backbone, ViT-FIQA branches into two heads: (1) the patch tokens are passed through a fully connected layer to learn discriminative face representations via a margin-penalty softmax loss, and (2) the quality token is fed into a regression head to learn to predict the face sample's utility. Extensive experiments on challenging benchmarks and several FR models, including both CNN- and ViT-based architectures, demonstrate that ViT-FIQA consistently achieves top-tier performance. These results underscore the effectiveness of transformer-based architectures in modeling face image utility and highlight the potential of ViTs as a scalable foundation for future FIQA research https://cutt.ly/irHlzXUC.","short_abstract":"Face Image Quality Assessment (FIQA) aims to predict the utility of a face image for face recognition (FR) systems. 
State-of-the-art FIQA methods mainly rely on convolutional neural networks (CNNs), leaving the potential of Vision Transformer (ViT) architectures underexplored. This work proposes ViT-FIQA, a novel appro...","url_abs":"https://arxiv.org/abs/2508.13957","url_pdf":"https://arxiv.org/pdf/2508.13957v3","authors":"[\"Andrea Atzori\",\"Fadi Boutros\",\"Naser Damer\"]","published":"2025-08-19T15:50:07Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Convolutional Neural Network\"]","project_urls":"[\"https://cutt.ly/irHlzXUC\"]","has_code":false,"code_links":[{"ID":610710,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880765,"paper_url":"https://arxiv.org/abs/2508.13957","paper_title":"ViT-FIQA: Assessing Face Image Quality using Vision Transformers","repo_url":"https://github.com/atzoriandrea/ViT-FIQA-Assessing-Face-Image-Quality-using-Vision-Transformers","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0},{"ID":610711,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880765,"paper_url":"https://arxiv.org/abs/2508.13957","paper_title":"ViT-FIQA: Assessing Face Image Quality using Vision Transformers","repo_url":"https://github.com/atzoriandrea/ViT-FIQA-Assessing-Face-Image-Quality-using-Vision-Transformers.git","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}