{"ID":2891174,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17192","arxiv_id":"2507.17192","title":"Vec2Face+ for Face Dataset Generation","abstract":"When synthesizing identities as face recognition training data, it is generally believed that large inter-class separability and intra-class attribute variation are essential for synthesizing a quality dataset. This belief is generally correct, and this is what we aim for. However, when increasing intra-class variation, existing methods overlook the necessity of maintaining intra-class identity consistency. To address this and generate high-quality face training data, we propose Vec2Face+, a generative model that creates images directly from image features and allows for continuous and easy control of face identities and attributes. Using Vec2Face+, we obtain datasets with proper inter-class separability, intra-class variation, and identity consistency using three strategies: 1) we sample vectors sufficiently different from others to generate well-separated identities; 2) we propose an AttrOP algorithm for increasing general attribute variations; 3) we propose LoRA-based pose control for generating images with profile head poses, which is more efficient and identity-preserving than AttrOP. Our system generates VFace10K, a synthetic face dataset with 10K identities, which allows an FR model to achieve state-of-the-art accuracy on seven real-world test sets. Scaling the size to 4M and 12M images, the corresponding VFace100K and VFace300K datasets yield higher accuracy than the real-world training dataset, CASIA-WebFace, on five real-world test sets. This is the first time a synthetic dataset beats CASIA-WebFace in average accuracy. In addition, we find that only 1 out of 11 synthetic datasets outperforms random guessing (i.e., 50%) in twin verification and that models trained with synthetic identities are more biased than those trained with real identities. Both are important aspects for future investigation. Code is available at https://github.com/HaiyuWu/Vec2Face_plus","short_abstract":"When synthesizing identities as face recognition training data, it is generally believed that large inter-class separability and intra-class attribute variation are essential for synthesizing a quality dataset. This belief is generally correct, and this is what we aim for. However, when increasing intra-class variati...","url_abs":"https://arxiv.org/abs/2507.17192","url_pdf":"https://arxiv.org/pdf/2507.17192v2","authors":"[\"Haiyu Wu\",\"Jaskirat Singh\",\"Sicong Tian\",\"Liang Zheng\",\"Kevin W. Bowyer\"]","published":"2025-07-23T04:34:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"LoRA\"]","has_code":false,"code_links":[{"ID":611857,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2891174,"paper_url":"https://arxiv.org/abs/2507.17192","paper_title":"Vec2Face+ for Face Dataset Generation","repo_url":"https://github.com/HaiyuWu/Vec2Face_plus","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
