{"ID":2875521,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.02466","arxiv_id":"2509.02466","title":"TeRA: Rethinking Text-guided Realistic 3D Avatar Generation","abstract":"In this paper, we rethink text-to-avatar generative models by proposing TeRA, a more efficient and effective framework than the previous SDS-based models and general large 3D generative models. Our approach employs a two-stage training strategy for learning a native 3D avatar generative model. Initially, we distill a decoder to derive a structured latent space from a large human reconstruction model. Subsequently, a text-controlled latent diffusion model is trained to generate photorealistic 3D human avatars within this latent space. TeRA enhances the model performance by eliminating slow iterative optimization and enables text-based partial customization through a structured 3D human representation. Experiments have proven our approach's superiority over previous text-to-avatar generative models in subjective and objective evaluation.","short_abstract":"In this paper, we rethink text-to-avatar generative models by proposing TeRA, a more efficient and effective framework than the previous SDS-based models and general large 3D generative models. Our approach employs a two-stage training strategy for learning a native 3D avatar generative model. Initially, we distill a d...","url_abs":"https://arxiv.org/abs/2509.02466","url_pdf":"https://arxiv.org/pdf/2509.02466v1","authors":"[\"Yanwen Wang\",\"Yiyu Zhuang\",\"Jiawei Zhang\",\"Li Wang\",\"Yifei Zeng\",\"Xun Cao\",\"Xinxin Zuo\",\"Hao Zhu\"]","published":"2025-09-02T16:20:20Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}