{"ID":2855243,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13933","arxiv_id":"2510.13933","title":"Image-based Facial Rig Inversion","abstract":"We present an image-based rig inversion framework that leverages two modalities: RGB appearance and RGB-encoded normal maps. Each modality is processed by an independent Hiera transformer backbone, and the extracted features are fused to regress 102 rig parameters derived from the Facial Action Coding System (FACS). Experiments on synthetic and scanned datasets demonstrate that the method generalizes to scanned data, producing faithful reconstructions.","short_abstract":"We present an image-based rig inversion framework that leverages two modalities: RGB appearance and RGB-encoded normal maps. Each modality is processed by an independent Hiera transformer backbone, and the extracted features are fused to regress 102 rig parameters derived from the Facial Action Coding System (FACS). Ex...","url_abs":"https://arxiv.org/abs/2510.13933","url_pdf":"https://arxiv.org/pdf/2510.13933v1","authors":"[\"Tianxiang Yang\",\"Marco Volino\",\"Armin Mustafa\",\"Greg Maguire\",\"Robert Kosk\"]","published":"2025-10-15T14:32:44Z","proceeding":"eess.IV","tasks":"[\"eess.IV\"]","methods":"[\"Transformer\"]","has_code":false}