{"ID":2833005,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.04814","arxiv_id":"2512.04814","title":"Shared Multi-modal Embedding Space for Face-Voice Association","abstract":"The FAME 2026 challenge comprises two demanding tasks: training face-voice associations combined with a multilingual setting that includes testing on languages on which the model was not trained. Our approach consists of separate uni-modal processing pipelines with general face and voice feature extraction, complemented by additional age-gender feature extraction to support prediction. The resulting single-modal features are projected into a shared embedding space and trained with an Adaptive Angular Margin (AAM) loss. Our approach achieved first place in the FAME 2026 challenge, with an average Equal-Error Rate (EER) of 23.99%.","short_abstract":"The FAME 2026 challenge comprises two demanding tasks: training face-voice associations combined with a multilingual setting that includes testing on languages on which the model was not trained. Our approach consists of separate uni-modal processing pipelines with general face and voice feature extraction, complemente...","url_abs":"https://arxiv.org/abs/2512.04814","url_pdf":"https://arxiv.org/pdf/2512.04814v1","authors":"[\"Christopher Simic\",\"Korbinian Riedhammer\",\"Tobias Bocklet\"]","published":"2025-12-04T14:04:15Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.CV\"]","methods":"[]","has_code":false}