{"ID":2850140,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22829","arxiv_id":"2510.22829","title":"LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction","abstract":"This paper addresses the prediction of commercial (brand) memorability as part of \"Subtask 2: Commercial/Ad Memorability\" within the \"Memorability: Predicting movie and commercial memorability\" task at the MediaEval 2025 workshop competition. We propose a multimodal fusion system with a Gemma-3 LLM backbone that integrates pre-computed visual (ViT) and textual (E5) features by multi-modal projections. The model is adapted using Low-Rank Adaptation (LoRA). A heavily-tuned ensemble of gradient boosted trees serves as a baseline. A key contribution is the use of LLM-generated rationale prompts, grounded in expert-derived aspects of memorability, to guide the fusion model. The results demonstrate that the LLM-based system exhibits greater robustness and generalization performance on the final test set, compared to the baseline. The paper's codebase can be found at https://github.com/dsgt-arc/mediaeval-2025-memorability","short_abstract":"This paper addresses the prediction of commercial (brand) memorability as part of \"Subtask 2: Commercial/Ad Memorability\" within the \"Memorability: Predicting movie and commercial memorability\" task at the MediaEval 2025 workshop competition. We propose a multimodal fusion system with a Gemma-3 LLM backbone that integr...","url_abs":"https://arxiv.org/abs/2510.22829","url_pdf":"https://arxiv.org/pdf/2510.22829v1","authors":"[\"Aleksandar Pramov\"]","published":"2025-10-26T20:51:52Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.MM\"]","methods":"[\"Large Language Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":607767,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2850140,"paper_url":"https://arxiv.org/abs/2510.22829","paper_title":"LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction","repo_url":"https://github.com/dsgt-arc/mediaeval-2025-memorability","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
