{"ID":2853131,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.16797","arxiv_id":"2510.16797","title":"MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning","abstract":"We introduce MOSAIC (Masked Objective with Selective Adaptation for In-domain Contrastive learning), a multi-stage framework for domain adaptation of text embedding models that incorporates joint domain-specific masked supervision. Our approach addresses the challenges of adapting large-scale general-domain text embedding models to specialized domains. By jointly optimizing masked language modeling (MLM) and contrastive objectives within a unified training pipeline, our method enables effective learning of domain-relevant representations while preserving the robust semantic discrimination properties of the original model. We empirically validate our approach on both high-resource and low-resource domains, achieving improvements up to 13.4% in NDCG@10 (Normalized Discounted Cumulative Gain) over strong general-domain baselines. Comprehensive ablation studies further demonstrate the effectiveness of each component, highlighting the importance of balanced joint supervision and staged adaptation.","short_abstract":"We introduce MOSAIC (Masked Objective with Selective Adaptation for In-domain Contrastive learning), a multi-stage framework for domain adaptation of text embedding models that incorporates joint domain-specific masked supervision. Our approach addresses the challenges of adapting large-scale general-domain text embedd...","url_abs":"https://arxiv.org/abs/2510.16797","url_pdf":"https://arxiv.org/pdf/2510.16797v2","authors":"[\"Vera Pavlova\",\"Mohammed Makhlouf\"]","published":"2025-10-19T11:24:03Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}