{"ID":2851820,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19758","arxiv_id":"2510.19758","title":"Top-P Masking for Cross Language Information Retrieval","abstract":"Top-K masking schemes have been proposed as a method to promote sparse representations in Information Retrieval (IR) tasks, as a simple alternative to Floating Point Operations per Second (FLOPS) regularization. Algorithms such as the Bilingual Lexical and Document Expansion Model (BLADE) adopt this approach as a post-processing stage. We propose using Top-P Dynamic Masking, similar to Nucleus Sampling in Large Language Models, and demonstrate better performance than Top-K masking. Specifically, we evaluate our methods in the domain of Cross Language Information Retrieval (CLIR).","short_abstract":"Top-K masking schemes have been proposed as a method to promote sparse representations in Information Retrieval (IR) tasks, as a simple alternative to Floating Point Operations per Second (FLOPS) regularization. Algorithms such as the Bilingual Lexical and Document Expansion Model (BLADE) adopt this approach as a post-pro...","url_abs":"https://arxiv.org/abs/2510.19758","url_pdf":"https://arxiv.org/pdf/2510.19758v1","authors":"[\"Joseph Casale\",\"Andrew Silverschotz\",\"Joseph DeSimone\"]","published":"2025-10-22T16:47:42Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Language Model\"]","has_code":false}
