{"ID":2841354,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12249","arxiv_id":"2511.12249","title":"ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations","abstract":"Recent advances in contextualized word embeddings have greatly improved semantic tasks such as Word Sense Disambiguation (WSD) and contextual similarity, but most progress has been limited to high-resource languages like English. Vietnamese, in contrast, still lacks robust models and evaluation resources for fine-grained semantic understanding. In this paper, we present ViConBERT, a novel framework for learning Vietnamese contextualized embeddings that integrates contrastive learning (SimCLR) and gloss-based distillation to better capture word meaning. We also introduce ViConWSD, the first large-scale synthetic dataset for evaluating semantic understanding in Vietnamese, covering both WSD and contextual similarity. Experimental results show that ViConBERT outperforms strong baselines on WSD (F1 = 0.87) and achieves competitive performance on ViCon (AP = 0.88) and ViSim-400 (Spearman's rho = 0.60), demonstrating its effectiveness in modeling both discrete senses and graded semantic relations. Our code, models, and data are available at https://github.com/tkhangg0910/ViConBERT","short_abstract":"Recent advances in contextualized word embeddings have greatly improved semantic tasks such as Word Sense Disambiguation (WSD) and contextual similarity, but most progress has been limited to high-resource languages like English. Vietnamese, in contrast, still lacks robust models and evaluation resources for fine-grain...","url_abs":"https://arxiv.org/abs/2511.12249","url_pdf":"https://arxiv.org/pdf/2511.12249v1","authors":"[\"Khang T. Huynh\",\"Dung H. Nguyen\",\"Binh T. Nguyen\"]","published":"2025-11-15T15:11:52Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false,"code_links":[{"ID":607050,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2841354,"paper_url":"https://arxiv.org/abs/2511.12249","paper_title":"ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations","repo_url":"https://github.com/tkhangg0910/ViConBERT","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
