{"ID":2843484,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.08376","arxiv_id":"2511.08376","title":"TurkEmbed: Turkish Embedding Model on NLI \u0026 STS Tasks","abstract":"This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy and semantic understanding. TurkEmbed utilizes a combination of diverse datasets and advanced training techniques, including matryoshka representation learning, to achieve more robust and accurate embeddings. This approach enables the model to adapt to various resource-constrained environments, offering faster encoding capabilities. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on All-NLI-TR and STS-b-TR benchmarks, achieving a 1-4\\% improvement. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications.","short_abstract":"This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy...","url_abs":"https://arxiv.org/abs/2511.08376","url_pdf":"https://arxiv.org/pdf/2511.08376v1","authors":"[\"Özay Ezerceli\",\"Gizem Gümüşçekiçci\",\"Tuğba Erkoç\",\"Berke Özenç\"]","published":"2025-11-11T15:54:52Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.IR\"]","methods":"[]","has_code":false}
