{"ID":2823561,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.00510","arxiv_id":"2601.00510","title":"A Chain-of-Thought Approach to Semantic Query Categorization in e-Commerce Taxonomies","abstract":"Search in e-Commerce is powered at the core by a structured representation of the inventory, often formulated as a category taxonomy. An important capability in e-Commerce with hierarchical taxonomies is to select a set of relevant leaf categories that are semantically aligned with a given user query. In this scope, we address a fundamental problem of search query categorization in real-world e-Commerce taxonomies. A correct categorization of a query not only provides a way to zoom into the correct inventory space, but opens the door to multiple intent understanding capabilities for a query. A practical and accurate solution to this problem has many applications in e-commerce, including constraining retrieved items and improving the relevance of the search results. For this task, we explore a novel Chain-of-Thought (CoT) paradigm that combines simple tree-search with LLM semantic scoring. Assessing its classification performance on human-judged query-category pairs, relevance tests, and LLM-based reference methods, we find that the CoT approach performs better than a benchmark that uses embedding-based query category predictions. We show how the CoT approach can detect problems within a hierarchical taxonomy. Finally, we also propose LLM-based approaches for query-categorization of the same spirit, but which scale better at the range of millions of queries.","short_abstract":"Search in e-Commerce is powered at the core by a structured representation of the inventory, often formulated as a category taxonomy. An important capability in e-Commerce with hierarchical taxonomies is to select a set of relevant leaf categories that are semantically aligned with a given user query. In this scope, we...","url_abs":"https://arxiv.org/abs/2601.00510","url_pdf":"https://arxiv.org/pdf/2601.00510v1","authors":"[\"Jetlir Duraj\",\"Ishita Khan\",\"Kilian Merkelbach\",\"Mehran Elyasi\"]","published":"2026-01-01T23:36:13Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
