{"ID":2864319,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23860","arxiv_id":"2509.23860","title":"GSID: Generative Semantic Indexing for E-Commerce Product Understanding","abstract":"Structured representation of product information is a major bottleneck for the efficiency of e-commerce platforms, especially second-hand e-commerce platforms. Currently, most product information is organized around manually curated product categories and attributes, which often fail to adequately cover long-tail products and do not align well with buyer preferences. To address these problems, we propose Generative Semantic Indexing (GSID), a data-driven approach to generating structured product representations. GSID consists of two key components: (1) pre-training on unstructured product metadata to learn in-domain semantic embeddings, and (2) generating more effective semantic codes tailored for downstream product-centric applications. Extensive experiments validate the effectiveness of GSID, and it has been successfully deployed on a real-world e-commerce platform, achieving promising results on product understanding and other downstream tasks.","short_abstract":"Structured representation of product information is a major bottleneck for the efficiency of e-commerce platforms, especially second-hand e-commerce platforms. Currently, most product information is organized around manually curated product categories and attributes, which often fail to adequately cover long-tail...","url_abs":"https://arxiv.org/abs/2509.23860","url_pdf":"https://arxiv.org/pdf/2509.23860v1","authors":"[\"Haiyang Yang\",\"Qinye Xie\",\"Qingheng Zhang\",\"Liyu Chen\",\"Huike Zou\",\"Chengbao Lian\",\"Shuguang Han\",\"Fei Huang\",\"Jufeng Chen\",\"Bo Zheng\"]","published":"2025-09-28T12:58:05Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.AI\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
