{"ID":2825308,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21799","arxiv_id":"2512.21799","title":"KG20C \u0026 KG20C-QA: Scholarly Knowledge Graph Benchmarks for Link Prediction and Question Answering","abstract":"In this paper, we present KG20C and KG20C-QA, two curated datasets for advancing question answering (QA) research on scholarly data. KG20C is a high-quality scholarly knowledge graph constructed from the Microsoft Academic Graph through targeted selection of venues, quality-based filtering, and schema definition. Although KG20C has been available online in non-peer-reviewed sources such as GitHub repository, this paper provides the first formal, peer-reviewed description of the dataset, including clear documentation of its construction and specifications. KG20C-QA is built upon KG20C to support QA tasks on scholarly data. We define a set of QA templates that convert graph triples into natural language question--answer pairs, producing a benchmark that can be used both with graph-based models such as knowledge graph embeddings and with text-based models such as large language models. We benchmark standard knowledge graph embedding methods on KG20C-QA, analyze performance across relation types, and provide reproducible evaluation protocols. By officially releasing these datasets with thorough documentation, we aim to contribute a reusable, extensible resource for the research community, enabling future work in QA, reasoning, and knowledge-driven applications in the scholarly domain. The full datasets will be released at https://github.com/tranhungnghiep/KG20C/ upon paper publication.","short_abstract":"In this paper, we present KG20C and KG20C-QA, two curated datasets for advancing question answering (QA) research on scholarly data.\nKG20C is a high-quality scholarly knowledge graph constructed from the Microsoft Academic Graph through targeted selection of venues, quality-based filtering, and schema definition. Altho...","url_abs":"https://arxiv.org/abs/2512.21799","url_pdf":"https://arxiv.org/pdf/2512.21799v2","authors":"[\"Hung-Nghiep Tran\",\"Atsuhiro Takasu\"]","published":"2025-12-25T22:29:54Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":605652,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825308,"paper_url":"https://arxiv.org/abs/2512.21799","paper_title":"KG20C \u0026 KG20C-QA: Scholarly Knowledge Graph Benchmarks for Link Prediction and Question Answering","repo_url":"https://github.com/tranhungnghiep/KG20C","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
