{"ID":2867451,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17289","arxiv_id":"2509.17289","title":"Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity Modelling","abstract":"We introduce CoDe-KG, an open-source, end-to-end pipeline for extracting sentence-level knowledge graphs by combining robust coreference resolution with syntactic sentence decomposition. Using our model, we contribute a dataset of over 150,000 knowledge triples, which is open source. We also contribute a training corpus of 7248 rows for sentence complexity, 190 rows of gold human annotations for co-reference resolution using open source lung-cancer abstracts from PubMed, 900 rows of gold human annotations for sentence conversion policies, and 398 triples of gold human annotations. We systematically select optimal prompt-model pairs across five complexity categories, showing that hybrid chain-of-thought and few-shot prompting yields up to 99.8% exact-match accuracy on sentence simplification. On relation extraction (RE), our pipeline achieves 65.8% macro-F1 on REBEL, an 8-point gain over the prior state of the art, and 75.7% micro-F1 on WebNLG2, while matching or exceeding performance on Wiki-NRE and CaRB. Ablation studies demonstrate that integrating coreference and decomposition increases recall on rare relations by over 20%. Code and dataset are available at https://github.com/KaushikMahmud/CoDe-KG_EMNLP_2025","short_abstract":"We introduce CoDe-KG, an open-source, end-to-end pipeline for extracting sentence-level knowledge graphs by combining robust coreference resolution with syntactic sentence decomposition. Using our model, we contribute a dataset of over 150,000 knowledge triples, which is open source. We also contribute a training corpu...","url_abs":"https://arxiv.org/abs/2509.17289","url_pdf":"https://arxiv.org/pdf/2509.17289v1","authors":"[\"Sydney Anuyah\",\"Mehedi Mahmud Kaushik\",\"Krishna Dwarampudi\",\"Rakesh Shiradkar\",\"Arjan Durresi\",\"Sunandan Chakraborty\"]","published":"2025-09-22T00:01:50Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":609463,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867451,"paper_url":"https://arxiv.org/abs/2509.17289","paper_title":"Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity Modelling","repo_url":"https://github.com/KaushikMahmud/CoDe-KG_EMNLP_2025","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
