{"ID":2841892,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11821","arxiv_id":"2511.11821","title":"Scaling Open-Weight Large Language Models for Hydropower Regulatory Information Extraction: A Systematic Analysis","abstract":"Information extraction from regulatory documents using large language models presents critical trade-offs between performance and computational resources. We evaluated seven open-weight models (0.6B-70B parameters) on hydropower licensing documentation to provide empirical deployment guidance. Our analysis identified a pronounced 14B parameter threshold where validation methods transition from ineffective (F1 $\u003c$ 0.15) to viable (F1 = 0.64). Consumer-deployable models achieve 64\\% F1 through appropriate validation, while smaller models plateau at 51\\%. Large-scale models approach 77\\% F1 but require enterprise infrastructure. We identified systematic hallucination patterns where perfect recall indicates extraction failure rather than success in smaller models. Our findings establish the first comprehensive resource-performance mapping for open-weight information extraction in regulatory contexts, enabling evidence-based model selection. These results provide immediate value for hydropower compliance while contributing insights into parameter scaling effects that generalize across information extraction tasks.","short_abstract":"Information extraction from regulatory documents using large language models presents critical trade-offs between performance and computational resources. We evaluated seven open-weight models (0.6B-70B parameters) on hydropower licensing documentation to provide empirical deployment guidance. Our analysis identified a...","url_abs":"https://arxiv.org/abs/2511.11821","url_pdf":"https://arxiv.org/pdf/2511.11821v1","authors":"[\"Hong-Jun Yoon\",\"Faisal Ashraf\",\"Thomas A. Ruggles\",\"Debjani Singh\"]","published":"2025-11-14T19:23:25Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
