{"ID":2831040,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.14717","arxiv_id":"2512.14717","title":"Is GPT-OSS All You Need? Benchmarking Large Language Models for Financial Intelligence and the Surprising Efficiency Paradox","abstract":"The rapid adoption of large language models in financial services necessitates rigorous evaluation frameworks to assess their performance, efficiency, and practical applicability. This paper conducts a comprehensive evaluation of the GPT-OSS model family alongside contemporary LLMs across ten diverse financial NLP tasks. Through extensive experimentation on 120B and 20B parameter variants of GPT-OSS, we reveal a counterintuitive finding: the smaller GPT-OSS-20B model achieves comparable accuracy (65.1% vs 66.5%) while demonstrating superior computational efficiency with 198.4 Token Efficiency Score and 159.80 tokens per second processing speed [1]. Our evaluation encompasses sentiment analysis, question answering, and entity recognition tasks using real-world financial datasets including Financial PhraseBank, FiQA-SA, and FLARE FINERORD. We introduce novel efficiency metrics that capture the trade-off between model performance and resource utilization, providing critical insights for deployment decisions in production environments. The benchmark reveals that GPT-OSS models consistently outperform larger competitors including Qwen3-235B, challenging the prevailing assumption that model scale directly correlates with task performance [2]. Our findings demonstrate that architectural innovations and training strategies in GPT-OSS enable smaller models to achieve competitive performance with significantly reduced computational overhead, offering a pathway toward sustainable and cost-effective deployment of LLMs in financial applications.","short_abstract":"The rapid adoption of large language models in financial services necessitates rigorous evaluation frameworks to assess their performance, efficiency, and practical applicability. This paper conducts a comprehensive evaluation of the GPT-OSS model family alongside contemporary LLMs across ten diverse financial NLP task...","url_abs":"https://arxiv.org/abs/2512.14717","url_pdf":"https://arxiv.org/pdf/2512.14717v1","authors":"[\"Ziqian Bi\",\"Danyang Zhang\",\"Junhao Song\",\"Chiung-Yi Tseng\"]","published":"2025-12-09T06:07:19Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}