{"ID":2869371,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.15283","arxiv_id":"2509.15283","title":"Evaluating the Limitations of Local LLMs in Solving Complex Programming Challenges","abstract":"This study examines the performance of today's open-source, locally hosted large-language models (LLMs) in handling complex competitive programming tasks with extended problem descriptions and contexts. Building on the original Framework for AI-driven Code Generation Evaluation (FACE), the authors retrofit the pipeline to work entirely offline through the Ollama runtime, collapsing FACE's sprawling per-problem directory tree into a handful of consolidated JSON files, and adding robust checkpointing so multi-day runs can resume after failures. The enhanced framework generates, submits, and records solutions for the full Kattis corpus of 3,589 problems across eight code-oriented models ranging from 6.7-9 billion parameters. The submission results show that the overall pass@1 accuracy is modest for the local models, with the best models performing at approximately half the acceptance rate of the proprietary models, Gemini 1.5 and ChatGPT-4. These findings expose a persistent gap between private, cost-controlled LLM deployments and state-of-the-art proprietary services, yet also highlight the rapid progress of open models and the practical benefits of an evaluation workflow that organizations can replicate on in-house hardware.","short_abstract":"This study examines the performance of today's open-source, locally hosted large-language models (LLMs) in handling complex competitive programming tasks with extended problem descriptions and contexts. Building on the original Framework for AI-driven Code Generation Evaluation (FACE), the authors retrofit the pipeline...","url_abs":"https://arxiv.org/abs/2509.15283","url_pdf":"https://arxiv.org/pdf/2509.15283v1","authors":"[\"Kadin Matotek\",\"Heather Cassel\",\"Md Amiruzzaman\",\"Linh B. Ngo\"]","published":"2025-09-18T14:13:30Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.AI\",\"cs.LG\",\"cs.PL\"]","methods":"[\"Large Language Model\",\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}
