{"ID":2879430,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.20119","arxiv_id":"2508.20119","title":"LLM Agents for Generating Microservice-based Applications: how complex is your specification?","abstract":"In this paper we evaluate the capabilities of LLM Agents in generating code for real-world problems. Specifically, we explore code synthesis for microservice-based applications, a widely used architectural pattern for building applications. We define a standard template for specifying these applications, and we propose a metric for scoring the difficulty of a specification. The higher the score, the more difficult it is to generate code for the specification. Our experimental results show that agents using strong LLMs (like GPT-3o-mini) do fairly well on medium difficulty specifications but do poorly on those of higher difficulty levels. This is due to more intricate business logic, a greater use of external services, database integration and inclusion of non-functional capabilities such as authentication. We analyzed the errors in LLM-synthesized code and report on the key challenges LLM Agents face in generating code for these specifications. Finally, we show that using a fine-grained approach to code generation improves the correctness of the generated code.","short_abstract":"In this paper we evaluate the capabilities of LLM Agents in generating code for real-world problems. Specifically, we explore code synthesis for microservice-based applications, a widely used architectural pattern for building applications. We define a standard template for specifying these applications, and we propose...","url_abs":"https://arxiv.org/abs/2508.20119","url_pdf":"https://arxiv.org/pdf/2508.20119v2","authors":"[\"Daniel M. Yellin\"]","published":"2025-08-22T12:34:22Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
