{"ID":2829216,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.12507","arxiv_id":"2512.12507","title":"ATLAS: Automated Tree-based Language Analysis System for C and C++ source programs","abstract":"Analyzing non-compilable C/C++ submodules without a resolved build environment remains a critical bottleneck for industrial software evolution. Traditional static analysis tools often fail in these scenarios due to their reliance on successful compilation, while Large Language Models (LLMs) lack the structural context necessary to reason about complex program logic. We introduce ATLAS, a Python-based CLI that generates unified multi-view representations for large-scale C/C++ projects with high accuracy, achieving success rates up to 96.80% for CFGs and 91.38% for DFGs. ATLAS is characterized by: (i) inter-procedural, type-aware analysis across function boundaries; (ii) support for both full and partial analysis of non-compilable projects; (iii) graph optimizations such as variable collapsing and node blacklisting; and (iv) synchronized multi-view graphs that align syntax, execution paths, and data-flow logic. Evaluating ATLAS with DeepSeek V3.2 for automated test generation demonstrates a 34.71% increase in line coverage and 32.66% in branch coverage, matching or exceeding the performance of the symbolic execution tool KLEE on complex projects. With polynomial scalability, ATLAS provides a robust infrastructure for generating the information-dense datasets required by next-generation, graph-aware ML4SE models. Video demonstration: https://youtu.be/QGuJZhj9CTA Tool github repository: https://github.com/jaid-monwar/ATLAS-multi-view-code-representation-tool.git","short_abstract":"Analyzing non-compilable C/C++ submodules without a resolved build environment remains a critical bottleneck for industrial software evolution. Traditional static analysis tools often fail in these scenarios due to their reliance on successful compilation, while Large Language Models (LLMs) lack the structural context...","url_abs":"https://arxiv.org/abs/2512.12507","url_pdf":"https://arxiv.org/pdf/2512.12507v3","authors":"[\"Jaid Monwar Chowdhury\",\"Ahmad Farhan Shahriar Chowdhury\",\"Humayra Binte Monwar\",\"Mahmuda Naznin\"]","published":"2025-12-14T01:11:11Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":605940,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2829216,"paper_url":"https://arxiv.org/abs/2512.12507","paper_title":"ATLAS: Automated Tree-based Language Analysis System for C and C++ source programs","repo_url":"https://github.com/jaid-monwar/ATLAS-multi-view-code-representation-tool.git","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
