{"ID":2834188,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.04125","arxiv_id":"2512.04125","title":"ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text","abstract":"Large language models (LLMs) have demonstrated several emergent behaviors with scale, including reasoning and fluency in long-form text generation. However, they continue to struggle with tasks requiring precise spatial and positional reasoning. ASCII art, a symbolic medium where characters encode structure and form, provides a unique probe of this limitation. We introduce ASCIIBench, a novel benchmark for evaluating both the generation and classification of ASCII-text images. ASCIIBench consists of a filtered dataset of 5,315 class-labeled ASCII images and is, to our knowledge, the first publicly available benchmark of its kind. Alongside the dataset, we release weights for a fine-tuned CLIP model adapted to capture ASCII structure, enabling the evaluation of LLM-generated ASCII art. Our analysis shows that cosine similarity over CLIP embeddings fails to separate most ASCII categories, yielding chance-level performance even for low-variance classes. In contrast, classes with high internal mean similarity exhibit clear discriminability, revealing that the bottleneck lies in representation rather than generational variance. These findings position ASCII art as a stress test for multimodal representations and motivate the development of new embedding methods or evaluation metrics tailored to symbolic visual modalities. All resources are available at https://github.com/ASCIIBench/ASCIIBench.","short_abstract":"Large language models (LLMs) have demonstrated several emergent behaviors with scale, including reasoning and fluency in long-form text generation. However, they continue to struggle with tasks requiring precise spatial and positional reasoning. ASCII art, a symbolic medium where characters encode structure and form, p...","url_abs":"https://arxiv.org/abs/2512.04125","url_pdf":"https://arxiv.org/pdf/2512.04125v1","authors":"[\"Kerry Luo\",\"Michael Fu\",\"Joshua Peguero\",\"Husnain Malik\",\"Anvay Patil\",\"Joyce Lin\",\"Megan Van Overborg\",\"Ryan Sarmiento\",\"Kevin Zhu\"]","published":"2025-12-02T20:55:42Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":606390,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2834188,"paper_url":"https://arxiv.org/abs/2512.04125","paper_title":"ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text","repo_url":"https://github.com/ASCIIBench/ASCIIBench","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}