{"ID":2852039,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.18288","arxiv_id":"2510.18288","title":"BrailleLLM: Braille Instruction Tuning with Large Language Models for Braille Domain Tasks","abstract":"Braille plays a vital role in education and information accessibility for visually impaired individuals. However, Braille information processing faces challenges such as data scarcity and ambiguities in mixed-text contexts. We construct English and Chinese Braille Mixed Datasets (EBMD/CBMD) with mathematical formulas to support diverse Braille domain research, and propose a syntax tree-based augmentation method tailored for Braille data. To address the underperformance of traditional fine-tuning methods in Braille-related tasks, we investigate Braille Knowledge-Based Fine-Tuning (BKFT), which reduces the learning difficulty of Braille contextual features. BrailleLLM employs BKFT via instruction tuning to achieve unified Braille translation, formula-to-Braille conversion, and mixed-text translation. Experiments demonstrate that BKFT achieves significant performance improvements over conventional fine-tuning in Braille translation scenarios. Our open-sourced datasets and methodologies establish a foundation for low-resource multilingual Braille research.","short_abstract":"Braille plays a vital role in education and information accessibility for visually impaired individuals. However, Braille information processing faces challenges such as data scarcity and ambiguities in mixed-text contexts. We construct English and Chinese Braille Mixed Datasets (EBMD/CBMD) with mathematical formulas t...","url_abs":"https://arxiv.org/abs/2510.18288","url_pdf":"https://arxiv.org/pdf/2510.18288v1","authors":"[\"Tianyuan Huang\",\"Zepeng Zhu\",\"Hangdi Xing\",\"Zirui Shao\",\"Zhi Yu\",\"Chaoxiong Yang\",\"Jiaxian He\",\"Xiaozhong Liu\",\"Jiajun Bu\"]","published":"2025-10-21T04:33:05Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
