{"ID":3004888,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T11:10:57.854545281Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03504","arxiv_id":"2606.03504","title":"BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language","abstract":"We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune OpenAI Whisper-small on this corpus and report a Word Error Rate (WER) of 30.07% on a held-out validation set of 538 utterances, down from a measured zero-shot baseline of 182.18% for Whisper-small on Balti. The dataset, fine-tuned model, and a live transcription demo are publicly available on HuggingFace.","short_abstract":"We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune...","url_abs":"https://arxiv.org/abs/2606.03504","url_pdf":"https://arxiv.org/pdf/2606.03504v1","authors":"[\"Muhammad Ali\"]","published":"2026-06-02T11:23:49Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[]","has_code":false}
