{"ID":2832579,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.05747","arxiv_id":"2512.05747","title":"Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning","abstract":"Evaluating and optimising authorial style in long-form story generation remains challenging because style is often assessed with ad hoc prompting and is frequently conflated with overall writing quality. We propose a two-stage pipeline. First, we train a dedicated style-similarity judge by fine-tuning a sentence-transformer with authorship-verification supervision, and calibrate its similarity outputs into a bounded $[0,1]$ reward. Second, we use this judge as the primary reward in Group Relative Policy Optimization (GRPO) to fine-tune an 8B story generator for style-conditioned writing, avoiding the accept/reject supervision required by Direct Preference Optimization (DPO). Across four target authors (Mark Twain, Jane Austen, Charles Dickens, Thomas Hardy), the GRPO-trained 8B model achieves higher style scores than open-weight baselines, with an average style score of 0.893 across authors. These results suggest that AV-calibrated reward modelling provides a practical mechanism for controllable style transfer in long-form generation under a moderate model size and training budget.","short_abstract":"Evaluating and optimising authorial style in long-form story generation remains challenging because style is often assessed with ad hoc prompting and is frequently conflated with overall writing quality. We propose a two-stage pipeline. First, we train a dedicated style-similarity judge by fine-tuning a sentence-transf...","url_abs":"https://arxiv.org/abs/2512.05747","url_pdf":"https://arxiv.org/pdf/2512.05747v3","authors":"[\"Jinlong Liu\",\"Mohammed Bahja\",\"Venelin Kovatchev\",\"Mark Lee\"]","published":"2025-12-05T14:29:27Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Transformer\"]","has_code":false}