{"ID":2850297,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23650","arxiv_id":"2510.23650","title":"Beyond Hidden-Layer Manipulation: Semantically-Aware Logit Interventions for Debiasing LLMs","abstract":"We proposed Static and Dynamic -- two zero-shot logits-layer debiasing methods. Dynamic reduces bias by up to 70% with minimal fluency loss. Logits intervention outperforms hidden-layer approaches. We show semantic-aware logits intervention is stable and effective for debiasing aligned LLMs.","short_abstract":"We proposed Static and Dynamic -- two zero-shot logits-layer debiasing methods. Dynamic reduces bias by up to 70% with minimal fluency loss. Logits intervention outperforms hidden-layer approaches. We show semantic-aware logits intervention is stable and effective for debiasing aligned LLMs.","url_abs":"https://arxiv.org/abs/2510.23650","url_pdf":"https://arxiv.org/pdf/2510.23650v1","authors":"[\"Wei Xia\"]","published":"2025-10-25T12:45:00Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
