{"ID":3050162,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T08:42:33.101913816Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04612","arxiv_id":"2606.04612","title":"Hybrid Adversarial Defence for Natural Language Understanding Tasks","abstract":"Large Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence framework that combines entropy-based models, designed to reduce hallucinations, with uncertainty-based models and geometric-based models, designed to reduce vulnerability. Under in-domain tests on Natural Language Understanding datasets (FEVER, HotpotQA, CSQA, SIQA) we find our hybrid model improves both clean-task performance (up to 43.34\\% increase in accuracy) and adversarial robustness (up to 64.92\\% improvement in accuracy and 62.27\\% reduction in attack success rate). For out-of-distribution datasets (AeroEngQA, CPIQA) we see similar adversarial robustness from our hybrid model (up to 57.14\\% improvement in accuracy). For prompt injection (SafeGuard) and jailbreak detection (AdvBench, DAN) datasets our hybrid model is also very strong (up to 51\\% reduction in attack success rate compared to state of the art baseline models). Overall, our results show that combining entropy, uncertainty and geometric features provides a more effective defence strategy than using any single feature alone for both in-domain and out-of-distribution tasks.","short_abstract":"Large Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence framework that combines entropy-based models, designed to reduce hallucinations, with uncertai...","url_abs":"https://arxiv.org/abs/2606.04612","url_pdf":"https://arxiv.org/pdf/2606.04612v1","authors":"[\"Manar Abouzaid\",\"Yang Wang\",\"Chenghua Lin\",\"Stuart E. Middleton\"]","published":"2026-06-03T08:49:15Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
