{"ID":2839488,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15759","arxiv_id":"2511.15759","title":"Securing AI Agents Against Prompt Injection Attacks","abstract":"Retrieval-augmented generation (RAG) systems have become widely used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled AI agents and propose a multi-layered defense framework. Our benchmark includes 847 adversarial test cases across five attack categories: direct injection, context manipulation, instruction override, data exfiltration, and cross-context contamination. We evaluate three defense mechanisms: content filtering with embedding-based anomaly detection, hierarchical system prompt guardrails, and multi-stage response verification, across seven state-of-the-art language models. Our combined framework reduces successful attack rates from 73.2% to 8.7% while maintaining 94.3% of baseline task performance. We release our benchmark dataset and defense implementation to support future research in AI agent security.","short_abstract":"Retrieval-augmented generation (RAG) systems have become widely used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled AI agents and propose...","url_abs":"https://arxiv.org/abs/2511.15759","url_pdf":"https://arxiv.org/pdf/2511.15759v1","authors":"[\"Badrinath Ramakrishnan\",\"Akshaya Balaji\"]","published":"2025-11-19T10:00:54Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\"]","methods":"[\"RAG\",\"Language Model\"]","has_code":false}
