{"ID":3084834,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T03:54:17.966829144Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05679","arxiv_id":"2606.05679","title":"Data Flow Control: Data Safety Policies for AI Agents","abstract":"Agents increasingly generate SQL, orchestrate pipelines, and automate data analysis on behalf of users. While recent work improves query correctness, correctness is not safety. A query may be semantically valid yet violate regulatory, privacy, or business constraints that govern how data may be combined and released. We argue that enforcing such constraints is fundamentally a data infrastructure problem. This paper introduces Data Flow Control (DFC), a framework to declaratively specify and guarantee policy enforcement over tuple-level data flows within a DBMS query. A key challenge is defining a policy language that is optimizer-invariant yet efficient to enforce at scale. We formalize data safety as aggregate predicates over provenance monomials and present Passant, a portable query rewriting layer that enforces DFC policies without materializing provenance. Across five DBMS engines -- DuckDB, Umbra, PostgreSQL, DataFusion, and SQLServer -- Passant achieves ~0% overhead and outperforms alternatives by orders of magnitude. As a result, Data Flow Control is the first step towards moving data safety from prompts and post-hoc checks into the data infrastructure. Data Flow Control is available open source at https://github.com/dataflowcontrol/data-flow-control.","short_abstract":"Agents increasingly generate SQL, orchestrate pipelines, and automate data analysis on behalf of users. While recent work improves query correctness, correctness is not safety. A query may be semantically valid yet violate regulatory, privacy, or business constraints that govern how data may be combined and released. W...","url_abs":"https://arxiv.org/abs/2606.05679","url_pdf":"https://arxiv.org/pdf/2606.05679v1","authors":"[\"Charlie Summers\",\"Eugene Wu\"]","published":"2026-06-04T04:01:24Z","proceeding":"cs.DB","tasks":"[\"cs.DB\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":612860,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-05T06:46:15.197025399Z","DeletedAt":null,"paper_id":3084834,"paper_url":"https://arxiv.org/abs/2606.05679","paper_title":"Data Flow Control: Data Safety Policies for AI Agents","repo_url":"https://github.com/dataflowcontrol/data-flow-control","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
