{"ID":2857458,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09184","arxiv_id":"2510.09184","title":"Stronger Re-identification Attacks through Reasoning and Aggregation","abstract":"Text de-identification techniques are often used to mask personally identifiable information (PII) from documents. Their ability to conceal the identity of the individuals mentioned in a text is, however, hard to measure. Recent work has shown how the robustness of de-identification methods could be assessed by attempting the reverse process of _re-identification_, based on an automated adversary using its background knowledge to uncover the PIIs that have been masked. This paper presents two complementary strategies to build stronger re-identification attacks. We first show that (1) the _order_ in which the PII spans are re-identified matters, and that aggregating predictions across multiple orderings leads to improved results. We also find that (2) reasoning models can boost the re-identification performance, especially when the adversary is assumed to have access to extensive background knowledge.","short_abstract":"Text de-identification techniques are often used to mask personally identifiable information (PII) from documents. Their ability to conceal the identity of the individuals mentioned in a text is, however, hard to measure. Recent work has shown how the robustness of de-identification methods could be assessed by attempt...","url_abs":"https://arxiv.org/abs/2510.09184","url_pdf":"https://arxiv.org/pdf/2510.09184v1","authors":"[\"Lucas Georges Gabriel Charpentier\",\"Pierre Lison\"]","published":"2025-10-10T09:27:42Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
