{"ID":2832103,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.11550","arxiv_id":"2601.11550","title":"Uniqueness ratio as a predictor of a privacy leakage","abstract":"Identity leakage can emerge when independent databases are joined, even when each dataset is anonymized individually. While previous work focuses on post-join detection or complex privacy models, little attention has been given to simple, interpretable pre-join indicators that can warn data engineers and database administrators before integration occurs. This study investigates the uniqueness ratio of candidate join attributes as an early predictor of re-identification risk. Using synthetic multi-table datasets, we compute the uniqueness ratio of attribute combinations within each database and examine how these ratios correlate with identity exposure after the join. Experimental results show a strong relationship between high pre-join uniqueness and increased post-join leakage, measured by the proportion of records that become uniquely identifiable or fall into very small groups. Our findings demonstrate that the uniqueness ratio offers an explainable and practical signal for assessing join-induced privacy risk, providing a foundation for developing more comprehensive pre-join risk estimation models.","short_abstract":"Identity leakage can emerge when independent databases are joined, even when each dataset is anonymized individually. While previous work focuses on post-join detection or complex privacy models, little attention has been given to simple, interpretable pre-join indicators that can warn data engineers and database admin...","url_abs":"https://arxiv.org/abs/2601.11550","url_pdf":"https://arxiv.org/pdf/2601.11550v1","authors":"[\"Danah A. AlSalem AlKhashti\"]","published":"2025-12-07T20:04:26Z","proceeding":"cs.DB","tasks":"[\"cs.DB\",\"cs.CR\",\"cs.LG\"]","methods":"[]","has_code":false}
