{"ID":2835276,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00638","arxiv_id":"2512.00638","title":"Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation","abstract":"We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt DP-training to the diffusion process, we propose two privacy-aware training strategies: an adaptive timestep sampler that aligns updates with diffusion dynamics, and a feature-aggregated loss that mitigates clipping-induced bias. Together, these enhancements improve fidelity and downstream utility without weakening privacy guarantees. On financial and medical datasets, DP-FinDiff achieves 16-42% higher utility than DP baselines at comparable privacy levels, demonstrating its promise for safe and effective data sharing in sensitive domains.","short_abstract":"We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt DP-training to the diffusion process, we propose two...","url_abs":"https://arxiv.org/abs/2512.00638","url_pdf":"https://arxiv.org/pdf/2512.00638v1","authors":"[\"Timur Sattarov\",\"Marco Schreyer\",\"Damian Borth\"]","published":"2025-11-29T21:23:57Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}
