{"ID":2847156,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.00361","arxiv_id":"2511.00361","title":"MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection","abstract":"High-quality data scarcity hinders malware detection, limiting ML performance. We introduce MalDataGen, an open-source modular framework for generating high-fidelity synthetic tabular data using modular deep learning models (e.g., WGAN-GP, VQ-VAE). Evaluated via dual validation (TR-TS/TS-TR), seven classifiers, and utility metrics, MalDataGen outperforms benchmarks like SDV while preserving data utility. Its flexible design enables seamless integration into detection pipelines, offering a practical solution for cybersecurity applications.","short_abstract":"High-quality data scarcity hinders malware detection, limiting ML performance. We introduce MalDataGen, an open-source modular framework for generating high-fidelity synthetic tabular data using modular deep learning models (e.g., WGAN-GP, VQ-VAE). Evaluated via dual validation (TR-TS/TS-TR), seven classifiers, and uti...","url_abs":"https://arxiv.org/abs/2511.00361","url_pdf":"https://arxiv.org/pdf/2511.00361v1","authors":"[\"Kayua Oleques Paim\",\"Angelo Gaspar Diniz Nogueira\",\"Diego Kreutz\",\"Weverton Cordeiro\",\"Rodrigo Brandao Mansilha\"]","published":"2025-11-01T02:08:58Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Generative Adversarial Network\",\"Variational Autoencoder\"]","has_code":false}
