{"ID":2872803,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09030","arxiv_id":"2509.09030","title":"Contextual Learning for Anomaly Detection in Tabular Data","abstract":"Anomaly detection is critical in domains such as cybersecurity and finance, especially when working with large-scale tabular data. Yet, unsupervised anomaly detection-where no labeled anomalies are available-remains challenging because traditional deep learning methods model a single global distribution, assuming all samples follow the same behavior. In contrast, real-world data often contain heterogeneous contexts (e.g., different users, accounts, or devices), where globally rare events may be normal within specific conditions. We introduce a contextual learning framework that explicitly models how normal behavior varies across contexts by learning conditional data distributions $P(\\mathbf{Y} \\mid \\mathbf{C})$ rather than a global joint distribution $P(\\mathbf{X})$. The framework encompasses (1) a probabilistic formulation for context-conditioned learning, (2) a principled bilevel optimization strategy for automatically selecting informative context features using early validation loss, and (3) theoretical grounding through variance decomposition and discriminative learning principles. We instantiate this framework using a novel conditional Wasserstein autoencoder as a simple yet effective model for tabular anomaly detection. Extensive experiments across eight benchmark datasets demonstrate that contextual learning consistently outperforms global approaches-even when the optimal context is not intuitively obvious-establishing a new foundation for anomaly detection in heterogeneous tabular data.","short_abstract":"Anomaly detection is critical in domains such as cybersecurity and finance, especially when working with large-scale tabular data. Yet, unsupervised anomaly detection-where no labeled anomalies are available-remains challenging because traditional deep learning methods model a single global distribution, assuming all s...","url_abs":"https://arxiv.org/abs/2509.09030","url_pdf":"https://arxiv.org/pdf/2509.09030v2","authors":"[\"Spencer King\",\"Zhilu Zhang\",\"Ruofan Yu\",\"Baris Coskun\",\"Wei Ding\",\"Qian Cui\"]","published":"2025-09-10T22:01:11Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}