{"ID":3084864,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T05:32:54.120957816Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05725","arxiv_id":"2606.05725","title":"An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic","abstract":"Large language models (LLMs) are increasingly deployed through hosted APIs, making model extraction a practical threat to model ownership and service security. However, individual extraction queries often resemble benign requests, and existing evaluations often focus on single-query anomaly scoring or pure benign-versus-attacker user settings. We formulate model extraction monitoring as benign-calibrated traffic-window distribution testing and show that an embarrassingly simple detector is effective: embed incoming queries into a semantic space and test whether their aggregate distribution deviates from historical benign traffic. We instantiate the detector with maximum mean discrepancy (MMD), using only benign-vs-benign comparisons to set the decision threshold. We evaluate on fourteen attacker-normal query pairs from four extraction scenarios and compare with adapted PRADA, SEAT, CAP, DATE, and marginal Mahalanobis baselines. Across three random seeds, MMD achieves 0.3% benign FPR, 100.0% pure-attacker TPR, 90.5% average TPR over attacker fractions, and 95.1% balanced accuracy. These results show that benign-calibrated distribution testing is a strong empirical baseline for model extraction detection in both user-level and mixed multi-user LLM API traffic. Code is released at: https://github.com/LabRAI/mmd-llm-mea-detection.","short_abstract":"Large language models (LLMs) are increasingly deployed through hosted APIs, making model extraction a practical threat to model ownership and service security. However, individual extraction queries often resemble benign requests, and existing evaluations often focus on single-query anomaly scoring or pure benign-versu...","url_abs":"https://arxiv.org/abs/2606.05725","url_pdf":"https://arxiv.org/pdf/2606.05725v1","authors":"[\"Shuze Liu\",\"Qianwen Guo\",\"Yushun Dong\"]","published":"2026-06-04T05:33:49Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":612865,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-05T06:46:15.197025399Z","DeletedAt":null,"paper_id":3084864,"paper_url":"https://arxiv.org/abs/2606.05725","paper_title":"An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic","repo_url":"https://github.com/LabRAI/mmd-llm-mea-detection","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
