{"ID":3049987,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T14:39:32.180964103Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04957","arxiv_id":"2606.04957","title":"NLLog: Lightweight, Explainable SOC Anomaly Detection via Log-to-Language Rewriting","abstract":"System-generated logs underpin security monitoring, yet their rigid template-based format hinders both automated analysis and human comprehension. We present NLLog (Natural-Language Log), a lightweight pipeline that deterministically rewrites parsed templates into WHO-WHAT-SEVERITY sentences, pools them with term-frequency-inverse-document-frequency weighting, classifies sessions with tree ensembles, and back-projects evidence with TreeSHAP for analyst review. On Hadoop Distributed File System (HDFS) and Blue Gene/L (BGL) corpora, NLLog exceeds two reproduced matched-protocol baselines; across HDFS, BGL, and the AIT Alert Data Set, it sustains low false-positive rates with commodity-hardware latency suitable for security operations center triage. Coverage, sparse-versus-dense, faithfulness, and adversarial ablations show that fallback sufficiency is corpus-dependent, that an enrollment-time coverage check can surface refinement requirements before deployment, and that an auditable deterministic rewrite combined with lightweight dense encoding provides a measurable representation layer for log-anomaly detection and triage.","short_abstract":"System-generated logs underpin security monitoring, yet their rigid template-based format hinders both automated analysis and human comprehension. \nWe present NLLog (Natural-Language Log), a lightweight pipeline that deterministically rewrites parsed templates into WHO-WHAT-SEVERITY sentences, pools them with term-frequ...","url_abs":"https://arxiv.org/abs/2606.04957","url_pdf":"https://arxiv.org/pdf/2606.04957v1","authors":"[\"Samuel Ndichu\",\"Tao Ban\",\"Seiichi Ozawa\",\"Takeshi Takahashi\",\"Daisuke Inoue\"]","published":"2026-06-03T14:45:29Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.IR\",\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
