{"ID":2886267,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03296","arxiv_id":"2508.03296","title":"Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling","abstract":"Social platforms have revolutionized information sharing, but also accelerated the dissemination of harmful and policy-violating content. To ensure safety and compliance at scale, moderation systems must go beyond efficiency and offer accuracy and interpretability. However, current approaches largely rely on noisy, label-driven learning, lacking alignment with moderation rules and producing opaque decisions that hinder human review. Therefore, we propose Hierarchical Guard (Hi-Guard), a multimodal moderation framework that introduces a new policy-aligned decision paradigm. The term \"Hierarchical\" reflects two key aspects of our system design: (1) a hierarchical moderation pipeline, where a lightweight binary model first filters safe content and a stronger model handles fine-grained risk classification; and (2) a hierarchical taxonomy in the second stage, where the model performs path-based classification over a hierarchical taxonomy ranging from coarse to fine-grained levels. To ensure alignment with evolving moderation policies, Hi-Guard directly incorporates rule definitions into the model prompt. To further enhance structured prediction and reasoning, we introduce a multi-level soft-margin reward and optimize with Group Relative Policy Optimization (GRPO), penalizing semantically adjacent misclassifications and improving explanation quality. Extensive experiments and real-world deployment demonstrate that Hi-Guard achieves superior classification accuracy, generalization, and interpretability, paving the way toward scalable, transparent, and trustworthy content safety systems. Code is available at: https://github.com/lianqi1008/Hi-Guard.","short_abstract":"Social platforms have revolutionized information sharing, but also accelerated the dissemination of harmful and policy-violating content. To ensure safety and compliance at scale, moderation systems must go beyond efficiency and offer accuracy and interpretability. However, current approaches largely rely on noisy, lab...","url_abs":"https://arxiv.org/abs/2508.03296","url_pdf":"https://arxiv.org/pdf/2508.03296v2","authors":"[\"Anqi Li\",\"Wenwei Jin\",\"Jintao Tong\",\"Pengda Qin\",\"Weijia Li\",\"Guo Lu\"]","published":"2025-08-05T10:16:04Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":611291,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886267,"paper_url":"https://arxiv.org/abs/2508.03296","paper_title":"Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling","repo_url":"https://github.com/lianqi1008/Hi-Guard","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
