{"ID":2874722,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.04403","arxiv_id":"2509.04403","title":"Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios","abstract":"Multimodal large language models (MLLMs) are rapidly evolving, presenting increasingly complex safety challenges. However, current dataset construction methods, which are risk-oriented, fail to cover the growing complexity of real-world multimodal safety scenarios (RMS). And due to the lack of a unified evaluation metric, their overall effectiveness remains unproven. This paper introduces a novel image-oriented self-adaptive dataset construction method for RMS, which starts with images and end constructing paired text and guidance responses. Using the image-oriented method, we automatically generate an RMS dataset comprising 35k image-text pairs with guidance responses. Additionally, we introduce a standardized safety dataset evaluation metric: fine-tuning a safety judge model and evaluating its capabilities on other safety datasets.Extensive experiments on various tasks demonstrate the effectiveness of the proposed image-oriented pipeline. The results confirm the scalability and effectiveness of the image-oriented approach, offering a new perspective for the construction of real-world multimodal safety datasets. The dataset is presented at https://huggingface.co/datasets/NewCityLetter/RMS2/tree/main.","short_abstract":"Multimodal large language models (MLLMs) are rapidly evolving, presenting increasingly complex safety challenges. However, current dataset construction methods, which are risk-oriented, fail to cover the growing complexity of real-world multimodal safety scenarios (RMS). And due to the lack of a unified evaluation metr...","url_abs":"https://arxiv.org/abs/2509.04403","url_pdf":"https://arxiv.org/pdf/2509.04403v2","authors":"[\"Jingen Qu\",\"Lijun Li\",\"Bo Zhang\",\"Yichen Yan\",\"Jing Shao\"]","published":"2025-09-04T17:13:59Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\",\"cs.CR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
