{"ID":2847959,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11599","arxiv_id":"2511.11599","title":"SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detection","abstract":"We introduce SynBullying, a synthetic multi-LLM conversational dataset for studying and detecting cyberbullying (CB). SynBullying provides a scalable and ethically safe alternative to human data collection by leveraging large language models (LLMs) to simulate realistic bullying interactions. The dataset offers (i) conversational structure, capturing multi-turn exchanges rather than isolated posts; (ii) context-aware annotations, where harmfulness is assessed within the conversational flow considering context, intent, and discourse dynamics; and (iii) fine-grained labeling, covering various CB categories for detailed linguistic and behavioral analysis. We evaluate SynBullying across five dimensions, including conversational structure, lexical patterns, sentiment/toxicity, role dynamics, harm intensity, and CB-type distribution. We further examine its utility by testing its performance as standalone training data and as an augmentation source for CB classification.","short_abstract":"We introduce SynBullying, a synthetic multi-LLM conversational dataset for studying and detecting cyberbullying (CB). SynBullying provides a scalable and ethically safe alternative to human data collection by leveraging large language models (LLMs) to simulate realistic bullying interactions. The dataset offers (i) con...","url_abs":"https://arxiv.org/abs/2511.11599","url_pdf":"https://arxiv.org/pdf/2511.11599v3","authors":"[\"Arefeh Kazemi\",\"Hamza Qadeer\",\"Joachim Wagner\",\"Hossein Hosseini\",\"Sri Balaaji Natarajan Kalaivendan\",\"Brian Davis\"]","published":"2025-10-30T09:27:36Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.CY\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
