{"ID":2836565,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.21686","arxiv_id":"2511.21686","title":"Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework","abstract":"Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality, more diverse, and structurally richer. However, existing frameworks for multi-agent synthesis often depend on a centralized orchestrator, creating scalability bottlenecks, or are hardcoded for specific domains, limiting flexibility. We present \\textbf{Matrix}, a decentralized framework that represents both control and data flow as serialized messages passed through distributed queues. This peer-to-peer design eliminates the central orchestrator. Each task progresses independently through lightweight agents, while compute-intensive operations, such as LLM inference or containerized environments, are handled by distributed services. Built on Ray, Matrix scales to tens of thousands of concurrent agentic workflows and provides a modular, configurable design that enables easy adaptation to a wide range of data generation workflows. We evaluate Matrix across diverse synthesis scenarios, such as multi-agent collaborative dialogue, web-based reasoning data extraction, and tool-use trajectory generation in customer service environments. In all cases, Matrix achieves $2$--$15\\times$ higher data generation throughput under identical hardware resources, without compromising output quality.","short_abstract":"Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. \nMany such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality, more diverse, and...","url_abs":"https://arxiv.org/abs/2511.21686","url_pdf":"https://arxiv.org/pdf/2511.21686v2","authors":"[\"Dong Wang\",\"Yang Li\",\"Ansong Ni\",\"Ching-Feng Yeh\",\"Youssef Emad\",\"Xinjie Lei\",\"Liam Robbins\",\"Karthik Padthe\",\"Hu Xu\",\"Xian Li\",\"Asli Celikyilmaz\",\"Ramya Raghavendra\",\"Lifei Huang\",\"Carole-Jean Wu\",\"Shang-Wen Li\"]","published":"2025-11-26T18:59:28Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}