{"ID":2858493,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.08869","arxiv_id":"2510.08869","title":"Measuring the Hidden Cost of Data Valuation through Collective Disclosure","abstract":"Data valuation methods assign marginal utility to each data point that has contributed to the training of a machine learning model. If used directly as a payout mechanism, this creates a hidden cost of valuation, in which contributors with near-zero marginal value would receive nothing, even though their data had to be collected and assessed. To better formalize this cost, we introduce a conceptual and game-theoretic model, the Information Disclosure Game, between a Data Union (sometimes also called a data trust), a member-run agent representing contributors, and a Data Consumer (e.g., a platform). After first aggregating members' data, the DU releases information progressively by adding Laplacian noise under a differentially-private mechanism. Through simulations with strategies guided by data Shapley values and multi-armed bandit exploration, we demonstrate on a Yelp review helpfulness prediction task that data valuation inherently incurs an explicit acquisition cost and that the DU's collective disclosure policy changes how this cost is distributed across members.","short_abstract":"Data valuation methods assign marginal utility to each data point that has contributed to the training of a machine learning model. If used directly as a payout mechanism, this creates a hidden cost of valuation, in which contributors with near-zero marginal value would receive nothing, even though their data had to be...","url_abs":"https://arxiv.org/abs/2510.08869","url_pdf":"https://arxiv.org/pdf/2510.08869v2","authors":"[\"Patrick Mesana\",\"Gilles Caporossi\",\"Sebastien Gambs\"]","published":"2025-10-09T23:59:25Z","proceeding":"cs.GT","tasks":"[\"cs.GT\"]","methods":"[\"LoRA\"]","has_code":false}
