{"ID":2840415,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.13049","arxiv_id":"2511.13049","title":"Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information","abstract":"We study a matrix completion problem where both the ground truth $R$ matrix and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \\textit{share a common subspace}. We assume that a large amount $M$ of \\textit{unlabeled} data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to `implicit feedback' (consisting in interactions such as purchase, click, etc. ) and the labeled data corresponds to the `explicit feedback', consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\\widetilde{O}\\left(\\sqrt{\\frac{nd}{M}}\\right)$ and $\\widetilde{O}\\left(\\sqrt{\\frac{dr}{N}}\\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $M$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimations of $P$ and and the ground truth matrix $\\ground$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, demonstrating that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedbacks in recommender systems.","short_abstract":"We study a matrix completion problem where both the ground truth $R$ matrix and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \\textit{share a common subspace}. We assume that a large amount $M$ of \\textit{unlabeled} data drawn from the sampling distribution $P$ is available, tog...","url_abs":"https://arxiv.org/abs/2511.13049","url_pdf":"https://arxiv.org/pdf/2511.13049v2","authors":"[\"Antoine Ledent\",\"Mun Chong Soo\",\"Nong Minh Hieu\"]","published":"2025-11-17T06:53:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[]","has_code":false}
