{"ID":2868035,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16898","arxiv_id":"2509.16898","title":"The Complexity of Finding Local Optima in Contrastive Learning","abstract":"Contrastive learning is a powerful technique for discovering meaningful data representations by optimizing objectives based on $\\textit{contrastive information}$, often given as a set of weighted triplets $\\{(x_i, y_i^+, z_{i}^-)\\}_{i = 1}^m$ indicating that an \"anchor\" $x_i$ is more similar to a \"positive\" example $y_i$ than to a \"negative\" example $z_i$. The goal is to find representations (e.g., embeddings in $\\mathbb{R}^d$ or a tree metric) where anchors are placed closer to positive than to negative examples. While finding $\\textit{global}$ optima of contrastive objectives is $\\mathsf{NP}$-hard, the complexity of finding $\\textit{local}$ optima -- representations that do not improve by local search algorithms such as gradient-based methods -- remains open. Our work settles the complexity of finding local optima in various contrastive learning problems by proving $\\mathsf{PLS}$-hardness in discrete settings (e.g., maximize satisfied triplets) and $\\mathsf{CLS}$-hardness in continuous settings (e.g., minimize Triplet Loss), where $\\mathsf{PLS}$ (Polynomial Local Search) and $\\mathsf{CLS}$ (Continuous Local Search) are well-studied complexity classes capturing local search dynamics in discrete and continuous optimization, respectively. Our results imply that no polynomial time algorithm (local search or otherwise) can find a local optimum for various contrastive learning problems, unless $\\mathsf{PLS}\\subseteq\\mathsf{P}$ (or $\\mathsf{CLS}\\subseteq \\mathsf{P}$ for continuous problems). Even in the unlikely scenario that $\\mathsf{PLS}\\subseteq\\mathsf{P}$ (or $\\mathsf{CLS}\\subseteq \\mathsf{P}$), our reductions imply that there exist instances where local search algorithms need exponential time to reach a local optimum, even for $d=1$ (embeddings on a line).","short_abstract":"Contrastive learning is a powerful technique for discovering meaningful data representations by optimizing objectives based on $\\textit{contrastive information}$, often given as a set of weighted triplets $\\{(x_i, y_i^+, z_{i}^-)\\}_{i = 1}^m$ indicating that an \"anchor\" $x_i$ is more similar to a \"positive\" example $y_...","url_abs":"https://arxiv.org/abs/2509.16898","url_pdf":"https://arxiv.org/pdf/2509.16898v1","authors":"[\"Jingming Yan\",\"Yiyuan Luo\",\"Vaggos Chatziafratis\",\"Ioannis Panageas\",\"Parnian Shahkar\",\"Stelios Stavroulakis\"]","published":"2025-09-21T03:21:04Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CC\",\"math.OC\"]","methods":"[]","has_code":false}