{"ID":2859269,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05767","arxiv_id":"2510.05767","title":"Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes","abstract":"We derive non-asymptotic spectral bands that bound the squared InfoNCE gradient norm via alignment, temperature, and batch spectrum, recovering the \\(1/τ^{2}\\) law and closely tracking batch-mean gradients on synthetic data and ImageNet. Using effective rank \\(R_{\\mathrm{eff}}\\) as an anisotropy proxy, we design spectrum-aware batch selection, including a fast greedy builder. On ImageNet-100, Greedy-64 cuts time-to-67.5\\% top-1 by 15\\% vs.\\ random (24\\% vs.\\ Pool--P3) at equal accuracy; CIFAR-10 shows similar gains. In-batch whitening promotes isotropy and reduces 50-step gradient variance by \\(1.37\\times\\), matching our theoretical upper bound.","short_abstract":"We derive non-asymptotic spectral bands that bound the squared InfoNCE gradient norm via alignment, temperature, and batch spectrum, recovering the \\(1/τ^{2}\\) law and closely tracking batch-mean gradients on synthetic data and ImageNet. Using effective rank \\(R_{\\mathrm{eff}}\\) as an anisotropy proxy, we design spectr...","url_abs":"https://arxiv.org/abs/2510.05767","url_pdf":"https://arxiv.org/pdf/2510.05767v1","authors":"[\"Peter Ochieng\"]","published":"2025-10-07T10:35:58Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
