Stability of In-Context Learning: A Spectral Coverage Perspective
Abstract
In-context learning (ICL) is a pivotal capability for the practical deployment of large-scale language models, yet its reliability can vary substantially with the number of demonstrations provided in the prompt. A central obstacle is that the target notion, \emph{distributional stability under demonstration resampling}, is expensive to measure directly at scale, so prompt-length selection remains largely heuristic. We therefore study a \emph{computable sufficient condition} based on a spectral-coverage proxy: the lower tail of the spectrum of a regularized empirical second-moment matrix formed from demonstration representations. Under sub-Gaussian representation assumptions, we derive a non-asymptotic sample-size requirement (a lower bound on the number of demonstrations $K$) that guarantees the proxy condition holds except with a prescribed failure probability, yielding a conservative prompt-length recommendation produced by an observable two-stage estimator. In large-scale experiments, the resulting estimates consistently upper-bound empirical accuracy knee-points, which we treat only as a practical surrogate for the prompt-length transition rather than as a definition of stability. On a smaller held-out subset, direct resampling-based measurements of distributional stability further support the intended stability interpretation. Finally, a validation-only calibration step reduces the conservatism (typically to a factor of about $1.03$--$1.20\times$) while preserving the conservative ordering, providing practical and verifiable guidance for ICL prompt design.
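
For concreteness, the spectral-coverage proxy described above can be sketched in symbols. The notation here is an assumption introduced only for illustration and is not taken from the paper: $h_1,\dots,h_K \in \mathbb{R}^d$ denote the demonstration representations, $\lambda \ge 0$ a regularization parameter, $\tau > 0$ a coverage threshold, and $\delta \in (0,1)$ the prescribed failure probability. The smallest eigenvalue is used as the simplest summary of the lower tail of the spectrum; the paper's proxy may use a different lower-tail functional.
\[
  \widehat{\Sigma}_K^{\lambda} \;=\; \frac{1}{K}\sum_{i=1}^{K} h_i h_i^{\top} + \lambda I_d,
  \qquad
  \mathcal{E}_K(\tau) \;=\; \bigl\{\, \lambda_{\min}\bigl(\widehat{\Sigma}_K^{\lambda}\bigr) \ge \tau \,\bigr\}.
\]
In this notation, a sample-size requirement of the kind stated above is a threshold $K^{\star}(\delta)$ such that
\[
  K \ge K^{\star}(\delta) \;\Longrightarrow\; \Pr\bigl[\mathcal{E}_K(\tau)\bigr] \ge 1 - \delta ,
\]
and a conservative prompt-length recommendation is any computable upper estimate of $K^{\star}(\delta)$.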