Quality control in sublinear time: a case study via random graphs

Aug 22, 2025 cs.DS arXiv:2508.16531

Abstract

Many algorithms are designed to work well on average over inputs. When running such an algorithm on an arbitrary input, we must ask: Can we trust the algorithm on this input? We identify a new class of algorithmic problems addressing this, which we call "Quality Control Problems." These problems are specified by a (positive, real-valued) "quality function" $ρ$ and a distribution $D$ such that, with high probability, a sample drawn from $D$ is "high quality," meaning its $ρ$-value is near $1$. The goal is to accept inputs $x \sim D$ and reject potentially adversarially generated inputs $x$ with $ρ(x)$ far from $1$. The objective of quality control is thus weaker than either component problem: testing for "$ρ(x) \approx 1$" or testing if $x \sim D$, and offers the possibility of more efficient algorithms. In this work, we consider the sublinear version of the quality control problem, where $D \in Δ(\{0,1\}^N)$ and the goal is to solve the $(D ,ρ)$-quality problem with $o(N)$ queries and time. As a case study, we consider random graphs, i.e., $D = G_{n,p}$ (and $N = \binom{n}2$), and the $k$-clique count function $ρ_k := C_k(G)/\mathbb{E}_{G' \sim G_{n,p}}[C_k(G')]$, where $C_k(G)$ is the number of $k$-cliques in $G$. Testing if $G \sim G_{n,p}$ with one sample, let alone with sublinear query access to the sample, is of course impossible. Testing if $ρ_k(G)\approx 1$ requires $p^{-Ω(k^2)}$ samples. In contrast, we show that the quality control problem for $G_{n,p}$ (with $n \geq p^{-ck}$ for some constant $c$) with respect to $ρ_k$ can be tested with $p^{-O(k)}$ queries and time, showing quality control is provably superpolynomially more efficient in this setting. More generally, for a motif $H$ of maximum degree $Δ(H)$, the respective quality control problem can be solved with $p^{-O(Δ(H))}$ queries and running time.

Abstract

PDF Viewer