{"ID":2921004,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-04T07:41:34.29888543Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01990","arxiv_id":"2606.01990","title":"Testing for Single-Population Ancestry in the Admixture Model","abstract":"The Admixture Model describes genetic marker data by representing each individual's genome as a mixture of contributions from $K$ ancestral populations, with the individual admixture vector summarizing the corresponding ancestry proportions. In population and forensic genetics, a key question is whether an individual's genome supports a predominantly single-ancestry interpretation or whether an admixed interpretation is more appropriate. We propose a statistical test for single-population ancestry in the supervised Admixture Model, where ancestral allele frequencies are treated as known. The test assesses whether the largest admixture component exceeds a practitioner-chosen dominance threshold, giving precise meaning to the notion of a sufficiently strong single-population contribution. To calibrate the test, we develop a constrained parametric bootstrap procedure that generates data under a null-constrained maximum likelihood estimator, accounting for the constrained hypothesis structure, the marker-wise heterogeneity and small sample sizes. Under standard regularity conditions, we prove that the proposed test has asymptotic level $α$ and is consistent, ensuring control of false single-ancestry declarations while reliably detecting dominant ancestry components. Simulation studies demonstrate good finite-sample performance across different numbers of ancestral populations, marker-panel sizes, dominance thresholds, and allele-frequency distributions. We further illustrate the practical utility of the method using data from the 1000 Genomes Project. The proposed framework delivers interpretable, threshold-based ancestry assessment with rigorous error control, and extends constrained bootstrap methodology to the independent but non-identically distributed setting of genetic marker data.","short_abstract":"The Admixture Model describes genetic marker data by representing each individual's genome as a mixture of contributions from $K$ ancestral populations, with the individual admixture vector summarizing the corresponding ancestry proportions. In population and forensic genetics, a key question is whether an individual's...","url_abs":"https://arxiv.org/abs/2606.01990","url_pdf":"https://arxiv.org/pdf/2606.01990v1","authors":"[\"Holger Dette\",\"Carola Sophia Heinzel\",\"Zoe Lange\",\"Peter Pfaffelhuber\"]","published":"2026-06-01T09:47:59Z","proceeding":"stat.ME","tasks":"[\"stat.ME\",\"math.ST\"]","methods":"[]","has_code":false}
