{"ID":2849616,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23679","arxiv_id":"2510.23679","title":"PanDelos-plus: A parallel algorithm for computing sequence homology in pangenomic analysis","abstract":"The identification of homologous gene families across multiple genomes is a central task in bacterial pangenomics traditionally requiring computationally demanding all-against-all comparisons. PanDelos addresses this challenge with an alignment-free and parameter-free approach based on k-mer profiles, combining high speed, ease of use, and competitive accuracy with state-of-the-art methods. However, the increasing availability of genomic data requires tools that can scale efficiently to larger datasets. To address this need, we present PanDelos-plus, a fully parallel, gene-centric redesign of PanDelos. The algorithm parallelizes the most computationally intensive phases (Best Hit detection and Bidirectional Best Hit extraction) through data decomposition and a thread pool strategy, while employing lightweight data structures to reduce memory usage. Benchmarks on synthetic datasets show that PanDelos-plus achieves up to 14x faster execution and reduces memory usage by up to 96%, while maintaining accuracy. These improvements enable population-scale comparative genomics to be performed on standard multicore workstations, making large-scale bacterial pangenome analysis accessible for routine use in everyday research.","short_abstract":"The identification of homologous gene families across multiple genomes is a central task in bacterial pangenomics traditionally requiring computationally demanding all-against-all comparisons. PanDelos addresses this challenge with an alignment-free and parameter-free approach based on k-mer profiles, combining high sp...","url_abs":"https://arxiv.org/abs/2510.23679","url_pdf":"https://arxiv.org/pdf/2510.23679v1","authors":"[\"Simone Colli\",\"Emiliano Maresi\",\"Vincenzo Bonnici\"]","published":"2025-10-27T13:10:14Z","proceeding":"q-bio.GN","tasks":"[\"q-bio.GN\",\"cs.DC\"]","methods":"[]","has_code":false}
