DOI: 10.1093/bioinformatics/btaf146 ISSN: 1367-4811

OLTA: Optimizing bait seLection for TArgeted sequencing

Mete Orhun Minbay, Richard Sun, Vijay Ramachandran, Ahmet Ay, Tamer Kahveci

Abstract

Motivation

Targeted enrichment via capture probes, also known as baits, is a promising complementary procedure for next-generation sequencing methods. This technique employs short biotinylated oligonucleotide probes that hybridize with complementary genetic material in a sample. Following hybridization, the target fragments can be easily isolated and processed with minimal contamination from irrelevant material. Designing an efficient set of baits for a set of target sequences, however, is an NP-hard problem.

Results

We develop a novel heuristic algorithm that leverages the similarities between the characteristics of the Minimum Bait Cover and the Closest String problems to reduce the number of baits needed to cover a given target sequence. Our results on real and synthetic datasets demonstrate that our algorithm, OLTA, produces the fewest baits for nearly all experimental settings and datasets. On average, it produces 6% and 11% fewer baits than the next best state-of-the-art methods for two major real datasets, AIV and MEGARES, respectively. Its bait set also has the highest utilization and the minimum redundancy.
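To make the covering objective concrete, the following is a minimal sketch of a naive greedy baseline for bait cover. It is not the OLTA algorithm, whose details are not given in this abstract. It assumes that a bait of fixed length covers a same-length window of a target when their Hamming distance is at most a mismatch threshold, and it ignores reverse complements. The names `hamming`, `greedy_bait_cover`, `bait_len`, and `max_mismatches` are hypothetical and chosen for illustration only.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_bait_cover(targets, bait_len, max_mismatches):
    """Naive greedy baseline (illustrative only, not OLTA).

    Assumption: a bait covers a window of a target if the two strings
    differ in at most `max_mismatches` positions. Reverse complements
    are ignored in this sketch.
    """
    for seq in targets:
        if len(seq) < bait_len:
            raise ValueError("every target must be at least bait_len long")

    # covered[i][p] is True once position p of target i lies in a covered window.
    covered = [[False] * len(seq) for seq in targets]
    baits = []

    for i, seq in enumerate(targets):
        for pos in range(len(seq)):
            if covered[i][pos]:
                continue
            # Take the window starting at the first uncovered position,
            # shifted left if it would run past the end of the sequence.
            start = min(pos, len(seq) - bait_len)
            bait = seq[start:start + bait_len]
            baits.append(bait)
            # Mark every window, in every target, that this bait covers.
            for j, other in enumerate(targets):
                for s in range(len(other) - bait_len + 1):
                    if hamming(bait, other[s:s + bait_len]) <= max_mismatches:
                        for k in range(s, s + bait_len):
                            covered[j][k] = True
    return baits


if __name__ == "__main__":
    targets = ["ACGTACGT", "ACGAACGT"]
    print(greedy_bait_cover(targets, bait_len=4, max_mismatches=1))
    print(greedy_bait_cover(targets, bait_len=4, max_mismatches=0))
```

On this toy input, allowing one mismatch lets a single bait cover both targets, while requiring exact matches needs two baits. A greedy choice like this takes each bait directly from the input, whereas a heuristic informed by the Closest String problem could, in principle, pick a bait that is near many windows at once.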

Availability

Our algorithm is available at github.com/FuelTheBurn/OLTA-Optimizing-bait-seLection-for-TArgeted-sequencing. Test data and other software are archived at doi.org/10.5281/zenodo.15086636.

Supplementary information

Supplementary data are available at Bioinformatics online.
