2019 ASHS Annual Conference
A Bioinformatic Platform for Identifying Target DNA Sequences for the Development of Sub-Genome Specific DNA Markers in Polyploid and Other Complex Genomes
Tuesday, July 23, 2019
Cohiba 5-11 (Tropicana Las Vegas)
The genomes of numerous eukaryotic species are polyploid. These include several economically important food, feed, and fiber plants, such as strawberry (Fragaria × ananassa), wheat (Triticum aestivum), and cotton (Gossypium hirsutum). The presence of multiple copies of genes or genomes in a single cell poses technical challenges for the application of high-throughput and next-generation genotyping technologies in breeding and forward genetics research. While sequence duplication often complicates genomic analyses in diploid organisms, this issue is ubiquitous in polyploid research. We studied the problem of identifying the unique DNA sequences that are required for developing copy-specific high-throughput genotyping assays in organisms where multiple copies of a gene are present. The computational solutions and analyses presented here were motivated by the need to identify subgenome-specific DNA sequences in garden strawberry (Fragaria × ananassa), an octoploid originating ~300 years ago from hybrids between octoploid progenitors that originated 0.5-1.0 million years before present. More specifically, our task was to identify suitable DNA sequences for the development of co-dominant PCR-based genotyping assays that are copy-specific and avoid amplification of off-target sequences elsewhere in the genome. Here, we describe the computational solutions and software that emerged from the strawberry study. The software we developed predicts the specificity of DNA sequences for genotyping assay development and considers the presence of variants and indels in a reference population that may adversely affect primer binding sites and amplification products, leading to spurious genotyping. In addition, as a guide for designing assays, we formulated a heuristic score that captures the expected quality of the genotyping within a considered population.
This software can be applied in any species with a reference genome and benefits from the use of mutation information from high-density SNP array data, whole-genome shotgun DNA sequence data, or a combination thereof. Finally, code optimizations combined with process parallelization make this software amenable to the large-scale design of PCR-based genotyping assays across an entire genome.
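
To make the idea of copy specificity concrete, the following is a minimal Python sketch and not the software described above. It assumes a toy model in which a candidate primer is compared against every forward-strand window of each subgenome sequence by Hamming distance, and in which known variant positions inside the target binding site lower the score. The names `find_hits` and `toy_score`, the mismatch threshold, and the scoring formula are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(primer: str, genome: Dict[str, str],
              max_mismatches: int = 1) -> List[Tuple[str, int, int]]:
    """Return (sequence name, 0-based start, mismatches) for every
    forward-strand window within max_mismatches of the primer."""
    k = len(primer)
    hits = []
    for name, seq in genome.items():
        for i in range(len(seq) - k + 1):
            d = hamming(primer, seq[i:i + k])
            if d <= max_mismatches:
                hits.append((name, i, d))
    return hits


def toy_score(primer: str, genome: Dict[str, str],
              target: Tuple[str, int],
              variant_positions: Iterable[int],
              max_mismatches: int = 1) -> float:
    """Hypothetical score in (0, 1]: 1.0 means no off-target hits and
    no known variants inside the target binding site."""
    name, start = target
    hits = find_hits(primer, genome, max_mismatches)
    # Every near-match other than the intended site counts as off-target.
    off_target = [h for h in hits if (h[0], h[1]) != (name, start)]
    # Variants on the target sequence that fall within the binding site.
    site = range(start, start + len(primer))
    n_variants = sum(1 for p in variant_positions if p in site)
    return 1.0 / ((1 + len(off_target)) * (1 + n_variants))


# Two toy homoeologous copies differing by one base in the binding site.
genome = {"sub_A": "TTACGTACTT", "sub_B": "TTACGAACTT"}
print(toy_score("ACGTAC", genome, ("sub_A", 2), variant_positions=[]))
```

In this toy model, tightening `max_mismatches` to 0 removes the near-identical copy on `sub_B` from the hit list, which mirrors the general point that a candidate is copy-specific only relative to a stated tolerance for mismatches. A realistic implementation would also need to handle the reverse strand, indels, and primer pairs.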