Abstract
BACKGROUND Effective management of acutely ill newborns with genetic conditions requires rapid and comprehensive identification of causative haplotypes. Whole genome sequencing (WGS) has previously been shown to identify small variants contributing to the genetic illness of such patients in less than 50 hours. Deletion structural variants (SVs) of ≥50 nucleotides are implicated in many genetic diseases, and they can now be identified from WGS data with a performance and turnaround time sufficient for diagnosis in neonatal intensive care units. Here we describe the development of a solution that combines consensus calls from two SV detection tools (BreakDancer [BD] and Genome STRiP [GS]) with a novel filtering strategy.

RESULTS On simulated WGS data, BD and GS consensus calls had 83% sensitivity and 99% positive predictive value with high precision. On inspection of the raw data in the Integrative Genomics Viewer (IGV), consensus calls that overlapped SNP array calls were found to be 95% true positives; these calls were subsequently used to parameterize the filters. Consensus calling and filtering were implemented as a computational pipeline. IGV evaluation of the pipeline results in a tetrad showed that over 80% of calls were true positives, although sensitivity was low. Applying the pipeline to 10 proband family sets revealed a possibly causative deletion SV in the MMP21 gene in two siblings. MMP21 is thought to play a role in human embryogenesis and may be responsible for the heterotaxy phenotype. Further studies are needed to confirm these results.

CONCLUSIONS The identification of deletion SVs has the potential to increase the diagnostic yield of WGS data. The methods described in this study may be useful for research into disease detection in acutely ill neonates.
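The abstract does not say how BD and GS deletion calls are matched into a consensus set. A common convention for deletion SVs is 50% reciprocal overlap, and the minimal Python sketch below assumes that criterion purely for illustration. The `Deletion` type, the function names and the 0.5 threshold are hypothetical and are not the authors' pipeline. The sketch also shows how sensitivity (fraction of true deletions recovered) and positive predictive value (fraction of calls that are true) can be computed against a truth set, such as simulated deletions.

```python
from typing import List, NamedTuple, Tuple


class Deletion(NamedTuple):
    """Hypothetical deletion call: half-open interval [start, end) on a chromosome."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: Deletion, b: Deletion, min_frac: float = 0.5) -> bool:
    """True if the overlap covers at least min_frac of BOTH intervals.

    Assumption: 50% reciprocal overlap is used here as the matching rule;
    the abstract does not specify the authors' criterion.
    """
    if a.chrom != b.chrom:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a.end - a.start)
            and overlap >= min_frac * (b.end - b.start))


def consensus(calls_a: List[Deletion], calls_b: List[Deletion],
              min_frac: float = 0.5) -> List[Deletion]:
    """Keep calls from tool A that are supported by at least one call from tool B."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b, min_frac) for b in calls_b)]


def sensitivity_ppv(calls: List[Deletion], truth: List[Deletion],
                    min_frac: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = recovered truth / all truth; PPV = true calls / all calls."""
    true_calls = sum(1 for c in calls
                     if any(reciprocal_overlap(c, t, min_frac) for t in truth))
    found_truth = sum(1 for t in truth
                      if any(reciprocal_overlap(t, c, min_frac) for c in calls))
    sensitivity = found_truth / len(truth) if truth else 0.0
    ppv = true_calls / len(calls) if calls else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # Toy calls standing in for BD and GS output (invented coordinates).
    bd_calls = [Deletion("chr1", 1000, 2000), Deletion("chr1", 5000, 5600),
                Deletion("chr2", 300, 900)]
    gs_calls = [Deletion("chr1", 1100, 2050), Deletion("chr2", 10000, 12000)]
    cons = consensus(bd_calls, gs_calls)
    truth = [Deletion("chr1", 1050, 2000), Deletion("chr3", 0, 500)]
    print(cons)                          # [Deletion(chrom='chr1', start=1000, end=2000)]
    print(sensitivity_ppv(cons, truth))  # (0.5, 1.0)
```

Requiring agreement between two callers trades sensitivity for PPV, which is consistent with the pattern the abstract reports (high PPV, lower sensitivity).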