Distinct sequence features underlie microdeletions and gross deletions in the human genome

Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1bp to 28,394,429bp. We analyzed the DNA sequence within 1-kb of the breakpoint junctions and found that the frequencies of non-B DNA-forming repeats, GC-content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Further, we found that the presence of DR, GQ and STR repeats is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp) whilst significantly (Chi-square test P-value<2E-16) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut-off of 25-30bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.

This article is protected by copyright. All rights reserved.

Comments (0)

No login
gif