Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges [RESEARCH]

John S. Sproul1,2,3,12, Scott Hotaling4,5,12, Jacqueline Heckenhauer6,7,12, Ashlyn Powell8, Dez Marshall2, Amanda M. Larracuente3, Joanna L. Kelley4,9, Steffen U. Pauls6,7,10 and Paul B. Frandsen6,8,11

1 Department of Biology, Brigham Young University, Provo, Utah 84602, USA
2 Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
3 Department of Biology, University of Rochester, Rochester, New York 14627, USA
4 School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
5 Department of Watershed Sciences, Utah State University, Logan, Utah 84322, USA
6 LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
7 Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
8 Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
9 Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
10 Department of Insect Biotechnology, Justus-Liebig-University Gießen, 35392 Gießen, Germany
11 Data Science Lab, Smithsonian Institution, Washington, District of Columbia 20560, USA

12 These authors contributed equally to this work.

Corresponding author: johnssproul@gmail.com

Abstract

Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE–gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ∼36% more REs than in short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less technology-related bias. In most insect lineages, 25%–85% of repetitive sequences were “unclassified” following automated annotation, compared with only ∼13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.277387.122.

Freely available online through the Genome Research Open Access option.

Received October 6, 2022. Accepted September 20, 2023.
