CGC1, a new reference genome for Caenorhabditis elegans [RESOURCES]

Kazuki Ichikawa,1 Massa J. Shoura,2,8 Karen L. Artiles,2 Dae-Eun Jeong,2 Chie Owa,1 Haruka Kobayashi,1 Yoshihiko Suzuki,1 Manami Kanamori,3 Yu Toyoshima,3 Yuichi Iino,3 Ann E. Rougvie,4 Lamia Wahba,5 Andrew Z. Fire,2,6 Erich M. Schwarz,7 and Shinichi Morishita1

1 Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8583, Japan
2 Department of Pathology, Stanford University, Stanford, California 94305, USA
3 Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
4 Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota 55454, USA
5 Laboratory of Non-Canonical Modes of Inheritance, Rockefeller University, New York, New York 10065, USA
6 Department of Genetics, Stanford University, Stanford, California 94305, USA
7 Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA

8 Present address: Phinomics, Incorporated, San Carlos, CA 94070, USA

Corresponding authors: afire@stanford.edu, ems394@cornell.edu, moris@edu.k.u-tokyo.ac.jp

Abstract

The original 100.3 Mb reference genome for Caenorhabditis elegans, generated from the wild-type laboratory strain N2, has been crucial for analysis of C. elegans since 1998 and has been considered complete since 2005. Unexpectedly, this long-standing reference was shown to be incomplete in 2019 by a genome assembly from the N2-derived strain VC2010. Moreover, genetically divergent versions of N2 have arisen over decades of research and hindered reproducibility of C. elegans genetics and genomics. Here we provide a 106.4 Mb gap-free, telomere-to-telomere genome assembly of C. elegans, generated from CGC1, an isogenic derivative of the N2 strain. We use improved long-read sequencing and manual assembly of 43 recalcitrant genomic regions to overcome deficiencies of prior N2 and VC2010 assemblies and to assemble tandem repeat loci, including a 772 kb sequence for the 45S rRNA genes. Although many differences from earlier assemblies come from repeat regions, unique additions to the genome are also found. Of 19,972 protein-coding genes in the N2 assembly, 19,790 (99.1%) encode products that are unchanged in the CGC1 assembly. The CGC1 assembly also may encode 183 new protein-coding and 163 new ncRNA genes. CGC1 thus provides both a completely defined reference genome and corresponding isogenic wild-type strain for C. elegans, allowing unique opportunities for model and systems biology.

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280274.124.

Freely available online through the Genome Research Open Access option.

Received December 5, 2024. Accepted June 6, 2025.
