Overcoming limitations to customize DeepVariant for domesticated animals with TrioTrain [METHOD]

Jenna Kalleberg1, Jacob Rissman1 and Robert D. Schnabel1,2

1Division of Animal Sciences, University of Missouri, Columbia, Missouri 65201, USA; 2Genetics Area Program, University of Missouri, Columbia, Missouri 65201, USA

Corresponding author: schnabelr@missouri.edu

Abstract

Generating high-quality variant callsets across diverse species remains challenging as most bioinformatic tools default to assumptions based on human genomes. DeepVariant (DV) excels without joint genotyping while offering fewer implementation barriers. However, the growing appeal of a “universal” algorithm has magnified the unknown impacts when used with non-human species. Here, we use bovine genomes to assess the limits of using human genome–trained variant callers, including the allele frequency channel (DV-AF) and joint-caller DeepTrio (DT). Our novel approach, TrioTrain, automates extending DV for diploid species lacking Genome-in-a-Bottle (GIAB) resources, using a region shuffling approach to mitigate barriers for SLURM-based clusters. Imperfect animal truth labels are curated to remove Mendelian discordant sites before training DV to genotype the offspring correctly. With TrioTrain, we use cattle, yak, and bison trios to create the first multispecies-trained DV-AF checkpoint. Although incomplete bovine truth sets constrain recall within challenging repetitive regions, we observe a mean SNV F1 score >0.990 across new checkpoints during GIAB benchmarking. With HG002, a bovine-trained checkpoint (28) decreased the Mendelian inheritance error (MIE) rate by a factor of two compared with the default (DV). Checkpoint 28 has a mean MIE rate of 0.03% in three bovine interspecies cross genomes. These results illustrate that a multispecies, trio-based training strategy reduces inheritance errors during single-sample variant calling. Although exclusively training with human genomes deters transferring deep-learning-based variant calling to new species, we use the diverse ancestry within bovids to illustrate the need for advanced tools designed for comparative genomics.
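The Mendelian checks mentioned in the abstract can be illustrated with a minimal sketch. It is not TrioTrain's implementation. It assumes autosomal, diploid genotypes with no missing calls, encoded as pairs of allele indices in the style of a VCF GT field, and the function names (`is_mendelian_consistent`, `mie_rate`, `drop_discordant`) are hypothetical. A site is treated as concordant when the offspring genotype can be formed from one maternal and one paternal allele; the MIE rate is the fraction of sites that fail this test.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be built from one allele per parent.

    Each genotype is a pair of allele indices, e.g. (0, 1) for a heterozygote.
    Assumption: autosomal diploid site with no missing calls.
    """
    # Every unordered genotype obtainable from one maternal and one paternal allele.
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


def mie_rate(trio_genotypes):
    """Fraction of sites whose (child, mother, father) genotypes are discordant."""
    sites = list(trio_genotypes)
    if not sites:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, m, f) for c, m, f in sites)
    return errors / len(sites)


def drop_discordant(trio_genotypes):
    """Keep only Mendelian-concordant sites, as one might when curating labels."""
    return [site for site in trio_genotypes if is_mendelian_consistent(*site)]


# Toy (child, mother, father) genotypes; illustrative values only.
example_sites = [
    ((0, 1), (0, 0), (1, 1)),  # concordant: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # discordant: mother cannot transmit allele 1
    ((0, 0), (0, 1), (0, 1)),  # concordant
    ((0, 1), (0, 0), (0, 0)),  # discordant: neither parent carries allele 1
]

example_rate = mie_rate(example_sites)
curated_sites = drop_discordant(example_sites)
```

In practice, a real pipeline would also need to handle missing genotypes, sex chromosomes, and multiallelic representation differences, all of which this sketch ignores.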

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279542.124.

Freely available online through the Genome Research Open Access option.

Received May 3, 2024. Accepted May 27, 2025.
