Variation in Genetic Relatedness Is Determined by the Aggregate Recombination Process [Statistical Genetics and Genomics]

Abstract

The genomic proportion that two relatives share identically by descent (their genetic relatedness) can vary depending on the history of recombination and segregation in their pedigree. Previous calculations of the variance of genetic relatedness have defined genetic relatedness as the proportion of total genetic map length (cM) shared by relatives, and have neglected crossover interference and sex differences in recombination. Here, we consider genetic relatedness as the proportion of the total physical genome (bp) shared by relatives, and calculate its variance for general pedigree relationships, making no assumptions about the recombination process. For the relationships of grandparent-grandoffspring and siblings, the variance of genetic relatedness is a simple decreasing function of $\bar r$, the average proportion of locus pairs that recombine in meiosis. For general pedigree relationships, the variance of genetic relatedness is a function of metrics analogous to $\bar r$. Therefore, features of the aggregate recombination process that affect $\bar r$ and analogs also affect variance in genetic relatedness. Such features include the number of chromosomes and heterogeneity in their size, the number of crossovers and their spatial organization along chromosomes, and sex differences in recombination. Our calculations help to explain several recent observations about variance in genetic relatedness, including that it is reduced by crossover interference (which is known to increase $\bar r$). Our methods further allow us to calculate the neutral variance of ancestry among F2s in a hybrid cross, enabling precise statistical inference in F2-based tests for various kinds of selection.

Variance in the amount of DNA shared by relatives identically by descent (IBD), that is, variance in genetic relatedness, is an important quantity in genetics (Thompson 2013). It translates to variance in the phenotypic similarity of relatives, and is a vital component of pedigree-based estimates of heritability and the genetic variance of traits (Visscher et al. 2006, 2007; Young et al. 2018). It is also an important consideration when estimating pedigree relationships and the degree of inbreeding from genotype data (Kardos et al. 2015; Wang 2016). Variance in genetic relatedness has also been hypothesized to have important consequences for the evolution of behavior (Barash et al. 1978) and of karyotypes and recombination rates (Sherman 1979; Wilfert et al. 2007). Moreover, as we show elsewhere, variance in genetic relatedness plays a key role in selection against deleterious introgressed DNA following hybridization (Veller et al. 2019a).

For most pedigree relationships, genetic relatedness can vary because of variable patterns of recombination and segregation within the pedigree. For example, it is possible that a mother segregates only crossoverless paternal chromatids to an egg, in which case the resulting offspring inherits one half of its genome from its maternal grandfather and none from its maternal grandmother. On the other hand, if the mother shuffles her maternal and paternal DNA thoroughly into the egg, the offspring will be approximately equally genetically related to its maternal grandparents. Thus, intuitively, a higher degree of genetic shuffling within a pedigree leads to lower variance in genetic relatedness between relatives.

Previous theoretical calculations of the variance of genetic relatedness have largely been restricted to measuring genetic relatedness as the proportion of total genetic map length (in cM) shared IBD by relatives [e.g., Franklin (1977); Hill (1993b); Guo (1996); Visscher et al. (2006); a general treatment is given by Hill and Weir (2011)]. However, measuring genetic relatedness as the proportion of map length shared causes several problems, most notably when the genetic maps of the two sexes differ, as will typically be the case (Lenormand and Dutheil 2005; Sardell and Kirkpatrick 2020). This is easiest to appreciate for the genetic relatedness of an individual to its paternal and maternal grandparents, the values of which are determined in a paternal and a maternal meiosis, respectively. Theoretical calculations of the variance of cM genetic relatedness require the use of genetic map lengths from the relevant meioses, and thus, in these two cases, require different definitions of genetic relatedness: proportion of total male map length for relatedness to paternal grandparents, and proportion of total female map length for relatedness to maternal grandparents. Indeed, in the extreme case where crossing over is absent in one sex—say males, as in Drosophila—cM genetic relatedness to paternal grandparents is undefined in these calculations, because the male map length is 0 cM. Practically speaking, these problems can be sidestepped by defining cM genetic relatedness in terms of a sex-averaged genetic map, but this leads to substantial biases in theoretical calculations of its variance (Caballero et al. 2019).

A natural alternative that avoids such problems is to measure genetic relatedness as the proportion of the physical length of the genome (in bp) shared IBD by relatives. For many purposes, bp genetic relatedness will be the more appropriate measure (White and Hill 2020) and, unlike cM genetic relatedness, bp genetic relatedness is unambiguous when there are sex differences in recombination. Moreover, in the modern genomic era, it will often be the case that a species’ genome has been sequenced before its genetic map has been elucidated, so that only bp genetic relatedness can be assayed.

Translating previous calculations of the variance of cM genetic relatedness to the variance of bp genetic relatedness would be valid only under the assumption of uniform recombination rates along chromosomes. This assumption is unrealistic for most species. For example, crossovers tend to be terminally localized along human chromosomes, especially in males (Holm and Rasmussen 1983; Bojko 1985). White and Hill (2020) have recently developed a procedure to estimate the variance of bp genetic relatedness without the assumption of uniform recombination rates. However, their method still assumes uniform recombination rates in the regions between adjacent markers, making it best applicable to high-density linkage maps (rather than low-density linkage maps or cytological data, which will be more readily available for some species).

In addition, previous theoretical calculations of the variance of genetic relatedness (including those for bp genetic relatedness) have assumed that crossover interference is absent. However, it has recently been shown, by computer simulation of various forms of crossover patterning along chromosomes, that crossover interference tends to decrease variances of genetic relatedness (Caballero et al. 2019). Since crossover interference is a nearly ubiquitous feature of meiosis (Hillers 2004; Otto and Payseur 2019), its neglect in previous calculations of the variance of genetic relatedness further limits their generality.

In this paper, we derive a general, assumption-free formulation for the variance of bp genetic relatedness. We show that the variance of genetic relatedness is a simple, decreasing function of certain newly developed metrics of genome-wide genetic shuffling: $\bar r$ and analogs (Veller et al. 2019b). These metrics, in a natural and intuitive way, take into account features of the aggregate recombination process, such as the number of chromosomes and heterogeneity in their size, the number of crossovers and their location along the chromosomes, the spatial relations of crossovers with respect to each other (e.g., crossover interference), and sex differences in recombination.

Our formulation of the variance of genetic relatedness in terms of $\bar r$ and analogs allows the effects that the above meiotic features have on the variance of genetic relatedness to be reinterpreted, often with greater intuition, in terms of their effects on aggregate genetic shuffling. For example, the fact that crossover interference decreases the variance of genetic relatedness (Caballero et al. 2019) can be explained by the intuitive fact that crossover interference, by spreading crossovers out evenly along chromosomes, increases the amount of genetic shuffling that they cause (Gorlov and Gorlova 2001; Veller et al. 2019b).

In the calculations below, the number of loci in the genome is assumed to be very large. Loci $i$ and $j$ are recombinant in a random gamete with probability $r_{ij}$ (e.g., $r_{ij} = 1/2$ if $i$ and $j$ are on different chromosomes). Sex-specific recombination rates, $r^{\mathrm{m}}_{ij}$ and $r^{\mathrm{f}}_{ij}$, are distinguished where necessary. We assume that there is no inbreeding; for a treatment of the variance of cM genetic relatedness in finite populations, in which a degree of inbreeding is inevitable, see Carmi et al. (2013). “Genetic relatedness” refers to bp genetic relatedness, unless specified otherwise.

Relationships of Direct Descent

Pedigree relationships of direct descent (or “lineal” relationships) involve a single lineage, from an ancestor to one of its descendants. We will focus here on the specific example of grandparent-grandoffspring; calculations of the variance of genetic relatedness for general relationships of direct descent are given in Supplemental Material, File S1, Section S1.

Grandparent-grandoffspring

Let the random variable $\text{IBD}_{\text{grand}}$ be the proportion of a grandoffspring’s genome inherited from a specified grandparent. Consider the gamete produced by the grandoffspring’s parent, and let $P$ be the fraction of this gamete’s genome that derives from the focal grandparent (so that, by Mendelian segregation, $\mathbb{E}[P] = 1/2$). We first wish to calculate $\text{Var}(P)$. To do so, we use an approach very similar to that of Hill (1993a) and Visscher et al. (2006), but we define genetic relatedness in terms of bp shared rather than cM shared, and make no assumptions about the recombination process (in File S1, Section S3, we discuss technical differences between our calculations of the variance of bp genetic relatedness and previous calculations of the variance of cM genetic relatedness). We calculate (details in File S1, Section S1) that

$$\text{Var}(P) = \frac{1}{2}\left(\frac{1}{2} - \bar r\right), \tag{1}$$

where $\bar r$ is the probability that a randomly chosen locus pair recombines in meiosis (Veller et al. 2019b). Because half of the grandoffspring’s genome comes from this gamete, $\text{IBD}_{\text{grand}} = P/2$, so that $\mathbb{E}[\text{IBD}_{\text{grand}}] = 1/4$ is the coefficient of relationship, and

$$\text{Var}(\text{IBD}_{\text{grand}}) = \frac{1}{4}\,\text{Var}(P) = \frac{1}{8}\left(\frac{1}{2} - \bar r\right). \tag{2}$$

A graphical demonstration of Equation 2, based on the possible segregation patterns of a given parental meiosis, is shown in Figure 1.
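As an illustrative check on Equations 1 and 2, the following Python sketch simulates gametes under a deliberately simple toy model that is not part of the original analysis: a single chromosome of equally spaced loci, with independent crossovers (no interference) occurring at a hypothetical probability `c` in each interval between adjacent loci. All function names and parameter values are assumptions made for illustration. Under this model, the simulated variance of the gamete fraction should be close to $\frac{1}{2}\left(\frac{1}{2} - \bar r\right)$, with $\bar r$ taken as the mean of the toy $r_{ij}$ matrix.

```python
import numpy as np

def var_gamete_fraction(rbar):
    """Variance of the fraction of a gamete derived from one grandparent."""
    return 0.5 * (0.5 - rbar)

def var_ibd_grand(rbar):
    """The grandoffspring inherits half its genome from this gamete,
    so the variance is one quarter of the gamete-fraction variance."""
    return 0.25 * var_gamete_fraction(rbar)

def toy_recombination_matrix(n_loci, c):
    """r_ij for equally spaced loci on one chromosome, ASSUMING independent
    crossovers with probability c per adjacent interval (toy model)."""
    idx = np.arange(n_loci)
    dist = np.abs(idx[:, None] - idx[None, :])
    return 0.5 * (1.0 - (1.0 - 2.0 * c) ** dist)

def simulate_gamete_fractions(n_loci, c, n_gametes, seed=0):
    """Simulate the grandparental origin (0 or 1) of every locus in many
    gametes and return, per gamete, the fraction with origin 1."""
    rng = np.random.default_rng(seed)
    start = rng.integers(0, 2, size=(n_gametes, 1))      # origin of first locus
    switches = rng.random((n_gametes, n_loci - 1)) < c   # crossover in each interval
    rest = (start + np.cumsum(switches, axis=1)) % 2     # origin of remaining loci
    origin = np.concatenate([start, rest], axis=1)
    return origin.mean(axis=1)
```

Comparing `simulate_gamete_fractions(200, 0.01, 20000).var()` with `var_gamete_fraction(toy_recombination_matrix(200, 0.01).mean())` gives two numbers that should agree up to sampling error. The two limiting cases are also instructive: `var_ibd_grand(0.5)` is zero (complete shuffling) and `var_ibd_grand(0.0)` is 1/16 (no shuffling).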

Figure 1

The variance of genetic relatedness between grandoffspring and grandparent, calculated from the possible segregation patterns of a single parental meiosis. In the figure, the positions of crossovers in a maternal meiosis (and the chromatids involved) are specified, but the segregation pattern in the resulting egg (and therefore offspring) is not. Averaging across the four segregation patterns gives $\mathbb{E}[\text{IBD}_{\text{grand}}] = 1/4$, and $\bar r$ for this meiosis follows from Equation 1 in Veller et al. (2019b). Across the four possible segregation patterns, $\text{Var}(\text{IBD}_{\text{grand}}) = \frac{1}{8}\left(\frac{1}{2} - \bar r\right)$, which is Equation 2.

Note that the formulation in Equation 2 and other such formulations in this paper apply to the whole genome, or a single chromosome, or any specific genomic region. In the latter cases, $\bar r$ is the probability that a randomly chosen pair of loci within the region of interest recombine in meiosis. In addition, because the recombination process often differs between the sexes, the value of $\bar r$ can differ between spermatogenesis and oogenesis. In calculating the variance of genetic relatedness between a grandoffspring and one of its maternal grandparents, the value for oogenesis, $\bar r_{\mathrm{f}}$, would be used; the value for spermatogenesis, $\bar r_{\mathrm{m}}$, would be used for paternal grandparents.

$\bar r$ can be estimated from various kinds of data, including cytological data of crossover positions at meiosis I, sequence data from gametes, and linkage maps (Veller et al. 2019b). We used cytological data from Lian et al. (2008) to calculate chromosome-specific and genome-wide values of $\bar r$ in human male, and the linkage map of Kong et al. (2010) to calculate analogous values in human female (translating map distances to recombination rates using Kosambi’s map function, which incorporates a model of crossover interference). Substituting these values of $\bar r$ into Equation 2 yields the variance of genetic relatedness to paternal and maternal grandparents in humans, for each chromosome and genome-wide. Table 1 displays the standard deviations, together with the corresponding standard deviations of cM genetic relatedness, calculated by substituting the sex-specific chromosome map lengths reported by Kong et al. (2010) into the relevant formula of Hill and Weir (2011).

Table 1 Standard deviations of genetic relatedness to a paternal and maternal grandparent, and to a sibling, in humans, for both bp and cM measures of genetic relatedness

Several observations emerge from Table 1. First, the variance of genetic relatedness for each individual chromosome is substantially larger than the genome-wide variance. This is because the majority of genetic shuffling in humans is due to independent assortment of chromosomes, rather than crossing over (Crow 1988; Veller et al. 2019b). Second, the variance of genetic relatedness to a paternal grandparent is greater than to a maternal grandparent, for each chromosome and genome-wide. This is because male meiosis involves less genetic shuffling than female meiosis (lower $\bar r$), owing to fewer crossovers and their more terminal localization along the chromosomes in males (Veller et al. 2019b).

In comparing the variances of bp and cM genetic relatedness, three meiotic features are relevant. Per-chromosome comparisons are affected by the location of crossovers along chromosomes (crossover distribution) and with respect to each other (crossover interference). The genome-wide comparisons are additionally influenced by independent assortment of chromosomes. We discuss the effects of these features in turn.

First, proterminal localization of crossovers in humans (especially males) reduces $\bar r$ relative to a uniform distribution of crossovers (Veller et al. 2019b), increasing the variance of bp vs. cM genetic relatedness (since crossovers are uniformly distributed along the genetic map, by definition). To isolate this effect of nonuniform recombination rates, we artificially eliminate crossover interference in the calculation of $\bar r$ by using linkage maps and Haldane’s map function (which, unlike Kosambi’s map function, assumes no crossover interference). Calculating $\bar r$ in this way, we find that the chromosome-specific variances of bp genetic relatedness are typically larger than their corresponding cM values (File S1, Section S4), more so in males because of their more terminal distribution of crossovers.

Second, crossover interference increases $\bar r$ by spreading crossovers out more evenly along chromosomes (Veller et al. 2019b), thus decreasing the variances of bp genetic relatedness relative to the corresponding variances of cM genetic relatedness (the calculations of which do not take into account crossover interference). Thus, in spite of the tendency of nonuniform recombination rates to increase the per-chromosome variances of bp genetic relatedness, these variances are nevertheless smaller than the corresponding variances of cM genetic relatedness when crossover interference is taken into account (Table 1). The negative effect of crossover interference on the variance of genetic relatedness was previously identified by Caballero et al. (2019). Interestingly, in human male, the per-chromosome variances of genetic relatedness calculated from raw (cytological) crossover data are smaller than those calculated from linkage maps using Kosambi’s map function (File S1, Section S4), suggesting that Kosambi’s map function does not capture the full influence of crossover interference on genetic shuffling in human male.
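The contrast between the two map functions can be made concrete with a short Python sketch. It uses the standard forms of Haldane’s function, $r = \frac{1}{2}(1 - e^{-2d})$, and Kosambi’s function, $r = \frac{1}{2}\tanh(2d)$, with $d$ in Morgans. The single-chromosome setup, the hypothetical 1-Morgan map length, and the assumption of a uniform recombination rate along the chromosome (so that physical position is proportional to map position) are illustrative simplifications, not the procedure used for Table 1.

```python
import numpy as np

def haldane(d):
    """Haldane's map function (no crossover interference); d in Morgans."""
    return 0.5 * (1.0 - np.exp(-2.0 * d))

def kosambi(d):
    """Kosambi's map function (models crossover interference); d in Morgans."""
    return 0.5 * np.tanh(2.0 * d)

def rbar_single_chromosome(map_length, map_function, n_loci=500):
    """Average r_ij over all locus pairs on one chromosome, ASSUMING loci are
    evenly spaced in both bp and map units (uniform recombination rate)."""
    pos = np.linspace(0.0, map_length, n_loci)
    dist = np.abs(pos[:, None] - pos[None, :])
    return map_function(dist).mean()

def var_ibd_grand(rbar):
    """Variance of relatedness to a grandparent as a function of rbar."""
    return (0.5 - rbar) / 8.0

# Hypothetical chromosome with a 1-Morgan map.
rbar_haldane = rbar_single_chromosome(1.0, haldane)
rbar_kosambi = rbar_single_chromosome(1.0, kosambi)
print("Haldane:", rbar_haldane, var_ibd_grand(rbar_haldane))
print("Kosambi:", rbar_kosambi, var_ibd_grand(rbar_kosambi))
```

For any positive map distance Kosambi’s function returns a recombination rate at least as large as Haldane’s, so on the same map it yields a larger $\bar r$ and hence a smaller variance, in line with the direction of the interference effect described above.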

Finally, in humans, chromosome lengths are more variable when measured in bp than in cM (File S1, Section S3). This causes the contribution of independent assortment of chromosomes to $\bar r$ to be smaller than if the bp lengths of the chromosomes were only as variable as the cM lengths (Veller et al. 2019b), which, in turn, increases the genome-wide variance of bp vs. cM genetic relatedness to grandparents (the mathematical details of this effect are explained in File S1, Section S3). Because of this effect, although the chromosome-specific variances of bp genetic relatedness to grandparents are substantially smaller than their cM counterparts, the genome-wide variances of bp and cM genetic relatedness are more similar (Table 1).
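The effect of chromosome-length heterogeneity can be sketched numerically. Two randomly chosen loci lie on different chromosomes with probability $1 - \sum_k L_k^2$, where $L_k$ is chromosome $k$’s share of total genome length, and such pairs recombine with probability 1/2, so independent assortment contributes $\frac{1}{2}\left(1 - \sum_k L_k^2\right)$ to $\bar r$. The Python sketch below evaluates this component alone, under the simplifying assumption of no crossing over; the two sets of chromosome lengths are hypothetical.

```python
import numpy as np

def rbar_independent_assortment(lengths):
    """Contribution of independent assortment to rbar, given chromosome
    lengths in any common unit (they are normalized to proportions)."""
    props = np.asarray(lengths, dtype=float)
    props = props / props.sum()
    return 0.5 * (1.0 - np.sum(props ** 2))

def var_ibd_grand(rbar):
    """Variance of relatedness to a grandparent as a function of rbar."""
    return (0.5 - rbar) / 8.0

# Hypothetical karyotypes with the same chromosome number and total length.
equal_lengths = [2.5, 2.5, 2.5, 2.5]
unequal_lengths = [4.0, 3.0, 2.0, 1.0]

for name, lengths in [("equal", equal_lengths), ("unequal", unequal_lengths)]:
    rbar = rbar_independent_assortment(lengths)
    print(name, rbar, var_ibd_grand(rbar))
```

With the same number of chromosomes, the more heterogeneous karyotype has a smaller independent-assortment component and therefore a larger variance of relatedness.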

Indirect Relationships

Indirect relationships involve two descendants of at least one individual in the pedigree. In the case of multiancestor pedigrees, we restrict our attention to two-ancestor pedigrees where the two ancestors were a mating pair (so that the focal descendants are, for example, full siblings, or aunt-nephew, etc.). We focus here on half-siblings and full-siblings; the calculations for general indirect relationships of this kind are given in File S1, Section S2.

Half-siblings

Let the random variable $\text{IBD}_{\text{h-sib}}$ be the proportion of two half-siblings’ genomes that they share IBD, if they have the same father but unrelated mothers. Then $\mathbb{E}[\text{IBD}_{\text{h-sib}}] = 1/4$ is the coefficient of relationship, and

$$\text{Var}(\text{IBD}_{\text{h-sib}}) = \frac{1}{8}\left(\frac{1}{2} - \bar r^{(2)}_{\mathrm{m}}\right), \tag{3}$$

where $\bar r^{(2)}_{\mathrm{m}}$ is the probability that a randomly chosen locus pair recombines when the crossovers of two of the father’s meioses are pooled into one hypothetical meiosis (see Figure 2 for an example of a pooled meiosis). If the common parent were instead the mother, $\bar r^{(2)}_{\mathrm{f}}$ would replace $\bar r^{(2)}_{\mathrm{m}}$. A graphical demonstration of Equation 3, based on the possible segregation patterns of two meioses in the parent, is given in Figure 2.
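The pooled-meiosis idea can be explored with a small simulation. The Python sketch below reuses a toy model that is not part of the original analysis (one chromosome, equally spaced loci, independent crossovers at a hypothetical per-interval probability `c`) and additionally assumes that the father’s two meioses are independent, so that a locus pair is recombinant in the pooled meiosis exactly when one of the two gametes, but not both, is recombinant for that pair, with probability $2 r_{ij}(1 - r_{ij})$. Under these assumptions the simulated variance of half-sibling relatedness should be close to $\frac{1}{8}\left(\frac{1}{2} - \bar r^{(2)}\right)$, where $\bar r^{(2)}$ is the mean of the pooled recombination probabilities.

```python
import numpy as np

def toy_recombination_matrix(n_loci, c):
    """r_ij for equally spaced loci, ASSUMING independent crossovers
    with probability c per adjacent interval (toy model)."""
    idx = np.arange(n_loci)
    dist = np.abs(idx[:, None] - idx[None, :])
    return 0.5 * (1.0 - (1.0 - 2.0 * c) ** dist)

def pooled_rbar(r):
    """Average pooled recombination probability, ASSUMING the two meioses
    are independent draws from the same recombination process."""
    return (2.0 * r * (1.0 - r)).mean()

def var_ibd_half_sib(rbar2):
    """Variance of half-sibling relatedness as a function of pooled rbar."""
    return (0.5 - rbar2) / 8.0

def simulate_gametes(n_loci, c, n_gametes, rng):
    """Grandparental origin (0 or 1) of each locus in each simulated gamete."""
    start = rng.integers(0, 2, size=(n_gametes, 1))
    switches = rng.random((n_gametes, n_loci - 1)) < c
    rest = (start + np.cumsum(switches, axis=1)) % 2
    return np.concatenate([start, rest], axis=1)

def simulate_half_sib_ibd(n_loci, c, n_pairs, seed=0):
    """Relatedness of paternal half-siblings: half of the fraction of loci at
    which their two paternal gametes carry the same grandparental origin."""
    rng = np.random.default_rng(seed)
    gamete_1 = simulate_gametes(n_loci, c, n_pairs, rng)
    gamete_2 = simulate_gametes(n_loci, c, n_pairs, rng)
    shared_paternal = (gamete_1 == gamete_2).mean(axis=1)
    return shared_paternal / 2.0
```

For example, `simulate_half_sib_ibd(200, 0.01, 20000).var()` can be compared with `var_ibd_half_sib(pooled_rbar(toy_recombination_matrix(200, 0.01)))`; the simulated mean should also sit near the coefficient of relationship, 1/4.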

Figure 2

The variance of genetic relatedness between half-siblings, calculated from the possible segregation patterns of two meioses of their common father. The positions of crossovers in the two paternal meioses (and the chromatids involved) are specified, but the segregation patterns in the resulting sperm cells (and therefore the two offspring) are not. Applying Equation 1 in Veller et al. (2019b) to the “pooled meiosis” in which the crossovers from the two actual meioses have been combined gives $\bar r^{(2)}_{\mathrm{m}}$ for this pair of meioses. Across the 16 possible segregation patterns, $\mathbb{E}[\text{IBD}_{\text{h-sib}}] = 1/4$ and $\text{Var}(\text{IBD}_{\text{h-sib}}) = \frac{1}{8}\left(\frac{1}{2} - \bar r^{(2)}_{\mathrm{m}}\right)$, which is Equation 3.

Siblings

Let the random variable $\text{IBD}_{\text{sib}}$ be the proportion of two full-siblings’ genomes that they share IBD, assuming their mother and father to be unrelated. Then $\mathbb{E}[\text{IBD}_{\text{sib}}] = 1/2$ is the coefficient of relationship, and

$$\text{Var}(\text{IBD}_{\text{sib}}) = \frac{1}{8}\left(1 - \bar r^{(2)}_{\mathrm{m}} - \bar r^{(2)}_{\mathrm{f}}\right). \tag{4}$$

Like $\bar r$, $\bar r^{(2)}$ can be estimated from various kinds of data, including cytological data of crossover positions at meiosis I, sequence data from gametes, and linkage maps. Table 1 lists the chromosome-specific and genome-wide standard deviations of bp genetic relatedness of human siblings, calculated using cytological data from Lian et al. (2008) for male meiosis and the linkage map of Kong et al. (2010) for female meiosis (with map distances converted to recombination rates using Kosambi’s map function). Also shown are the corresponding standard deviations of cM genetic relatedness of siblings, defined as the proportion of the sex-averaged genetic map that they share IBD.

As for the case of genetic relatedness to grandparents, several meiotic features affect the comparison of the variances of bp and cM genetic relatedness of siblings. First, the bp variances are increased by the proterminal distribution of crossovers along chromosomes in humans, which tends to decrease $\bar r^{(2)}$. Thus, when the variance of bp genetic relatedness of siblings is calculated using linkage maps and Haldane’s map function (to eliminate the effect of crossover interference), the chromosome-specific and genome-wide estimates are substantially larger than the corresponding cM variances (File S1, Section S4). However, crossover interference, by increasing genetic shuffling, increases $\bar r^{(2)}$, thus decreasing the bp variance. These opposing effects of proterminal localization of crossovers and crossover interference roughly cancel in this case, so that our estimates of the variance of bp and cM genetic relatedness of siblings are coincidentally similar, at both the chromosome-specific and genome-wide levels (Table 1).

Within- vs. cross-pedigree variance

The calculations above and in File S1, Sections S1 and S2, are for the variance of genetic relatedness in a given instance of a specified pedigree relationship. This variance derives from the randomness of recombination and segregation in the meiotic processes of the individuals involved in that particular pedigree. For some applications, however, we are interested in the variance of genetic relatedness across instances of a specified pedigree relationship [e.g., using variation in the genetic relatedness of different sibling pairs to estimate the heritability of some trait (Visscher et al. 2006)]. To calculate this “population variance” of genetic relatedness, variation across individuals in their recombination processes must be taken into account. Applying the law of total variance (details in File S1, Section S5), we find that the variance of genetic relatedness across instances of a specified pedigree relationship is equal to the average within-pedigree variance. We have shown that within-pedigree variances are functions of metrics of aggregate recombination such as $\bar r$ and $\bar r^{(2)}$; to calculate the cross-pedigree variance, these metrics must simply be averaged across pedigrees.
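The step from the law of total variance to the average within-pedigree variance can be written out as follows. The symbol $\theta$, standing for the recombination processes of the individuals in one instance of the pedigree, is notation introduced here for illustration and may differ from that of File S1.

```latex
\begin{align*}
\mathrm{Var}(\mathrm{IBD})
  &= \mathbb{E}_{\theta}\!\left[\mathrm{Var}(\mathrm{IBD} \mid \theta)\right]
   + \mathrm{Var}_{\theta}\!\left(\mathbb{E}[\mathrm{IBD} \mid \theta]\right) \\
  &= \mathbb{E}_{\theta}\!\left[\mathrm{Var}(\mathrm{IBD} \mid \theta)\right],
\end{align*}
% The second term vanishes because E[IBD | theta] equals the coefficient of
% relationship in every instance of the pedigree, whatever theta is.
% Example (grandparent-grandoffspring), using the within-pedigree variance:
\[
\mathrm{Var}(\mathrm{IBD}_{\mathrm{grand}})
  = \frac{1}{8}\left(\frac{1}{2} - \mathbb{E}_{\theta}\!\left[\bar r(\theta)\right]\right).
\]
```

Because the within-pedigree variance for grandparents is linear in $\bar r$, only the average of $\bar r$ across pedigrees is needed in that case.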

A complication arises when using pooled recombination data (such as linkage maps) to estimate the cross-pedigree variance of genetic relatedness, because for all such metrics of aggregate recombination except $\bar r$, calculation of the metric from averaged recombination data does not return the average of the metric across pedigrees (File S1, Section S5). It is therefore technically invalid, in such cases, to use pooled recombination data to calculate the cross-pedigree variance of genetic relatedness (although it is valid in the case of grandoffspring-grandparent).
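A toy calculation shows why the order of averaging matters. The Python sketch below builds recombination matrices for two hypothetical individuals with different crossover rates (same toy single-chromosome model with independent crossovers, an assumption made only for illustration) and again assumes that pooling two independent meioses gives a per-pair recombination probability of $2 r_{ij}(1 - r_{ij})$. Because $\bar r$ is linear in the $r_{ij}$, averaging before or after computing it makes no difference; because $2r(1 - r)$ is concave in $r$, computing the pooled metric from averaged rates gives a value at least as large as the average of the per-individual metrics.

```python
import numpy as np

def toy_recombination_matrix(n_loci, c):
    """r_ij for equally spaced loci, ASSUMING independent crossovers
    with probability c per adjacent interval (toy model)."""
    idx = np.arange(n_loci)
    dist = np.abs(idx[:, None] - idx[None, :])
    return 0.5 * (1.0 - (1.0 - 2.0 * c) ** dist)

def rbar(r):
    """Average recombination probability over locus pairs."""
    return r.mean()

def pooled_rbar(r):
    """Pooled-meiosis analog, ASSUMING two independent meioses."""
    return (2.0 * r * (1.0 - r)).mean()

def compare_averaging_orders(rate_matrices):
    """Return each metric computed two ways: averaged across individuals,
    and computed from recombination rates averaged across individuals."""
    mean_rates = np.mean(rate_matrices, axis=0)
    return {
        "rbar_mean_of_individuals": np.mean([rbar(r) for r in rate_matrices]),
        "rbar_from_mean_rates": rbar(mean_rates),
        "rbar2_mean_of_individuals": np.mean([pooled_rbar(r) for r in rate_matrices]),
        "rbar2_from_mean_rates": pooled_rbar(mean_rates),
    }

# Two hypothetical individuals with low and high crossover rates.
individuals = [toy_recombination_matrix(200, 0.005),
               toy_recombination_matrix(200, 0.02)]
for key, value in compare_averaging_orders(individuals).items():
    print(key, value)
```

In this toy setting, using averaged rates overstates the pooled metric and would therefore understate the cross-pedigree variance for half-siblings; the size of the discrepancy in real data depends on how much individuals differ in their recombination processes.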

To get a sense for how large an error the use of pooled recombination data can cause, we focus on the case of paternal half-siblings. Using crossover data generated by Bell et al. (2020) by single-cell sequencing of large numbers of sperm from 20 human male donors, we calculated values of $\bar r^{(2)}_{\mathrm{m}}$ for each individual donor, from which we calculated a value of $\bar r^{(2)}_{\mathrm{m}}$ averaged across individuals. We also calculated a value of $\bar r^{(2)}_{\mathrm{m}}$ from recombination rates that were averaged across individuals. The values of $\bar r^{(2)}_{\mathrm{m}}$ from both individual and pooled recombination rates were calculated genome-wide and per chromosome. Using the two estimates of
