Evaluating the heterogeneous effect of extended culture to blastocyst transfer on the implantation outcome via causal inference in fresh ICSI cycles

Modeling the heterogeneous treatment effect of extended incubation to blastocyst transfer on implantation outcome

Extended culture to blastocyst transfer in good prognosis patients is generally recommended; however, evidence-based rigid guidelines and specific criteria to direct day-of-transfer decision-making are lacking [18]. For the purpose of simplification, we consider cleavage-stage transfers that occur three days from fertilization (day 3 transfers) and blastocyst transfers that occur five days from fertilization (day 5 transfers; Fig. 1A). We report the implantation rate and pregnancy rate, which account for the fraction of implanted embryos out of all transferred embryos and the fraction of transfer cycles that resulted in one or more implanted embryos out of all transfer cycles, respectively. The implantation and pregnancy rates of day 5 transferred blastocysts were 35.7% and 41% (Fig. 1B-i,ii), thus reflecting a 70% average increase compared with day 3 cleavage-stage transferred embryos. However, these superior implantation outcome statistics cannot be taken to be causal due to potentially confounding factors in the assignment of extended incubation to blastocyst transfer, as described below [19].
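The two outcome measures defined above differ in their denominator: the implantation rate is computed per transferred embryo, whereas the pregnancy rate is computed per transfer cycle. A minimal sketch of the two calculations follows; the `TransferCycle` record and the toy values are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class TransferCycle:
    n_transferred: int  # embryos transferred in this cycle
    n_implanted: int    # of those, how many implanted


def implantation_rate(cycles):
    """Implanted embryos divided by all transferred embryos."""
    transferred = sum(c.n_transferred for c in cycles)
    implanted = sum(c.n_implanted for c in cycles)
    return implanted / transferred


def pregnancy_rate(cycles):
    """Cycles with one or more implanted embryos divided by all cycles."""
    return sum(1 for c in cycles if c.n_implanted >= 1) / len(cycles)


# Hypothetical toy data, for illustration only.
toy_cycles = [
    TransferCycle(2, 1),
    TransferCycle(1, 0),
    TransferCycle(2, 2),
    TransferCycle(1, 0),
]
print(implantation_rate(toy_cycles))  # 3 of 6 embryos implanted
print(pregnancy_rate(toy_cycles))     # 2 of 4 cycles had an implantation
```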

Fig. 1

Delineating the causal model of cleavage-stage versus blastocyst transfers with respect to embryo implantation outcome in IVF-ET treatments. A We consider cleavage-stage transfer cycles on day 3 from fertilization and extended culture to blastocyst transfer on day 5 from fertilization. B Fresh day 5 transfers are associated with a higher (i) implantation rate and (ii) pregnancy rate. C The distributions of the day of embryo transfer from fertilization are presented versus the day of oocyte retrieval in the four data-providing medical centers. The statistical dependence between the day of embryo transfer and oocyte retrieval is quantified using adjusted mutual information (AMI)

Given the variation in the potential of embryos to blastulate, implant, and generate live birth, we hypothesized that the response to extended culture and to differences in endometrial synchronization that are associated with blastocyst transfers would also vary between embryos in a manner that cannot be accounted for by calculating the average treatment effect (ATE) [20,21,22]. To estimate the causal effect of extended incubation to blastocyst transfer on implantation outcome relative to day 3 embryo transfer (nontreatment assignment) from observational data, we employ methods based on the idea of covariate adjustment (standardization) [15]. Treatment assignment is made on day 3 from fertilization with respect to the reserve set of fertilized embryos that become available for transfer in each cycle; this is the unit of analysis. We therefore assembled a retrospective multicenter dataset of clinically labeled video files of preimplantation embryo development that we reported previously [23]. The dataset summarizes 2175 ICSI-fertilized fresh cycles that were collected from four large data-providing medical centers. It includes 3433 transferred embryos with known implantation outcomes and 18,680 embryos that had not been transferred (Table 1; “Methods”). With rare exceptions, our data-providing clinics operate only on weekdays (Sunday to Friday), including oocyte retrieval (OR) and ICSI, which are performed on the same day. As a result, day 5 and day 3 transfers are almost entirely excluded for patients who underwent OR on Mondays and Wednesdays, respectively (Fig. 1C). Since OR is scheduled according to follicular growth, the day of OR lacks a causal relationship with the implantation outcome [24]. This generates an exogenous source of variability with respect to the heterogeneous treatment effect for 22% of the embryos in our dataset that were transferred on day 3 and 25% of the embryos that were transferred on day 5 (Fig. 1C).
Finally, we note that IVF treatments are fully subsidized for the first and second children by the Israeli Ministry of Health. This policy contributes to a better representation of all socioeconomic and ethnic sectors of relevant age in our dataset.
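The weekday constraint described above can be made concrete with simple calendar arithmetic. The sketch below assumes, as stated in the text, that fertilization occurs on the day of OR and that the clinics are closed on Saturdays; the function names are illustrative and not part of the study's code.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
CLOSED_DAYS = {"Saturday"}  # clinics operate Sunday to Friday


def transfer_weekday(retrieval_day, days_after):
    """Weekday that falls `days_after` days after the retrieval day."""
    idx = (WEEKDAYS.index(retrieval_day) + days_after) % 7
    return WEEKDAYS[idx]


def feasible_transfer_days(retrieval_day):
    """Transfer days (3 or 5) that do not land on a closed day."""
    return [d for d in (3, 5)
            if transfer_weekday(retrieval_day, d) not in CLOSED_DAYS]


for day in WEEKDAYS:
    if day not in CLOSED_DAYS:  # no retrievals on closed days
        print(day, feasible_transfer_days(day))
```

Under these assumptions, a Monday retrieval leaves only the day 3 option and a Wednesday retrieval leaves only the day 5 option, because the alternative would fall on a Saturday.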

Table 1 Cleavage-stage and blastocyst-stage transferred embryos

As is well established in the causal inference literature, for covariate adjustment to yield valid causal estimates, we must examine whether the data-gathering process satisfies the ignorability (unconfoundedness) and overlap assumptions, as well as the stable unit treatment value assumption (SUTVA) [25]. Ignorability means that we have measurements of all factors that materially affect both the treatment decision (day 3 vs. day 5 transfers) and the outcome (implantation); overlap means that every embryo could have plausibly been transferred either on day 3 or on day 5; SUTVA means that (a) the implantation outcome of one embryo does not depend on the treatment of other embryos; and that (b) there are no multiple versions of the day 3 and day 5 treatment protocols. Below, we address the degree to which these causal assumptions are satisfied by our data-gathering process.
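For reference, the three assumptions can be stated compactly in potential-outcome notation. The symbols here are illustrative assumptions rather than notation taken from the Methods: \(W_i\in\{0,1\}\) indicates day 3 (0) or day 5 (1) transfer of embryo \(i\), \(X_i\) is its covariate vector, and \(Y_i(0)\), \(Y_i(1)\) are its implantation outcomes under each treatment.

\[
\{Y_i(0),\,Y_i(1)\} \perp\!\!\!\perp W_i \mid X_i \quad \text{(ignorability)}
\]

\[
0 < \Pr(W_i = 1 \mid X_i = x) < 1 \quad \text{for all relevant } x \quad \text{(overlap)}
\]

\[
Y_i = W_i\,Y_i(1) + (1 - W_i)\,Y_i(0) \quad \text{(SUTVA)}
\]

The last expression encodes both parts of SUTVA: the observed outcome of embryo \(i\) depends only on its own treatment \(W_i\), and each treatment level corresponds to a single well-defined potential outcome.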

Ignorability assumption: delineating the confounding variables and hidden contributions

Valid causal inference from observational data requires that all the important variables that affect both treatment and outcome are represented. Ideally, this would include variables that characterize all embryos that are candidates for transfer in each cycle. One option for blocking this potential backdoor path is using a morphokinetic representation of the developmental quality of all the available embryos [26,27,28]. However, a typical transfer cycle includes > 5 embryos in good prognosis patients, each represented by ~ 8 morphokinetic events on day 3 from fertilization, which would require a massive dataset in order to generate counterexamples to support causal inference. Including other confounders such as maternal age would make this approach even less feasible.

Retrospective and prospective studies show that maintaining a high cumulative pregnancy rate depends on the size of the reserve set of embryos of high predicted developmental potential in addition to maternal age [29, 30]. Hence, to decrease the feature dimensionality, we tested whether the morphokinetic profiles of the entire set of available embryos can be substituted by the absolute number of low-quality and high-quality embryos in the reserve set, in addition to the morphokinetic profiles of the embryos that are selected for transfer per se. The feasibility of this scheme depends on the ability to identify the embryos that will eventually be selected for transfer either on day 3 or on day 5. In both cases, the embryos for transfer should be identified based on the embryo features that are generated by day 3. To this end, we fitted a random forest model using the morphokinetic features that were recorded by 66 h from fertilization of all the embryos that belong to day 3 transfer cycles and scored their likelihood to be selected for transfer. Indeed, the embryos that were selected for transfer in the train and test set cycles were identified with high predictive accuracy as measured by the area under the receiver operating characteristic (ROC) curves (Supplementary Fig. S1A). We then applied this classifier to score the likelihood of embryos that belong to day 5 transfer cycles to be selected for transfer. Notably, classification was performed based on the morphokinetic profiles that were obtained by 66 h from fertilization of the embryos that were selected for transfer as well as the embryos that were not selected for transfer, while ignoring later events. The embryos were then divided into five consecutive cohorts of equal size according to their evaluated scores; 90% of the day 5 transferred embryos fell in the cohort with the highest likelihood of being selected for transfer on day 3, had a day 3 transfer been performed (Supplementary Fig. S1B). This probably underestimates the ability to identify the embryos for transfer, owing to the expected redundancy in the developmental potential between transferred and non-transferred embryos in the same cycle.

Next, we characterized the effect that the specified features might have on treatment assignment. No significant differences are observed between the distributions of the morphokinetic events of day 3 and day 5 transferred embryos at 66 h from fertilization (Fig. 2A-i) and the cell cycle and synchronization intervals (Fig. 2A-ii). This indicates that embryo quality might very well be an important factor for selecting the single embryo or the few embryos for transfer but likely not for determining when to transfer, at least not as a stand-alone parameter. Parallel to embryo developmental potential, maternal age is also an important reproduction factor that is explicitly labeled for each embryo in our dataset. Maternal age is associated with an increase in chromosomal aberrations and is correlated with the size of the embryo reserve set per transfer cycle and with a decline in the developmental potential to implant in the uterus [31, 32]. Similar to the morphokinetic profiles of the embryos, there were only negligible differences in the distributions of maternal age between day 3 and day 5 transferred embryos (Fig. 2B). Hence, we conclude that maternal age does not exert a dominant effect on treatment assignment that is independent of other factors.

Fig. 2

Feature analysis of cleavage-stage versus blastocyst transfers. A No significant differences are observed between the temporal distributions of the (i) morphokinetic events and the (ii) cell cycle and synchronization intervals of day 3 cleavage-stage transferred embryos (n = 1892) and day 5 transferred blastocysts (n = 799). KS distances < 0.1. Only high-quality embryos that reached 8-cell cleavage were included. B The maternal age distributions of day 3 and day 5 transferred embryos are overlapping. C The number of (i) low-quality (≤ 4 blastomeres) and (ii) high-quality (≥ 8 blastomeres) embryos are compared between day 3 and day 5 freshly transferred cycles. Blastocyst transfers are associated with > 3 high-quality co-cultured embryos. D Day 3 and day 5 propensity score distributions were derived using a logistic regression prediction model of the day of transfer. Lower bound (LB) and upper bound (UB) values for excluding non-overlapping embryos are set by the 2.5 percentile and the 97.5 percentile of the day 5 and day 3 propensity score distributions. Abbreviations: ET, embryo transfer; KS, Kolmogorov–Smirnov

Thus far, we considered the morphokinetic features of the embryos that were selected for transfer and the maternal age and found no significant differences between day 3 and day 5 transferred embryos. We next explore the effect that the predicted quality and size of the reserve set of non-transferred embryos have on the treatment assignment. Notably, the size of the reserve set was reported to correlate with the implantation outcome but not to influence the quality of the transferred embryos as assessed based on morphological scores nor to correlate with maternal age [33]. To test whether the size of the embryo reserve set affects treatment, we considered two embryo subsets: high-quality 8C+ embryos that consisted of ≥ 8 blastomeres at 66 h from fertilization and low-quality 4C− embryos that consisted of ≤ 4 blastomeres at 66 h from fertilization. The size of the subsets of high-quality embryos that are valid candidates for transfer and low-quality embryos within each cycle provides an effective integrated estimation of the developmental potential of the entire cycle. No statistically significant differences are found in the number of 4C− embryos (Fig. 2C-i). However, we identify a significant overrepresentation of day 5 transfer cycles that consisted of more than three 8C+ embryos and day 3 transfer cycles that included fewer than three 8C+ embryos (Fig. 2C-ii), which indicates that extended culture is favored for cycles that include a sufficiently large reserve subset of high-quality embryos [34]. This policy is aimed at decreasing the risk of transfer cycle cancelation due to a developmental arrest of all the available embryos prior to blastulation. In the case of a small number of high-quality embryos on day 3, extended incubation is avoided and cleavage-stage transfers are performed. In summary, we conclude that we have measured or have proxies for most of the important and relevant confounders and proceed to discuss potential hidden confounders.

To address potential confounders that originate from medical backgrounds, we survey the most relevant clinical conditions. Repeated implantation failure (RIF) is likely underlain by non-embryonic maternal aspects, including endometrial receptivity [35]. However, the potentially confounding effect is small, with a reported 10% prevalence, which is likely over-diagnosed [36]. In the case of past preterm labor, a single embryo transfer policy is favorable; however, there are no specific guidelines for choosing cleavage-stage or blastocyst transfer [37]. Similarly, single embryo transfer is recommended in patients with Müllerian anomalies (e.g., unicornuate uterus), which are rare congenital conditions that are characterized by an increased risk of miscarriage and preterm delivery, whereas no effects are known on the implantation potential in IVF treatments [38]. Hence, the impact of these main medical background conditions on day-of-transfer treatments is either insignificant or nonspecific. We therefore believe that all important factors affecting both treatment and outcome are represented in our dataset, leading us to conclude that the ignorability assumption is very nearly satisfied by our data-gathering process.

Overlap and SUTVA assumption: limitations to inferring the heterogeneous treatment effect on implantation outcome

To explore the extent of overlap between cleavage-stage and blastocyst transfers, we fit a logistic regression model for predicting the treatment assignment using the parameters described above: (1) the morphokinetic events from the time of pronuclei appearance (tPNa) to nine blastomere cleavage, (2) maternal age, (3) the number of 4C− embryos, and (4) the number of 8C+ embryos at 66 h from fertilization [39]. With respect to these feature vectors, cleavage-stage and blastocyst transfers were only partially separated. This is indicated by the propensity score, which quantifies the day-of-transfer prediction probability (Fig. 2D). We find that excluding (trimming) the embryos with a propensity score below the 2.5th percentile of day 5 transferred embryos or above the 97.5th percentile of day 3 transferred embryos removes all cases that lack counterexamples, thus satisfying the overlap assumption between conditions among the remaining embryos [40]. To characterize the embryos that are excluded from causal analysis, we compared them with the remaining embryos. There were no differences in the morphokinetic events (Supplementary Fig. S2A-i,ii), maternal age distributions (Supplementary Fig. S2B-i), and the number of 4C− embryos (Supplementary Fig. S2B-ii) between excluded and remaining embryos. However, we found that the excluded embryos were characterized either by a relatively small number (≤ 1) or a very high number (≥ 9) of 8C+ embryos (Supplementary Fig. S2B-iii).
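The trimming rule described above can be sketched as follows. The example takes the propensity scores (the predicted probability of a day 5 transfer) as given rather than fitting the logistic regression, and the linear-interpolation percentile and the toy scores are assumptions made for illustration only.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def trim_by_overlap(scores_day3, scores_day5):
    """Keep only the scores that lie within [LB, UB].

    LB is the 2.5th percentile of the day 5 scores and UB is the 97.5th
    percentile of the day 3 scores, following the rule in the text.
    """
    lb = percentile(scores_day5, 2.5)
    ub = percentile(scores_day3, 97.5)
    keep3 = [s for s in scores_day3 if lb <= s <= ub]
    keep5 = [s for s in scores_day5 if lb <= s <= ub]
    return lb, ub, keep3, keep5


# Hypothetical propensity scores: day 3 transfers skew low, day 5 skew high.
toy_day3 = [i / 100 for i in range(0, 81)]    # 0.00 to 0.80
toy_day5 = [i / 100 for i in range(20, 101)]  # 0.20 to 1.00
lb, ub, keep3, keep5 = trim_by_overlap(toy_day3, toy_day5)
print(lb, ub, len(keep3), len(keep5))
```

Embryos with scores below LB resemble almost no day 5 transfers, and embryos above UB resemble almost no day 3 transfers, so neither group has comparable counterexamples in the opposite arm.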

Next, we consider the degree to which the SUTVA assumption is satisfied. Clearly, the implantation outcome of embryo transfer in one patient is not affected by the treatment assignment in other patients. Secondly, the four data-providing medical centers adhere to the same European Society of Human Reproduction and Embryology (ESHRE) guidelines, which decreases the potential confounding contributions of variation. To further minimize potential sources of variation between treatments, we included only ICSI-fertilized fresh transfer cycles of embryos that were cultured in the same automated time-lapse incubator (EmbryoScope time-lapse incubator version D, Vitrolife A/S, Denmark) under the same controlled environmental conditions.

We, therefore, conclude that similar to the ignorability and overlap assumptions, both SUTVA requirements are satisfied by our data-gathering process, which provides strong evidence that the dataset supports the possibility of valid causal inference for evaluating the causal effect of extended incubation to blastocyst transfer on implantation outcome. Our ability to infer these causal effects is further supported by the exogenous source of variability presented above [15, 41].

Fitting a causal forest model for evaluating the heterogeneous day-of-transfer treatment effect of extended culture to blastocyst transfer on embryo implantation

Having established a strong claim that the assumptions needed for valid causal inference from observational data are satisfied, we fitted a causal forest (CF) model to evaluate the heterogeneous day-of-transfer treatment effect using the maternal, cycle, and embryo features and the overlapping dataset as validated by the propensity model [16]. Using CF, we calculated the so-called transfer lift of the embryos at 66 h from fertilization. The transfer lift is the conditional average treatment effect (CATE); namely, it is the estimated implantation potential (in terms of probability, between 0 and 1) of individual embryos if transferred on day 5 minus the same potential if transferred on day 3. The transfer lift ranges between −1 and 1, where a positive transfer lift indicates that the implantation potential is predicted to be higher if transferred on day 5 and vice versa. A mathematical description of CF fitting and evaluating the transfer lift is provided in the “Materials and Methods” section.
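The verbal definition above corresponds to the standard CATE expression. The notation is an illustrative assumption and may differ from that used in the “Materials and Methods” section: \(X\) is the feature vector of an embryo at 66 h from fertilization, and \(Y(1)\), \(Y(0)\) are its binary implantation outcomes if transferred on day 5 or day 3, respectively.

\[
\tau(x) = \mathbb{E}\left[\,Y(1) - Y(0) \mid X = x\,\right] = \Pr\left(Y(1) = 1 \mid X = x\right) - \Pr\left(Y(0) = 1 \mid X = x\right)
\]

Because each probability lies in \([0, 1]\), it follows that \(-1 \le \tau(x) \le 1\). The ATE is the average of this quantity over the embryo population, \(\mathrm{ATE} = \mathbb{E}\left[\tau(X)\right]\), which is why a single ATE value cannot capture embryos for which \(\tau(x)\) is negative.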

The transfer lift ranges from −0.1 to 0.35, with an average value \(\pm\) standard deviation of \(0.1\pm 0.07\) (test set) and \(0.1\pm 0.08\) (train set; Fig. 3A-i). While the majority of embryos were assigned a positive transfer lift, a negative transfer lift was estimated for 6.3% (\(N=30\)) of the test-set embryos and 8.5% (\(N=49\)) of the train-set embryos, with up to 0.1 higher estimated probability of implantation if transferred on day 3. As a control, we fitted a CF model after randomly permuting the implantation outcome (Fig. 3A-ii) and the day of transfer (Fig. 3A-iii). As expected, in both cases the transfer lift was distributed symmetrically about zero, with equal representation of positive and negative transfer lift embryos, and the permuted distributions were statistically significantly separated from the non-permuted distributions. To verify generality and test the potentially confounding effects that might be generated by the differences between clinics and embryo transfer protocols, we compared the transfer lift distributions of the embryos from the four data-providing medical centers (Supplementary Fig. S3A-i,ii) and also compared the transfer lift distributions of single-embryo-transferred embryos and double-embryo-transferred embryos (Supplementary Fig. S3B-i,ii). Finally, the transfer lift distributions of treated and nontreated embryos also overlapped, thus verifying the lack of bias in the day-of-transfer treatment assignment by current IVF policies (Fig. 3B-i,ii).

Fig. 3

The transfer lift measures the difference in the implantation potential of embryos if transferred at the blastocyst stage relative to the cleavage stage. A (i) The obtained test-set and train-set transfer lift distributions are asymmetric about zero, consisting of embryos with negative (gray background) and with positive (white background) transfer lift values. As a control, transfer lift was evaluated after randomly permuting (ii) the implantation outcome and (iii) the day-of-transfer labels. The KS distances of the implantation outcome (ii) and the day-of-transfer (iii) permuted distributions from the non-permuted transfer lift distribution (i) were 0.34 and 0.51, respectively (p-value < 0.01). B The transfer lift distributions of nontreated and treated embryos overlap, as quantified by the Kolmogorov–Smirnov distance for (i) the test set (KS = 0.12) and (ii) the train set (KS = 0.12). STD: standard deviation

Next, we characterize the differences in the developmental properties of negative and positive transfer lift embryos. To this end, we compared the distributions of the morphokinetic events of 8C+ embryos in each group, which reveals that the negative transfer lift embryos are characterized by slower developmental dynamics mostly in the time of five-cell and six-cell cleavage events (t5 and t6) relative to positive transfer lift embryos (Fig. 4A-i,ii). No significant differences between positive and negative transfer lift embryos were observed in maternal age (Fig. 4B-i,ii), the number of low-quality 4C− embryos (Fig. 4C-i,ii), and high-quality 8C+ embryos (Fig. 4D-i,ii), as evaluated based on Kolmogorov–Smirnov statistics. Finally, we compared the assessment of embryo developmental potential using the broadly used day 3 KIDScore classification tool (Fig. 4E-i,ii) [42]. Despite the abovementioned morphokinetic gaps between positive and negative transfer lift embryos, the KIDScore distributions overlapped, and no significant statistical dependence was obtained between the transfer lift and the KIDScore distributions as verified using adjusted mutual information analysis. In summary, the statistical indications that we generated here suggest that the transfer lift is a property of individual embryos that is independent of maternal age, oocyte retrieval statistics, and predicted embryo quality.
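The Kolmogorov–Smirnov distance used in these comparisons is the largest vertical gap between the empirical cumulative distribution functions of two samples. A minimal sketch of the two-sample statistic follows; the toy values are hypothetical and stand in for any feature compared between groups, such as t5 in hours.

```python
def ks_distance(sample_a, sample_b):
    """Two-sample KS distance: maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x so that ties are
        # handled before the CDFs are compared.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical event times (hours) for two groups of embryos.
group_pos = [48.0, 49.5, 50.0, 51.0]
group_neg = [50.0, 51.0, 52.5, 54.0]
print(ks_distance(group_pos, group_neg))
```

A distance of 0 means the two empirical distributions coincide, and a distance of 1 means the samples do not overlap at all.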

Fig. 4

Feature analysis of positive versus negative transfer lift embryos. (A) A comparison of the temporal distributions of the (i) morphokinetic events and the (ii) cell cycles (CC1-to-CC3) and synchronization (S1, S2) intervals at 66 h from fertilization indicates that negative transfer lift embryos develop slower than positive transfer lift embryos. (B) Comparison of the maternal age distributions of (i) test-set and (ii) train-set embryos. (C) Comparison of the number of low-quality embryos (4C− at 66 h from fertilization) per oocyte retrieval of positive and negative transfer lift (i) test-set and (ii) train-set embryos. (D) Comparison of the number of high-quality embryos (8C+ at 66 h from fertilization) per oocyte retrieval of positive and negative transfer lift (i) test-set and (ii) train-set embryos. (E) Comparison of day 3 KIDScore ranking of (i) test-set and (ii) train-set embryos. The dependence between the transfer lift distributions and the KIDScore distributions is quantified via AMI. The total number of embryos is depicted by the dashed lines in (B, C, and D). Abbreviations: p-val, p-values; KS: Kolmogorov–Smirnov distances; AMI: adjusted mutual information

To assess the estimated treatment effect on the actual treatment outcome, we compared the implantation rates of negative and positive transfer lift test-set embryos. The average implantation rate of positive transfer lift embryos was higher when transferred on day 5, and the average implantation rate of negative transfer lift embryos was higher when transferred on day 3 (Fig. 5A-i). The estimated treatment effect on the implantation outcome of high-quality 8C+ embryos was larger and statistically more significant (Fig. 5A-ii). To account for the limitations that may be associated with the size of the available test set, we quantified the statistical significance by performing 1000-fold randomly sampled permutation testing and further validated it using a Wilcoxon rank-sum test (Methods). Finally, we addressed the sensitivity to the question of whether the effect of the clinical site is random or not. To this end, we fit a second logistic regression model to calculate the propensity score distributions of cleavage-stage versus blastocyst transfers in which the medical center was included as a dummy variable (Supplementary Fig. S4A). As before, non-overlapping embryos with propensity scores below the 2.5 percentile of the day 5 distribution and above the 97.5 percentile of the day 3 distribution were excluded. A CF model was then fitted using the same feature vectors including the medical center dummy variable to calculate the transfer lift at 66 h from fertilization as described above. Satisfyingly, the differences in the implantation outcome between negative and positive transfer lift embryos that were transferred on day 3 versus day 5 were reproduced across all test-set embryos (Supplementary Fig. S4B-i) and 8C+ embryos (Supplementary Fig. S4B-ii). We therefore conclude that the effect of the clinical site is not significant.
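A permutation test of the kind referred to above can be sketched as follows. The exact procedure is specified in the Methods, so this is an illustration under assumptions: the test statistic is the absolute difference in implantation rate between two groups, group labels are shuffled 1000 times, and the toy outcomes are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(outcomes_a, outcomes_b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a difference in mean outcome."""
    rng = random.Random(seed)
    observed = abs(mean(outcomes_a) - mean(outcomes_b))
    pooled = list(outcomes_a) + list(outcomes_b)
    n_a = len(outcomes_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical binary implantation outcomes (1 = implanted).
day5_outcomes = [1] * 30
day3_outcomes = [0] * 30
print(permutation_p_value(day5_outcomes, day3_outcomes))
```

Because the null distribution is built from the data themselves, this approach does not rely on large-sample approximations, which is useful when the test set is small.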

Fig. 5

The transfer lift measures the heterogeneous day-of-transfer treatment effect. A The implantation rates of negative and positive transfer lift embryos that were transferred on day 3 and day 5 are compared. Average implantation rates are presented (i) for all test-set embryos and (ii) across high-quality embryos only (8C+ at 66 h). Error bars represent STD. P-values were evaluated using permutation testing (1000-fold randomly sampled permutations). B Retrospective comparison of the actual implantation rate (current policy) and the estimated implantation rate by the proposed policy of transferring negative and positive transfer lift embryos at day 3 and day 5. Error bars depict STD. KS: Kolmogorov–Smirnov. STD: standard deviation

The evaluation of the heterogeneous treatment effect indicates that the implantation rate could be increased by transferring negative transfer lift embryos on day 3 and positive transfer lift embryos on day 5. To test this proposed policy, we retrospectively re-adjusted the implantation outcome of positive transfer lift embryos that were transferred on day 3 and negative transfer lift embryos that were transferred on day 5 by adding the absolute transfer lift values. This re-adjustment scheme is based on the fact that the transfer lift measures the difference in the probability of the embryos to implant if transferred at the cleavage stage and the blastocyst stage. In comparison with the current policy, the proposed policy is retrospectively estimated to have increased the average implantation rate from 0.2 to 0.27, which is a 32% increase relative to the current policy (Fig. 5B).
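The re-adjustment scheme described above can be sketched as follows, under the assumption that each embryo is represented by its observed outcome, its actual day of transfer, and its estimated transfer lift. The tuple layout and toy values are hypothetical.

```python
def policy_rates(embryos):
    """Return (current, proposed) average implantation rates.

    `embryos` is a list of (implanted, day, lift) tuples, where `implanted`
    is 0 or 1, `day` is 3 or 5, and `lift` is the estimated transfer lift.
    """
    current = sum(implanted for implanted, _, _ in embryos) / len(embryos)
    adjusted_total = 0.0
    for implanted, day, lift in embryos:
        # An embryo is mismatched if the proposed policy would have
        # transferred it on the other day.
        mismatched = (lift > 0 and day == 3) or (lift < 0 and day == 5)
        adjusted_total += implanted + (abs(lift) if mismatched else 0.0)
    return current, adjusted_total / len(embryos)


# Hypothetical embryos, for illustration only.
toy_embryos = [
    (0, 3, 0.20),   # positive lift, transferred on day 3: adjusted
    (1, 5, 0.10),   # positive lift, transferred on day 5: unchanged
    (0, 5, -0.10),  # negative lift, transferred on day 5: adjusted
    (1, 3, -0.05),  # negative lift, transferred on day 3: unchanged
]
current_rate, proposed_rate = policy_rates(toy_embryos)
print(current_rate, proposed_rate)
```

In this sketch the adjusted value of a mismatched embryo is an expected implantation probability rather than a binary outcome, so only the cohort average is meaningful.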
