Causal relationships between breast cancer risk factors based on mammographic features

Study sample

We used data from the Australian Mammographic Density Twins and Sisters Study [17], which included female twin pairs and their sisters aged 40–70 years without a prior diagnosis of breast cancer at the time of mammography. Information on the participants was collected by questionnaire, and permission to access their mammograms was obtained [18]. The current study involved 371 monozygotic twin pairs with complete epidemiological information and the mammographic measurements required for analysis. No individual was identified as being at high risk of breast cancer, either at the time of mammography or after assessment of her mammograms.

Questionnaire

Demographic information, anthropometric measurements, menstrual and reproductive history, lifestyle factors, and personal and family history of breast cancer were collected via telephone-administered questionnaires between 2004 and 2008. Zygosity was determined from genome-wide association data [19]. As there were time differences between age at questionnaire survey and age at mammography (on average 1.68 years, with 177 participants having a more than 3-year difference), menopausal status and BMI were updated to those at age at mammography as follows. For menopausal status, if a participant was postmenopausal at questionnaire survey, and her age at menopause was older than age at mammography, her status was changed to premenopausal. BMI at mammography was predicted from BMI at questionnaire survey using the method of Haby et al. [20]. BMI at questionnaire survey was treated as the dependent variable in a regression model which included birth cohort effects and 5-year group coefficients; the intercept of the regression was then used as the BMI at age of mammography.
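The menopausal status update described above can be sketched as a small function. This is only an illustration of the stated rule; the function name, argument names, and status labels are assumptions and do not come from the study's own code.

```python
def status_at_mammography(status_at_survey, age_at_menopause, age_at_mammography):
    """Return menopausal status at the time of mammography.

    Rule from the text: a woman who was postmenopausal at the questionnaire
    survey, but whose age at menopause is older than her age at mammography,
    is treated as premenopausal at mammography. All names are hypothetical.
    """
    if (
        status_at_survey == "postmenopausal"
        and age_at_menopause is not None
        and age_at_menopause > age_at_mammography
    ):
        return "premenopausal"
    # Otherwise the status recorded at the survey is kept unchanged.
    return status_at_survey


# Hypothetical participant: menopause at 52, mammogram taken at 50.
example_status = status_at_mammography("postmenopausal", 52, 50)
```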

Mammogram-based measures

Mammograms were retrieved from BreastScreen Australia services (80%), clinics (5%), and participants themselves (15%), and were digitised using the Lumysis 85 scanner at the Australian Mammographic Density Research Facility. For each woman, only the craniocaudal-view mammogram of the right breast taken closest to the survey was used in this study.

The dense areas were measured using a computer-assisted, semi-automated thresholding technique and the CUMULUS software, based on a sliding scale of pixel brightness ranging from 0 to 4095. Four observers were trained to measure mammographic density independently, as previously described [4].

A conventional pixel threshold was first used to identify dense areas with grey levels appearing at least mammographically light (the areas of which we call Cumulus). The pixel brightness threshold was then increased to identify the denser areas (the areas of which we call Altocumulus), and increased further to identify the densest areas (the areas of which we call Cirrocumulus). Reproducibility was assessed by conducting the measurements in sets of 100 mammograms, with 10% of the mammograms in each set measured a second time. The intraclass correlation coefficients were 0.98, 0.99, and 0.93 for Cumulus, Altocumulus, and Cirrocumulus, respectively. A total of 200 images were measured for Cumulus, Altocumulus, and Cirrocumulus, with the correlations between readers being 0.95, 0.89, and 0.85, respectively. Details of these three density measurements can be found elsewhere [3, 4, 17].

We created two new non-overlapping measures: light areas, calculated as Cumulus minus Altocumulus, and bright areas, calculated as Altocumulus minus Cirrocumulus. The relationships between these two measures and the brightest areas (Cirrocumulus), in terms of relative brightness, are shown in Fig. 1.
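As a minimal sketch of how the nested threshold areas map onto the non-overlapping measures, the subtraction can be written as follows. The function name, the check on nesting, and the example areas are illustrative assumptions, not values from the study.

```python
def non_overlapping_areas(cumulus, altocumulus, cirrocumulus):
    """Split nested dense areas into light, bright, and brightest areas.

    Assumes the three inputs are areas in the same units and are nested,
    because each higher threshold selects a subset of the previous area.
    """
    if not (cumulus >= altocumulus >= cirrocumulus >= 0):
        raise ValueError("expected Cumulus >= Altocumulus >= Cirrocumulus >= 0")
    return {
        "light": cumulus - altocumulus,        # light but not bright
        "bright": altocumulus - cirrocumulus,  # bright but not brightest
        "brightest": cirrocumulus,             # Cirrocumulus itself
    }


# Hypothetical areas for one mammogram (arbitrary units).
areas = non_overlapping_areas(30.0, 12.0, 4.0)
```

By construction, the three returned areas add up to the Cumulus area.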

Cirrus is an agnostic algorithm developed using deep learning techniques applied to 20 textural features extracted from 46,158 analogue craniocaudal-view mammograms [8]. The algorithm was applied to the mammograms of the study sample to produce the Cirrus measures.

In this study, we analysed Cirrus and the three spatially independent density measures: the light, bright, and brightest (Cirrocumulus) areas. Table 1 shows the summary characteristics of the unadjusted measures.

Table 1 Characteristics of mammographic measures and covariates of the monozygotic twins

Statistical methods

All mammographic measures were first transformed using a Box–Cox power transformation [21] to have an approximately normal distribution. As a result, \((\text{Cirrus} - 2907)^{2}\), a fractional power of Cirrocumulus, and the cube roots of the light areas and of the bright areas were used in the analyses.

Given that age at mammography is negatively associated with the mammographic density measures being studied as putative risk factors for breast cancer, and that breast cancer risk increases with age, all the measures were adjusted for age at mammography. This adjustment explained 8–11% of the variances in the studied measures, except for the light areas, for which the proportion of the variance explained was 2%. The variance explained by other breast cancer risk factors combined, including age at menarche, menopausal status, BMI, ever being pregnant, number of live births, benign breast disease history, and breast cancer family history, was between 4 and 7% (Additional file 1: Table S1).

The age-adjusted residuals were all standardised to have mean = 0 and standard deviation (SD) = 1. These standardised residuals are the mammogram risk scores used in the subsequent analyses. Correlations between these risk scores, within twin pair and within a person, respectively, were estimated using Pearson’s correlation coefficient.
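The construction of a risk score (transform, adjust for age, standardise) can be sketched as below. This is an illustration under stated assumptions: age is adjusted for with a simple linear regression, the sample (n − 1) standard deviation is used, and the ages and light areas are made-up numbers. The study's actual model specification may differ.

```python
from statistics import mean, stdev


def cube_root(x):
    """Cube-root transform, as used for the light and bright areas (x >= 0)."""
    return x ** (1.0 / 3.0)


def age_adjusted_score(values, ages):
    """Regress values on age by least squares and standardise the residuals.

    Returns residuals rescaled to mean 0 and SD 1. A linear age term and the
    sample SD are assumptions made for this sketch.
    """
    age_bar, val_bar = mean(ages), mean(values)
    sxx = sum((a - age_bar) ** 2 for a in ages)
    sxy = sum((a - age_bar) * (v - val_bar) for a, v in zip(ages, values))
    slope = sxy / sxx
    # Residuals from a model with an intercept have mean zero by construction.
    residuals = [(v - val_bar) - slope * (a - age_bar) for a, v in zip(ages, values)]
    sd = stdev(residuals)
    return [r / sd for r in residuals]


# Hypothetical data: age at mammography and light area for five women.
ages = [42, 48, 55, 61, 67]
light_areas = [27.0, 30.5, 18.2, 21.7, 12.4]
scores = age_adjusted_score([cube_root(v) for v in light_areas], ages)
```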

The correlations between the risk scores were decomposed into different sources, including confounding and causal effects originating from various pathways, using the Inference about Causation from Examination of FAmilial CONfounding (ICE FALCON) method [16]. ICE FALCON uses data for pairs of relatives, with a relative’s exposure acting as a proxy instrumental variable for a person’s own exposure. This method is analogous to Mendelian randomisation but does not use genetic variants as a presumed instrumental variable and does not rely on strong assumptions. ICE FALCON can make inference about causation even when the exposure and outcome are associated due to familial confounding (i.e. confounders, both known and unknown, that are shared by the exposure and the outcome and by the relatives). The ICE FALCON method has been applied in multiple fields to assess evidence for causality [16, 22–28].

Briefly, one risk score was assigned as the outcome Y variable and another as the predictor variable X, and the Y value of a twin was regressed against the X variable of herself and/or of her co-twin (Additional file 1: Figure S1). To assess the evidence for reverse causation, the assigning of X and Y was reversed, i.e. the aforementioned predictor and outcome swapped their positions in the refitted regression models. This was done for every pair of risk scores.

Given that the Y variables are correlated within twin pairs, regression was conducted using generalised estimating equations. This in effect conditioned the Y value of a twin on the Y value of her co-twin. Our model assumed that the risk score of a twin cannot have a causal effect on the same risk score of her co-twin but allowed for causation between the risk scores within a twin.

Three models were fitted to the twin pair data. First, a twin’s outcome variable was regressed on her own predictor variable to estimate the regression coefficient βself (Model 1). Second, the twin’s outcome variable was regressed on her co-twin’s predictor variable to estimate the regression coefficient βco-twin (Model 2). Third, the twin’s outcome variable was regressed on both her own and her co-twin’s predictor variables to estimate the conditional regression coefficients β′self and β′co-twin, respectively (Model 3). The use of the prime on the conditional regression coefficient estimates indicates that the Model 3 regression coefficients can be interpreted as the change in outcome for change in a given predictor while keeping the other predictor constant, which is not the same interpretation for the corresponding unconditional regression coefficients of Models 1 and 2.
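The three regressions can be illustrated with a small sketch. Note the substitution: it uses ordinary least squares to obtain point estimates only, not the generalised estimating equations used in the study, so it yields no valid standard errors for correlated twin data. The function names and the toy data, in which Y depends only on a twin's own X, are assumptions made for illustration.

```python
from statistics import mean


def _centered(v):
    m = mean(v)
    return [x - m for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope_one(y, x):
    """Least-squares slope of y on a single predictor (with intercept)."""
    yc, xc = _centered(y), _centered(x)
    return _dot(xc, yc) / _dot(xc, xc)


def slopes_two(y, x1, x2):
    """Least-squares slopes of y on two predictors (with intercept)."""
    yc, c1, c2 = _centered(y), _centered(x1), _centered(x2)
    s11, s22, s12 = _dot(c1, c1), _dot(c2, c2), _dot(c1, c2)
    s1y, s2y = _dot(c1, yc), _dot(c2, yc)
    det = s11 * s22 - s12 ** 2  # zero only if the predictors are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def three_model_estimates(pairs):
    """pairs: list of ((x_a, y_a), (x_b, y_b)), one entry per twin pair."""
    y, x_self, x_co = [], [], []
    for (xa, ya), (xb, yb) in pairs:
        # Each twin contributes one row: her own Y, her own X, her co-twin's X.
        y += [ya, yb]
        x_self += [xa, xb]
        x_co += [xb, xa]
    b_self_cond, b_co_cond = slopes_two(y, x_self, x_co)
    return {
        "beta_self": slope_one(y, x_self),   # Model 1
        "beta_co_twin": slope_one(y, x_co),  # Model 2
        "beta_self_cond": b_self_cond,       # Model 3, own X
        "beta_co_twin_cond": b_co_cond,      # Model 3, co-twin's X
    }


# Toy data: X is correlated within pairs and Y = 0.5 * own X exactly.
x_pairs = [(1.0, 2.0), (-1.0, -0.5), (0.5, 1.5), (-2.0, -1.0), (0.3, -0.4)]
pairs = [((xa, 0.5 * xa), (xb, 0.5 * xb)) for xa, xb in x_pairs]
estimates = three_model_estimates(pairs)
```

In this toy example the co-twin coefficient is non-zero in Model 2 but shrinks to zero in Model 3, while the coefficient for a twin's own X is unchanged between Models 1 and 3.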

If the predictor has a causal effect on the outcome, βco-twin would be different from zero, β′co-twin would be closer to zero than βco-twin, and β′self would not be different from βself. If there is familial confounding between the predictor and the outcome, β′self and β′co-twin would both differ from their corresponding unconditional coefficients, βself and βco-twin, to a similar extent. If there is a combination of familial confounding and causal effects, the results would be a combination of the two scenarios. According to Wright’s path tracing rules [29], the proportion of an association which could be attributed to causality is as follows:

$$\Pr = \left(\left(\left(\text{change in}\;\beta_{\text{co-twin}} - \left(\text{change in}\;\beta_{\text{self}} / \beta_{\text{self}}\right) \times \beta_{\text{co-twin}}\right) / \rho\right) / \beta_{\text{self}}\right) \times 100\%$$

where \(\text{change in}\;\beta_{\text{co-twin}}\) = \(\beta_{\text{co-twin}} - \beta_{\text{co-twin}}^{\prime}\), \(\text{change in}\;\beta_{\text{self}}\) = \(\beta_{\text{self}} - \beta_{\text{self}}^{\prime}\), and \(\rho\) = the within-twin correlation of the predictor. Note that the parameter estimates were extracted only from the models which suggest that the predictor causes the outcome, not those that suggest the outcome causes the predictor; see [16]. The causal effect = \(\beta_{\text{self}} \times \mathrm{Pr}\), so the proportion of an association that could be attributed to familial confounding is 1 − Pr.
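A worked example may help to make the arithmetic concrete. The sketch below follows the formula as read here, with each change defined as the unconditional coefficient minus the conditional one. The function names and all input values are hypothetical and are not estimates from the study.

```python
def proportion_causal(beta_self, beta_co, beta_self_cond, beta_co_cond, rho):
    """Percentage of the association attributed to causality (Pr).

    Changes are taken as unconditional minus conditional coefficients;
    rho is the within-twin-pair correlation of the predictor.
    """
    change_co = beta_co - beta_co_cond
    change_self = beta_self - beta_self_cond
    return ((change_co - (change_self / beta_self) * beta_co) / rho) / beta_self * 100.0


def causal_effect(beta_self, pr_percent):
    """Causal part of beta_self, treating Pr as a percentage."""
    return beta_self * pr_percent / 100.0


# Hypothetical purely causal scenario: the own-X coefficient is unchanged
# by conditioning and the co-twin coefficient attenuates fully to zero.
pr = proportion_causal(
    beta_self=0.5, beta_co=0.3, beta_self_cond=0.5, beta_co_cond=0.0, rho=0.6
)
effect = causal_effect(0.5, pr)
confounding_percent = 100.0 - pr
```

With these inputs the whole association is attributed to causality and none to familial confounding, which matches the pattern expected when the predictor causes the outcome.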

To investigate causal pathways between two risk scores that are not through other risk scores, we used the standardised residuals of the predictor and the outcome after adjusting for the third risk score in addition to age at mammography. The results from the analyses were used to produce a summary causal diagram. Causal relationship analyses were also conducted by the level of breast density to check whether the causal relationships differ by density levels. The sample was divided into two subgroups according to the median of 30.5% for Cumulus per cent mammographic density, with each group including 140 complete twin pairs. ICE FALCON analyses were conducted within each subgroup; see Supplemental material for more details. All the analyses were conducted using the R package [30]. P < 0.05 was considered to be nominally statistically significant.
