Sex differences in muscle protein expression and DNA methylation in response to exercise training

Datasets

The E-MTAB-11282 data were publically available and accessed from Array Express. The Gene SMART data were collected in our lab, have been used for other publications [20, 27, 28], and are publically available on GEO (GSE171140). For the Gene SMART study, the exercise training protocol, study design, and methods have been extensively outlined previously [24]. Brief descriptions regarding the Gene SMART study protocol are outlined below. The DNA methylome meta-analysis was conducted on E-MTAB-11282 and the Gene SMART datasets. The proteome analysis was conducted solely on the Gene SMART dataset.

Muscle biopsy and blood sampling

Muscle biopsies were sampled from the vastus lateralis muscle after an overnight fast, using a suction‐modified Bergström needle, under local anaesthesia of the skin and fascia (1% Xylocaine). The muscle samples were cleaned of excess blood, fat, and connective tissue and then flash-frozen in liquid nitrogen and stored in – 80 ºC. Intravenous blood was drawn immediately after the biopsy.

Study design and physiological measurements

An overview of the exercise protocol used in the Gene SMART (Skeletal Muscle Adaptive Response to Training) study has been previously published [25]. The training intervention consisted of 4 weeks of a control period, followed by 4 weeks of high-intensity interval training (HIIT) performed on a cycle ergometer. The sex comparison of physiological measurements (VO2max, PP, or LT), before and after the interventions, was analysed using a linear model of the form:

$$_\mathrm \sim \mathrm*\mathrm+\mathrm.$$

Controlling for diet

Participants were provided with individualised, pre-packaged meals for the 48 h prior to the resting muscle biopsies. The energy content of the provided meals was calculated using the Mifflin St-Jeor equation and each participant’s body mass, height and age [26]. The content of the diets were constructed based on the current National Health and Medical Research Council (NHMRC) guidelines. Participants were provided with a post-training and post-testing snack consisting of protein (0.3 g kg–1 BM) and carbohydrates (0.3 g kg–1 BM) [29]. Participants were asked to refrain from alcohol and caffeine during the dietary-control period, which is 48 h prior to each resting biopsy. Outside of the dietary-control period they were asked to continue with their normal exercise and dietary habits.

Participants and control of confounders

Females with a regular menstrual cycle (26–35 days) [30] not taking hormonal contraceptives were recruited in order to obtain a homogenous cohort, as different contraceptives have different dosage, administration patterns, and different hormone combinations causing variability in metabolism and gene expression [31]. For consistency and to control for the potential effects of hormonal fluctuations during the female menstrual cycle, all biopsies were performed during the early follicular phase (days 1–7 of cycle).

Participants (total of six females and one male) served as their own controls as they underwent 4 weeks of a control period prior to starting the training, this was done in order to assess whether DNA methylation fluctuates with regular lifestyle (diet, sleep, exercise, etc.) in the absence of the exercise training intervention (Additional file 1: Fig. S1F).

DNA extraction and methylation

Genomic DNA was extracted from the samples using the AllPrep DNA/RNA MiniKit (Qiagen, 80204) following the user manual guidelines. Global DNA methylation profiling was generated with the Infinium MethylationEPIC 850K BeadChip Kit (Queensland University of Technology and Diagenode, Austria). The first batch contained only males, were randomised for timepoint and age and were randomised across chips to minimise batch effects. The second batch contained males and females and samples were scrambled on the chips to ensure randomness when correcting for batch effect (i.e. old and young males and females across all time points included on each chip). The genome-wide DNA methylation pattern was analysed with the Infinium MethylationEPIC BeadChip array.

Protein extraction and proteomics

Muscle tissue was lysed in 300 µl SDS solubilisation buffer (5% SDS, 50 mM TEAB, pH 7.55), heated at 95 °C for 10 min and then probe-sonicated before measuring the protein concentration using the BCA method. A total protein amount of 100 µg (suspended in 50 µl) was used for each sample for subsequent analyses. The lysed samples were denatured and alkylated by adding TCEP (Tris(2-carboxyethyl) phosphine hydrochloride) and CAA (2-chloroacetamide) to a final concentration of 10 mM and 40 mM, respectively, and the mixture incubated at 55 °C for 15 min. Sequencing grade trypsin was added at an enzyme-to-protein ratio of 1:50 and incubated overnight at 37 °C after the proteins were trapped using S-Trap mini columns (Profiti). Tryptic peptides were eluted from the columns using (i) 50 mM TEAB, (ii) 0.2% formic acid and (iii) 50% acetonitrile, 0.2% formic acid. The fractions were pooled, concentrated in a vacuum concentrator and reconstituted in 40 µl 200 mM HEPES, pH 8.5. Using a Pierce Quantitative Colorimetric Peptide Assay Kit (Thermo Scientific), equal peptide amounts of each sample were labelled with the TMTpro 16plex reagent set (Thermo Scientific) according to the manufacturer’s instructions and considering a labelling strategy to minimise channel leakage. Individual samples were pooled and high-pH RP-HPLC was used to fractionate each pool into 12 fractions, acquired individually by LC–MS/MS to maximise the number of peptide and protein identifications.

Using a Dionex UltiMate 3000 RSLCnano system equipped with a Dionex UltiMate 3000 RS autosampler, an Acclaim PepMap RSLC analytical column (75 µm × 50 cm, nanoViper, C18, 2 µm, 100 Å; Thermo Scientific) and an Acclaim PepMap 100 trap column (100 µm × 2 cm, nanoViper, C18, 5 µm, 100 Å; Thermo Scientific), the tryptic peptides were separated by increasing concentrations of 80% acetonitrile (ACN)/0.1% formic acid at a flow of 250 nl/min for 158 min and analysed with an Orbitrap Fusion Tribrid mass spectrometer (ThermoFisher Scientific). The instrument was operated in data-dependent acquisition mode to automatically switch between full scan ms1 (in Orbitrap), ms2 (in ion trap) and ms3 (in Orbitrap) acquisition. Each survey full scan (380–1580 m/z) was acquired with a resolution of 120,000, an AGC (automatic gain control) target of 50%, and a maximum injection time of 50 ms. Dynamic exclusion was set to 60 s after one occurrence. Cycle time was fixed at 2.5 s, the most intense multiply charged ions (z ≥ 2) were selected for ms2/ms3 analysis. Ms2 analysis used CID fragmentation (fixed collision energy mode, 30% CID Collision Energy) with a maximum injection time of 150 ms, a “rapid” scan rate and an AGC target of 40%. Following the acquisition of each MS2 spectrum, an ms3 spectrum was acquired from multiple ms2 fragment ions using Synchronous Precursor Selection. The ms3 scan was acquired in the Orbitrap after HCD collision with a resolution of 50,000 and a maximum injection time of 250 ms.

The raw data files were analysed with Proteome Discoverer (Thermo Scientific) to obtain quantitative ms3 reporter ion intensities.

Proteomics bioinformatics analysis

Before normalisation, proteomic intensity data were filtered for high-confidence protein observations. In addition, contaminants, proteins only identified by a single peptide and proteins not identified/quantified consistently across the experiment were removed. The remaining missing values were imputed using the missing-not-at-random (MNAR) method, assuming the missingness was due to low expression for such proteins. Intensity was log transformed and normalised using the variance–stabilising–normalisation (VSN) method, which transforms the data in such a way that the variance remains nearly constant over the whole intensity spectrum (Additional file 1: Fig. S4). Both imputations and VSN were conducted by the DEP package [32]. Batch effects were corrected using internal referencing scaling (IRS) method [33] by the use of reference channels.

To identify differentially expressed proteins, we used linear models implemented in the limma package in R [34], using the participants’ ID as a blocking variable to account for the repeated measures design. Proteins showing a π-value < 0.005 were considered significant, which was calculated using the absolute value of the logFC and the FDR as described in Xiao et al. [35]. π-value is a mathematic combination of p-value and log2FC for better ranking of genes (calculated according to [35]), which was used for proteomics analysis.

DNA methylation bioinformatics analysis

The pre-processing of DNA methylation data was performed according to the bioinformatics pipeline developed for the Bioconductor project [36]. Raw methylation data were pre-processed, filtered and normalised across samples. Probes that had a detection p-value of > 0.01, located on X and Y chromosomes or cross-hybridising, or related to a SNP frequent in European populations, were removed. It is important to note that the list of cross-hybridising probes was supplied manually [37] as the list supplied to the ChAMP package was outdated. Specifically, there are thousands of probes in the Illumina microarrays that cross-hybridise with the X-chromosome and may lead to false discovery of autosomal sex-associated DNA methylation [38]. The BMIQ algorithm was used to correct for the Infinium type I and type II probe bias. β-values were corrected for both batch and position in the batch using ComBat [39].

To identify DMPs, we used linear models as implemented in the limma package in R [34], using the participants’ ID as a blocking variable to account for the repeated measures design. All results were adjusted for multiple testing using the Benjamini and Hochberg correction [40] and all CpGs showing an FDR < 0.005 were considered significant for the association of DNA methylation with baseline fitness [41]. When no DMPs were detected at FDR < 0.005, we examined the histogram of p-values to evaluate whether results were truly negative or whether we were underpowered. CRF-associated DMRs were identified using the DMRcate package [42]. DMRs with Stouffer, Fisher, and harmonic mean of the individual component FDRs (HMFDR) statistics < 0.005 were deemed significant. Effect sizes are reported as mean differences in DNA methylation beta values (%).

We adjusted each EWAS for bias and inflation using the empirical null distribution as implemented in bacon [43]. Inflation and bias in EWAS are caused by unmeasured technical and biological confounding, such as population substructure, batch effects, and cellular heterogeneity [44]. The inflation factor is higher when the expected number of true associations is high; it is also greater for studies with higher statistical power [43]. The results were consistent with the inflation factors and biases reported in an EWAS in blood [43]. Results from the independent EWAS were combined using an inverse variance weighted meta-analysis with METAL [45]. We used METAL since it does not require all DNA methylation datasets to include every CpG site on the HumanMethylation arrays. For robustness, we only included CpGs present in both cohorts (639,759 CpGs). Both MICT [46] and HIIT [47] induce skeletal muscle DNA methylation and VO2max changes, therefore we were able to meta-analyse the Gene SMART and E-MTAB-11282 cohorts.

We integrated a comprehensive annotation of Illumina HumanMethylation arrays [48] with chromatin states from the Roadmap Epigenomics Project [49] and the latest GeneHancer information [50]. Baseline fitness-DMPs that were annotated to two differing chromatin states were removed for simplicity and because there were very few such DMPs. GSEA on Reactome and GO databases was performed on DMRs using the goregion (for GO) and gsameth (for Reactome) functions in the missMethyl R package [51, 52]. The linear models used are in Additional file 1: Fig. S1. Integration of the DNA methylome and proteome was performed using the Mitch R package utilising all genes in the analysis; for DNA methylation, gene statistics were averaged across CpGs annotated to the same gene.

For the analysis of the both proteome and DNA methylome, the linear models used were are of the form:

To assess for overall proteome/DNA methylome associations with training (denoted as “timepoint”) and CRF (denoted as “baselineVO2max”), irrespective of sex:

$$\mathrm/\mathrm\sim \mathrm+\mathrm+\mathrm+\mathrmV}_\mathrm.$$

To assess for sex differences in proteome/DNA methylome associations with training:

$$}/} \, \mathrm\sim \mathrm*\mathrm+\mathrm+\mathrmV}_\mathrm.$$

To assess for sex differences in proteome/DNA methylome associations with CRF:

$$}/} \, \mathrm\sim \mathrm+\mathrm*\mathrmV}_\mathrm+\mathrm.$$

For DNA methylation analysis, batch was also included in the linear models for Gene SMART and lean/obese for E-MTAB-11282. Timepoint refers to before and after the training intervention. Age was included in the linear models given the known effect age on the methylome and proteome [53].

To assess whether CRF and training converge sex-biased DNA methylation sites, we ran a Pearson correlation between the first dimension of the principal component analysis (PCA) of sex-biased DMPs and CRF/training intervention. Furthermore, we compared the effects of sex vs sex*training and sex vs sex*CRF at these loci. Lastly, we removed the effects of the rest of the covariates by extracting the residuals of the linear models not containing the covariate of interest (training or CRF). This allowed to visualise whether training or CRF shifted the sex-biased methylome, when all other factors such as sex and age, were removed.

留言 (0)

沒有登入
gif