Identification and verification of plasma protein biomarkers that accurately identify an ectopic pregnancy

Study design

For both the discovery (n = 48; 16 IUP, 16 EPL, and 16 EP) and verification (n = 74; 25 IUP, 24 EPL, and 25 EP) cohorts, plasma samples were collected prospectively from consenting women with symptomatic early-stage pregnancies. For inclusion in this study, participants were required to meet the following criteria: (1) abdominal pain and/or vaginal bleeding; (2) 5–10 weeks of gestation; (3) no chronic medical conditions such as diabetes or obesity; and (4) informed consent to participate in data and sample collection for the Ectopic Pregnancy Biomarkers Bank. Specimens were selected for each cohort from the biobank such that the three pregnancy outcome groups (IUP, EPL, and EP) had similar distributions for gestational age (GA).

GA (based on last menstrual period and/or ultrasound), race, ethnicity, β-hCG, and maternal age were recorded for each subject upon initial examination, if available, and pregnancy outcome was obtained at the time of sample collection or shortly thereafter. Pregnancy outcomes were classified based on consensus definitions for formal diagnosis [13]. Patient characteristics for the discovery and verification cohorts are reported in Table 1.

Table 1 Demographic and clinical characteristics of the discovery (N = 48) and verification (N = 74) cohorts

Plasma collection and processing

Blood was collected by venipuncture into K2EDTA plasma tubes (BD, Franklin Lakes, NJ) and centrifuged for 10 min at 1,500 x g at room temperature (RT). Plasma was aliquoted in 500 µl volumes into cryotubes, snap frozen using liquid nitrogen within 2 h of blood collection, and stored at -80 °C. Before downstream processing, samples were thawed briefly in an RT water bath with intermittent periods of cooling on ice to prevent the samples from warming above 0–4 °C. Thawed samples were centrifuged for 10 min at 12,000 x g at 4 °C, aliquoted into smaller volumes (40–100 µl), snap frozen using liquid nitrogen, and stored at -80 °C until used.

IGY-14/Supermix depletion

Samples were depleted of abundant plasma proteins using IGY-14 and Supermix immunodepletion columns (Sigma-Aldrich, St. Louis, MO) connected in tandem as previously described [14]. Typically, 100 µl (discovery cohort) or 50 µl (verification cohort) aliquots of plasma were thawed, centrifuged for 10 min at 12,000 x g at 4 °C, diluted five-fold with equilibration buffer, filtered through a 0.22 μm microcentrifuge filter, and injected onto the columns. The flow-through fractions containing unbound proteins were collected, pooled, and concentrated using a 10 K MWCO centrifugal filter unit (MilliporeSigma, Burlington, MA). Concentrator membranes were extracted with 1% SDS, and the extracts were combined with the concentrated sample. Concentrated samples were snap frozen, dried using a SpeedVac centrifuge, and stored at -20 °C prior to 1-D SDS-PAGE and LC-MS/MS analysis.

SDS-PAGE/In-gel trypsin digestion

For plasma samples collected from the discovery cohort, IGY-14/Supermix-depleted samples were resuspended in SDS sample buffer, loaded onto pre-cast NUPAGE gels (Thermo Fisher Scientific, Waltham, MA), and separated using MES running buffer (Thermo Fisher Scientific) until the tracking dye had migrated 1.6 cm. Gels were stained with Colloidal Blue (Thermo Fisher Scientific), and the entire gel lane was excised and divided into six fractions, based on gel band staining, as previously described [15]. Fractions were digested overnight using 20 ng/ml modified trypsin. Plasma samples from the verification cohort were processed as described above, except that samples were run 0.5 cm into the gels and then digested overnight using 10 ng/ml modified trypsin [15]. Digested samples were dried using a SpeedVac centrifuge and stored at -20 °C. Dried samples were resuspended in 0.1% formic acid/3% ACN or 0.1% formic acid prior to LC-MS/MS discovery or PRM-MS verification, respectively.

Stable isotope labeled (SIL) peptide standards preparation

Individual “heavy” SIL peptide stock solutions were prepared as follows: SpikeTides-TQL peptides (JPT Peptide Technologies, Berlin, Germany) were cleaved from their quantification tag (Qtag) prior to stock solution preparation [16]. Briefly, 1 nmol of each dried SIL peptide was solubilized in 25 mM ammonium bicarbonate/20% ACN and digested in-solution with 10 ng/µl trypsin (enzyme/peptide ratio of 1:100) in 25 mM ammonium bicarbonate overnight. Cleaved and digested SIL peptides were dried using a SpeedVac centrifuge, resuspended, and aliquoted in stock solutions of 10 pmol/µl in 10% ACN/2% formic acid. Additionally, dried SpikeTides-L and Maxi SpikeTides-QL peptides (JPT Peptide Technologies) were resuspended and aliquoted in stock solutions of 10 pmol/µl in 10% ACN/2% formic acid, and AQUA peptides (Thermo Fisher Scientific) were aliquoted in stock solutions at 5 pmol/µl in 5% ACN.

Stock solutions of cleaved SpikeTides-TQL (15 total), Maxi SpikeTides-QL (3 total), SpikeTides-L (16 total), and AQUA QuantPro peptides (2 total) were pooled at fmol amounts ranging from ~4 to 70 fmol each based on MS signal intensity that was pre-determined in quality control analyses of individual peptides. The pooled SIL peptide stock solution (10 pmol/µl) to be used for all subsequent quantitation analyses was aliquoted, snap frozen, and stored at -20 °C. Prior to PRM-MS analysis, the pooled SIL peptide stock solution was thawed and diluted ten-fold to a final concentration of 1 pmol/µl in 0.1% formic acid/3% ACN/0.004% PEG. Next, 5 µl (5 pmol) was added to resuspended digests (35 µl) containing the equivalent of 15 µl of original plasma. PRM-MS sample injections (see below) contained the equivalent of 3 µl original plasma and 1 pmol of pooled SIL peptide standards.
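The spike-in arithmetic in the preceding paragraph can be checked with a short calculation. This is only a sketch using the volumes and concentrations stated above; the variable names are illustrative, and the injection volume is not stated in the text but is inferred here from the stated plasma equivalent per injection.

```python
# Values from the text; names are illustrative, not from the original methods.
stock_pmol_per_ul = 10.0        # pooled SIL peptide stock
dilution_factor = 10.0          # ten-fold dilution before PRM-MS
spike_volume_ul = 5.0           # diluted pool added to each digest
digest_volume_ul = 35.0         # resuspended digest
plasma_equiv_ul = 15.0          # original plasma represented by the digest
plasma_per_injection_ul = 3.0   # plasma equivalent per injection

working_pmol_per_ul = stock_pmol_per_ul / dilution_factor
spiked_pmol = working_pmol_per_ul * spike_volume_ul
total_volume_ul = digest_volume_ul + spike_volume_ul

# Inferred (assumption): injection volume that carries 3 µl of plasma equivalent.
injection_volume_ul = total_volume_ul * plasma_per_injection_ul / plasma_equiv_ul
sil_per_injection_pmol = spiked_pmol * injection_volume_ul / total_volume_ul

print(working_pmol_per_ul, spiked_pmol, injection_volume_ul, sil_per_injection_pmol)
```

Under these numbers the inferred injection is one fifth of the 40 µl spiked digest, which reproduces the stated 1 pmol of pooled SIL standards per injection.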

LC-MS/MS

Samples were analyzed on a Q Exactive HF mass spectrometer (Thermo Scientific) equipped with a nanoACQUITY ultrahigh pressure liquid chromatography (UPLC) System (Waters, Milford, MA) and a column heater maintained at 45 °C. Tryptic digests were injected onto a UPLC Symmetry trap column (180 μm i.d. x 2 cm packed with 5 μm C18 resin; Waters), and peptides were separated by reversed phase-ultra high pressure liquid chromatography (RP-UPLC) on a BEH C18 nanocapillary analytical column (75 μm i.d. x 25 cm, 1.7 μm particle size, Waters) at a flow rate of 200 nl/min. Solvent A was Milli-Q (MilliporeSigma) water containing 0.1% formic acid, and solvent B was acetonitrile containing 0.1% formic acid. For the discovery cohort, peptides were eluted using a 70 min LC gradient as previously described [15].

PRM-MS

For the verification cohort, samples were analyzed on a Q Exactive HF mass spectrometer (Thermo Scientific) equipped with a nanoACQUITY UPLC System (Waters, Milford, MA) as described above. Peptides were eluted at 200 nl/min using an acetonitrile gradient consisting of 5–30% B over 110 min, 30–40% B over 10 min, 40–80% B over 5 min, 80% B for 10 min before returning to 5% B over 0.5 min. The column was re-equilibrated using 5% B at 300 nl/min for 5 min before injecting the next sample. To minimize carryover, a blank was run between each experimental sample by injecting water and using a 30 min gradient with the same solvents. The PRM method consisted of a full MS scan (m/z 375–1150) acquired in profile mode at 30,000 resolution, followed by up to 20 MS/MS scans from an inclusion list containing the m/z, charge state, and retention time ± 5–6 min for each targeted peptide. PRM scans were acquired in profile mode at 30,000 resolution with a target AGC of 2 × 10^5 ions and max injection time of 120 ms. An isolation width of 0.7 m/z and normalized collision energy of 28% were used.

A reference plasma sample from a pool of all EPL plasma samples from the verification cohort was depleted, digested, and spiked with SIL peptide standards as described above, and then aliquots of the final digest with added internal standard SIL peptides were snap frozen. An aliquot of this reference sample was typically run at the beginning, middle, and end of each set of samples to monitor variations in PRM signal intensities caused by changes in performance of the HPLC, reversed-phase column, or mass spectrometer.

MS data analysis

Raw mass spectrometric data from the proteomics discovery were searched against the human UniProt database (released 8/29/16) and processed using label-free quantitation (LFQ) with MaxQuant (v. 1.5.2.8) [17], and the “match between runs” option [18] as previously described [19]. Protein identifications were filtered using Perseus software (v. 1.6.2.3; http://www.perseus-framework.org) [20] to remove decoy database reverse identifications, contaminants, proteins identified only by site modified peptides, or proteins identified by a single uniquely-mapping peptide.

In Perseus, protein group LFQ intensities were log2 transformed to reduce the impact of outliers. For pairwise comparisons of the discovery analysis, samples were categorized into groups based on pregnancy outcome (EP, IUP, or EPL). Protein groups having less than 50% of valid values (i.e., those with MS1 quantification results) present in every categorical group were removed. Prior to statistical analysis, missing data points were imputed from a Gaussian distribution of random numbers that simulate the distribution of low signal values (distribution width = 0.3, shift = 1.8). Perseus was also used for data visualization using volcano plots.
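The missing-value imputation described above can be illustrated outside Perseus. The sketch below assumes that the replacement distribution is built per sample column from the observed log2 intensities, with its mean shifted down by 1.8 standard deviations and its width scaled to 0.3 standard deviations. The function name and the example values are hypothetical, and this is not the Perseus implementation itself.

```python
import random
import statistics


def impute_downshifted(column, width=0.3, shift=1.8, rng=None):
    """Replace None entries with draws from a narrowed, down-shifted normal.

    column: log2 intensities for one sample, with None marking missing values.
    Assumption: mean and SD are taken from the valid values of this column.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    valid = [v for v in column if v is not None]
    mean = statistics.mean(valid)
    sd = statistics.stdev(valid)
    mu = mean - shift * sd      # centre of the simulated low-signal region
    sigma = width * sd          # narrower than the observed distribution
    return [v if v is not None else rng.gauss(mu, sigma) for v in column]


# Hypothetical log2 LFQ intensities for one sample
example_column = [20.0, 22.0, None, 24.0, 26.0, None]
imputed_column = impute_downshifted(example_column)
print(imputed_column)
```

Observed values pass through unchanged; only the missing entries are filled, and they land in the low-intensity tail, which is the intent of simulating signals near the detection limit.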

For PRM-MS analyses of the verification cohort, raw data files were analyzed using Skyline (v. 21.2) [21], and automated fragment ion selection (5 ions/peptide) was utilized. The summed peak area of the 3–4 most intense fragment ions was used to quantify both “light” (i.e., endogenous) and “heavy” (i.e., SIL) peptides. Missing peaks and/or peptide fragment peaks with mass error >10 ppm were removed. For peptides containing methionine, both the oxidized and non-oxidized forms were quantified separately, and peak areas were summed prior to calculating abundance.

Calibration curves of individual SIL peptides were prepared using an EPL plasma pool as a background to evaluate matrix effects. To determine linear ranges, upper limits of quantitation (ULOQs), and lower limits of quantitation (LLOQs), a seven-point dilution series of the SIL peptide pool (range: 0.64 fmol–10 pmol) was spiked into the EPL plasma pool and analyzed in duplicate by PRM-MS. Skyline was used to plot linear calibration curves with 1/x^2 weighting. Peptides quantified in the 74 individual plasma samples that had quantities below the LLOQs were set to zero.
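A weighted linear fit of the kind described above can be sketched in a few lines. The example assumes a five-fold step between the seven dilution points, which is inferred from the stated 0.64 fmol to 10 pmol range and is not given in the text. The response values and function names are hypothetical, and this is not Skyline's own code.

```python
def weighted_linear_fit(x, y):
    """Least-squares line y = slope * x + intercept with weights 1/x^2."""
    w = [1.0 / (xi * xi) for xi in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return slope, intercept


def back_calculate(response, slope, intercept):
    """Convert a measured response back to an amount using the fitted line."""
    return (response - intercept) / slope


# Assumed five-fold dilution series in fmol: 0.64 ... 10,000 (10 pmol)
amounts_fmol = [0.64 * 5 ** i for i in range(7)]
# Hypothetical, perfectly linear responses for illustration only
responses = [2.0 * a + 0.5 for a in amounts_fmol]

slope, intercept = weighted_linear_fit(amounts_fmol, responses)
print(slope, intercept, back_calculate(100.5, slope, intercept))
```

The 1/x^2 weights keep the high-amount points from dominating the fit, so the low end of the curve, where the LLOQ is determined, is fitted more faithfully.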

The abundance of each targeted peptide was calculated as the ratio between the light peptide and heavy peptide (L/H ratio). The amount of light peptide was calculated from the L/H ratio times the amount of heavy peptide spiked into the sample. Protein level in each sample was determined by taking the average of its targeted quantified peptides and the final protein concentration was calculated based on the volume of plasma analyzed.
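The quantitation steps in this paragraph amount to a ratio, a scaling, an average, and a division by volume. The following sketch makes that explicit; the function names, peak areas, and spiked amounts are hypothetical values chosen for illustration.

```python
def light_amount_fmol(light_area, heavy_area, heavy_spiked_fmol):
    """Amount of endogenous peptide = (L/H ratio) x amount of heavy standard."""
    return (light_area / heavy_area) * heavy_spiked_fmol


def protein_concentration_fmol_per_ul(peptides, plasma_volume_ul):
    """Average the peptide-level amounts, then normalize to plasma volume.

    peptides: list of (light_area, heavy_area, heavy_spiked_fmol) tuples
    for the quantified peptides of one protein.
    """
    amounts = [light_amount_fmol(*p) for p in peptides]
    return sum(amounts) / len(amounts) / plasma_volume_ul


# Two hypothetical peptides from one protein, 3 µl plasma equivalent injected
example_peptides = [
    (2.0e6, 1.0e6, 10.0),  # L/H = 2.0 -> 20 fmol
    (3.0e6, 2.0e6, 20.0),  # L/H = 1.5 -> 30 fmol
]
concentration = protein_concentration_fmol_per_ul(example_peptides, 3.0)
print(concentration)
```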

Statistical methods

Statistical analyses were performed using Perseus software (v.1.6.2.3), Microsoft Excel 2016, GraphPad Prism (v.5.04 and v.7), Stata 16, and R (v.4.2.1). For the discovery cohort, samples were grouped to identify differences related to early pregnancy complications such as EP or EPL. For the pairwise comparisons, a two-tailed, two-sample Student’s t-test statistic was calculated, and a permutation-based false discovery rate (FDR) was applied (FDR ≤ 0.05, 250 permutations, S0 = 0.1) [22]. High priority (FDR ≤ 0.05) and additional significant (p ≤ 0.05 and fold change ≥ 3) candidate biomarkers were selected for further comparison between EP vs. non-EP (IUP + EPL) by a non-parametric Wilcoxon rank-sum test. Additionally, the area under the curve (AUC) from receiver operating characteristic (ROC) curves was assessed.
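The Wilcoxon rank-sum comparison and the ROC AUC mentioned above are closely related: the Mann-Whitney U statistic divided by the number of case-control pairs equals the AUC. The sketch below shows that relationship by direct pair counting. The function name and the marker values are hypothetical, and it does not reproduce the software actually used in the study.

```python
def mann_whitney_auc(cases, controls):
    """Return (U, AUC), counting pairs where the case exceeds the control.

    Ties contribute 0.5, which matches the usual empirical AUC definition.
    """
    u = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                u += 1.0
            elif c == k:
                u += 0.5
    return u, u / (len(cases) * len(controls))


# Hypothetical log2 abundances of one candidate marker
ep_values = [5.1, 6.3, 7.0, 4.8]
non_ep_values = [3.9, 4.2, 5.0, 4.8, 3.1]

u_stat, auc = mann_whitney_auc(ep_values, non_ep_values)
print(u_stat, auc)
```

An AUC near 0.5 means the marker does not separate the groups, while values near 1 (or near 0 for a marker that is lower in EP) indicate strong separation.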

We aimed to have enough statistical power to verify the proteins identified in the discovery cohort in a verification cohort. Based on the data from the discovery cohort, we expected to verify markers with an effect size >0.7. Here, effect size refers to the difference between group means (EP vs. non-EP) divided by the pooled standard deviation. A verification set with 25 EP and 45 or more non-EP (IUP + EPL) would have at least 80% power at a two-sided type I error rate of 0.05 to verify a marker as long as its effect size is >0.7.

To predict EP in the verification cohort, each biomarker was assessed using the Wilcoxon rank-sum test with and without FDR adjustment calculated using the Benjamini–Hochberg correction. Markers with FDR ≤ 0.05 were further evaluated as potential predictors, with least absolute shrinkage and selection operator (Lasso) regularization and logistic regression used to explore whether combinations of biomarkers predict EP better than single-predictor models. For Lasso and logistic regression analyses, zero values were set to 0.01 and then the protein concentrations for all candidate biomarkers were log2 transformed. Correlations between candidate biomarkers were examined using Spearman’s rank correlation coefficient. Final predictors for the multivariable logistic model were selected using the Lasso technique with 5-fold cross-validation and the one-standard-error rule for determining the optimal tuning parameter. Due to the modest sample size of the study, variable selection was determined using 100 independent runs of 5-fold cross-validation Lasso. The biomarkers that were selected 80 or more times out of 100 runs were used as the final set of predictors in the logistic model. Additionally, three other models were explored using protein substitutions based on the Spearman correlation cluster analysis. The predictive ability of the final logistic models was assessed by AUC, sensitivity, specificity, and accuracy (defined as the sum of true positives and true negatives divided by the total cohort size).
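The power statement in the statistical methods (25 EP, 45 non-EP, effect size 0.7, two-sided type I error rate of 0.05) can be sanity-checked with a normal approximation to a two-sample comparison of means. This is a rough sketch only: the text does not say which power method was used, and the approximation ignores the small-sample t correction and the negligible opposite-tail probability. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n1, n2, alpha=0.05):
    """Approximate two-sided power for a standardized mean difference.

    Uses the normal approximation: power ~ Phi(d * sqrt(n1*n2/(n1+n2)) - z),
    where z is the upper alpha/2 critical value of the standard normal.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    noncentrality = effect_size * math.sqrt(n1 * n2 / (n1 + n2))
    return NormalDist().cdf(noncentrality - z_crit)


power = approx_power(0.7, 25, 45)
print(round(power, 3))
```

Under this approximation the design gives a power of roughly 0.80, in line with the figure stated in the text, and larger non-EP groups or larger effect sizes push it higher.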
