Next-Generation Sequencing–Based Testing Among Patients With Advanced or Metastatic Nonsquamous Non–Small Cell Lung Cancer in the United States: Predictive Modeling Using Machine Learning Methods


Introduction

The care of patients with non–small cell lung cancer (NSCLC) has changed dramatically since the early 2010s. A chemotherapy-based approach tailored only to disease histology (squamous or nonsquamous tumors) has given way to one in which multiple actionable biomarkers, identified from each patient's genomic characteristics, point to targeted therapies associated with superior outcomes [,]. As a result, treatment guidelines for patients with NSCLC now include recommendations for next-generation sequencing (NGS) [].

Unfortunately, despite these recommendations, multiple studies have shown that NGS-based testing is not being used for all patients with advanced or metastatic NSCLC, and only about half of all patients in some studies receive comprehensive biomarker testing [-]. The reasons for the lack of testing are unclear but may include barriers to ordering tests, insufficient tissue, clinical deterioration, or a crisis that requires immediate care []. More recent studies have also demonstrated a racial disparity in receipt of biomarker testing; patients who are Black are significantly less likely than those who are White to receive NGS-based testing in the United States [].

Studies evaluating the barriers to testing have typically investigated the lack of testing through a hypothesis-driven, a priori categorization of potential barriers [,]. Although this approach is critical for investigating specific issues such as racial disparities, it falls short when evaluating the complexity of care and the multiple, potentially interacting factors involved. Clinical prediction models offer an alternative way to use patient-level evidence to inform health care decision makers about patient care, and health care professionals have used them for decades []. Traditionally, prediction models combine patient demographic, clinical, and treatment characteristics in a statistical or mathematical model, usually regression, classification, or neural networks, but they handle a limited number of predictor variables (usually fewer than 25). Flexible machine learning methods relax this constraint: rather than the researcher forcing the model to evaluate a limited set of covariates, the model learns from the data by trial and error to make predictions, without a predefined set of decision rules. Put simply, machine learning can be understood as “learning from data” []. The setting of biomarker testing offers an opportunity to apply these methods to explore more thoroughly the factors associated with the lack of recommended biomarker testing.

While machine learning methods have been used more commonly for biomarker identification and treatment selection, there is little evidence of their application to predicting biomarker testing itself. To date, investigations of the gaps in biomarker testing have remained largely limited to descriptive research and opinion pieces [-]. This study was therefore designed to fill this evidence gap by applying machine learning methods to biomarker testing for patients with advanced or metastatic nonsquamous NSCLC, to determine the demographic and clinical characteristics that may predict receipt of NGS-based testing. A second objective was to determine the characteristics that predict receipt of NGS-based testing early enough to inform first-line therapy, in accordance with clinical guidelines (early testing), versus receipt of NGS-based testing after first-line therapy is underway. These objectives were pursued to better understand the factors associated with barriers to recommended testing and with the timing of such testing, in order to inform future intervention strategies.


Methods

Data Source

This study used the Advanced NSCLC Analytic Cohort from the nationwide Flatiron Health electronic health record–derived longitudinal database, comprising deidentified patient-level structured and unstructured data, curated via technology-enabled abstraction [,]. The data are deidentified and subject to obligations to prevent reidentification and protect patient confidentiality and are not considered human participants in accordance with the US Code of Federal Regulations []. These deidentified data originate from approximately 280 cancer clinics (~800 sites of care) in the United States. Patients in this database are those who have lung cancer ICD (International Classification of Diseases) codes 162.x (ICD-9 [International Classification of Diseases, Ninth Revision]), C34x, or C39.9 (ICD-10 [International Statistical Classification of Diseases, Tenth Revision]) on at least 2 documented clinical visits on different days occurring on or after January 1, 2011. Longitudinal patient-level data were available through November 2021. Patients must further have had pathology consistent with NSCLC and have advanced or metastatic disease (diagnosed with stage IIIB, IIIC, IVA, or IVB disease or diagnosed with early-stage NSCLC and subsequently developed recurrent or progressive disease).

Definitions of NGS Testing Cohorts

Patients were included in this analysis if they were in the Flatiron Health Advanced NSCLC Analytic Cohort, had nonsquamous NSCLC, had evidence of receipt of systemic therapy, and had at least 3 months of follow-up in the database. Receipt of NGS testing was identified from a field recorded by the health care provider in the electronic medical record database; the NGS specimen type (tissue or circulating tumor) is not specified. Patients with evidence of NGS-based testing more than 20 days before the initial NSCLC diagnosis were excluded. Patients meeting the inclusion criteria were categorized into 2 groups. The ever NGS-tested group included patients with at least 1 NGS test recorded in the database; all remaining patients, who had no evidence of any NGS test recorded in the database, formed the never NGS-tested group. Patients in the ever NGS-tested group were further subgrouped by the timing of NGS-based testing: the early NGS-tested subgroup included patients whose first or only NGS-based test occurred before the start of first-line therapy or up to day 7 of first-line therapy, and the late NGS-tested subgroup included all remaining patients, whose first NGS-based test occurred 8 or more days after the start of first-line therapy. The date of advanced or metastatic diagnosis was considered the index diagnosis date.
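The grouping rules above can be expressed as a small classification function. This is a minimal sketch under stated assumptions: the helper name `classify_ngs_timing` and its inputs are hypothetical, the other inclusion criteria (nonsquamous histology, systemic therapy, at least 3 months of follow-up) are assumed to have been applied already, and "through day 7 of first-line therapy" is read as at most 7 days after the first-line start date.

```python
from datetime import date, timedelta


def classify_ngs_timing(initial_dx, first_line_start, ngs_dates):
    """Assign one patient to 'excluded', 'never', 'early', or 'late' (hypothetical helper).

    initial_dx: date of initial NSCLC diagnosis
    first_line_start: start date of first-line therapy
    ngs_dates: list of dates of recorded NGS tests (possibly empty)
    """
    # Exclusion: any NGS test more than 20 days before the initial diagnosis.
    if any(d < initial_dx - timedelta(days=20) for d in ngs_dates):
        return "excluded"
    # No recorded NGS test at all: never NGS-tested group.
    if not ngs_dates:
        return "never"
    # Timing is judged on the first (or only) NGS test.
    first_test = min(ngs_dates)
    # Assumption: "through day 7" means at most 7 days after the first-line start;
    # tests before the start have negative offsets and are therefore early.
    if (first_test - first_line_start).days <= 7:
        return "early"
    return "late"
```

For example, with first-line therapy starting on February 1, 2019, a first NGS test on February 8 falls in the early subgroup and one on February 9 in the late subgroup.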

Candidate Predictors

Candidate predictors for receipt and timing of NGS-based testing were prespecified based on published literature, analyses of real-world data, and expert input from the field of cancer diagnostics [,,]. These variables included patient age at the advanced or metastatic diagnosis date (years); sex (male or female); race (Asian, Black, White, or other); insurance type (public, private, or other); Eastern Cooperative Oncology Group (ECOG) performance status (0-4); smoking history (ever vs never smoker); body weight (kilograms); BMI (kg/m²); practice setting (academic or community); practice volume (the average number of patients with NSCLC receiving care, by index year over the period 2011 to 2021, at the site where the included patient received care); biomarker result (positive, not positive, or not tested) for each available biomarker (anaplastic lymphoma kinase [ALK]; epidermal growth factor receptor [EGFR]; V-Raf murine sarcoma viral oncogene homolog B [BRAF]; Kirsten rat sarcoma virus [KRAS]; c-ros oncogene 1 [ROS1]; mesenchymal epithelial transition [MET]; neurotrophic tyrosine receptor kinase [NTRK]; rearranged during transfection [RET]; and programmed death ligand 1 [PD-L1]); stage of disease at initial diagnosis (0-IV); laboratory value (low, normal, high, or not tested) for each blood test (alkaline phosphatase, alanine transaminase, aspartate aminotransferase, bilirubin, creatinine, lymphocyte count, red blood cell count, hematocrit, platelet count, white blood cell count, and hemoglobin); number of non-NGS biomarker tests received (total number of fluorescence in situ hybridization, immunohistochemistry, polymerase chain reaction, or other non-NGS–based tests); and 2 variables identifying periods of environmental change.
The first of these variables categorized the status of the National Comprehensive Cancer Network (NCCN) Clinical Guidelines: prior to 2016, before NGS was recommended in the guidelines; 2016-2019, when broad-based testing was recommended; and 2020 and later, when NGS-based testing was recommended []. The second variable captured the timing of US Food and Drug Administration approvals of drugs targeting the available biomarkers: period (1) January 1, 2011-August 25, 2011 (EGFR drugs only); period (2) August 26, 2011-March 10, 2016 (EGFR+ALK); period (3) March 11, 2016-June 21, 2017 (EGFR+ALK+ROS1); period (4) June 22, 2017-November 25, 2018 (EGFR+ALK+ROS1+BRAF); period (5) November 26, 2018-May 5, 2020 (EGFR+ALK+ROS1+BRAF+NTRK); period (6) May 6, 2020-May 26, 2021 (EGFR+ALK+ROS1+BRAF+NTRK+MET+RET); and period (7) May 27, 2021, and later (EGFR+ALK+ROS1+BRAF+NTRK+MET+RET+KRAS) []. Additionally, Medicare Administrative Contractor (MAC) region [] and Molecular Diagnostics Services (MolDX) Program adoption (yes or no) [] were included as candidate predictors to capture the policies in place in the geographic area where the patient received care. MACs are private companies that process claims for Medicare beneficiaries. They are geographically distinct and identifiable by unique alphanumeric designations (eg, J8=jurisdiction 8) and by private company names (eg, Noridian and Palmetto) []. The MolDX Program determines the coverage of diagnostic testing in 4 MACs across 28 states [,]. Importantly, all candidate predictor variables were required to be recorded before the end of the early NGS testing period, so that no covariate was recorded after measurement of the NGS testing outcome.
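The two environmental-change variables are date lookups, sketched below in Python. This is illustrative only: it assumes the index (advanced or metastatic) diagnosis date is the date being mapped, which the text does not state explicitly, and the names `APPROVAL_PERIOD_STARTS`, `drug_approval_period`, and `nccn_period` are hypothetical. The period boundaries are copied from the list above.

```python
from bisect import bisect_right
from datetime import date

# Start date of each drug-approval period listed in the text (period 1 starts January 1, 2011).
APPROVAL_PERIOD_STARTS = [
    date(2011, 1, 1),    # 1: EGFR
    date(2011, 8, 26),   # 2: +ALK
    date(2016, 3, 11),   # 3: +ROS1
    date(2017, 6, 22),   # 4: +BRAF
    date(2018, 11, 26),  # 5: +NTRK
    date(2020, 5, 6),    # 6: +MET, RET
    date(2021, 5, 27),   # 7: +KRAS
]


def drug_approval_period(dx):
    """Return the approval period (1-7) that contains the date dx."""
    # bisect_right counts the period start dates on or before dx.
    period = bisect_right(APPROVAL_PERIOD_STARTS, dx)
    if period == 0:
        raise ValueError("date precedes the study window (January 1, 2011)")
    return period


def nccn_period(dx):
    """Map a date to the NCCN guideline period by calendar year."""
    if dx.year < 2016:
        return "prerecommendations"
    if dx.year <= 2019:
        return "broad-based testing recommended"
    return "NGS-based testing recommended"
```

Using a sorted list of start dates with `bisect_right` keeps each boundary in one place, so a date such as August 26, 2011 falls in period 2 rather than straddling two periods.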

The following interactions were deemed clinically relevant and forced into the models for evaluation: smoking and sex, smoking and NCCN guideline periods, race and insurance type, age and ECOG performance status, MAC region and public insurance, and MolDX region and public insurance. The expected directions of these relationships were defined in the study protocol and are summarized in .
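Forcing an interaction into a model amounts to adding product columns to the model matrix. The sketch below shows one way to do this for a single patient record; the column names (`ever_smoker`, `male`, `age`, `ecog_1`, `ecog_2`) and the helper `add_interactions` are hypothetical, and it assumes categorical variables have already been expanded into 0/1 dummy columns with a reference level dropped.

```python
def add_interactions(row, groups, pairs):
    """Return a copy of row with product columns for each prespecified pair.

    row: dict mapping column name -> numeric value (0/1 dummy or continuous)
    groups: dict mapping variable name -> list of its columns in row
    pairs: list of (variable_a, variable_b) interactions to force into the model
    """
    out = dict(row)
    for var_a, var_b in pairs:
        for col_a in groups[var_a]:
            for col_b in groups[var_b]:
                # Interaction term = product of the two main-effect columns.
                out[f"{col_a}:{col_b}"] = row[col_a] * row[col_b]
    return out


# Hypothetical record: smoking x sex and age x ECOG performance status.
row = {"ever_smoker": 1, "male": 1, "age": 70, "ecog_1": 0, "ecog_2": 1}
groups = {"smoking": ["ever_smoker"], "sex": ["male"], "age": ["age"], "ecog": ["ecog_1", "ecog_2"]}
expanded = add_interactions(row, groups, [("smoking", "sex"), ("age", "ecog")])
```

A continuous variable such as age interacts with each ECOG dummy separately, so the age effect can differ by performance status level.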

Table 1. Expected direction of candidate predictors for next-generation sequencing (NGS) testing.

Candidate predictor variable | Expected direction
Year of advanced or metastatic diagnosis | As year increases, NGS testing is more likely.
Smoking status (yes vs no) | Smoking=no, NGS testing is more likely.
Sex (male vs female) | Sex=female, NGS testing is more likely.
Race (Asian, Black, White, other) | Race=Asian or White, NGS testing is more likely.
Practice volume (continuous) | As practice volume increases, NGS testing is more likely.
BMI (using WHOa categories) | BMI=underweight, NGS testing is less likely.
ECOGb performance status (0, 1, 2, 3, or 4) | As ECOG performance status increases, NGS testing is less likely.
Body weight (continuous, in kilograms) | As weight increases, NGS testing is more likely.
Stage at initial diagnosis (0-I, II, III, or IV) | Stage 0-II=NGS is more likely than stage III; stage IV=NGS is more likely than stage III.
EGFRc (not tested, positive, not positive) by non-NGS test | EGFR=positive, NGS less likely.
ROS1d (not tested, positive, not positive) by non-NGS test | ROS1=positive, NGS less likely.
ALKe (not tested, positive, not positive) by non-NGS test | ALK=positive, NGS less likely.
BRAFf (not tested, positive, not positive) by non-NGS test | BRAF=positive, NGS less likely.
KRASg (not tested, positive, not positive) by non-NGS test | KRAS=positive, NGS less likely.
PD-L1h (not tested, positive, not positive) | PD-L1=positive, NGS less likely.
Number of single-gene tests (continuous) | As the number of single-gene tests increases, NGS less likely.
Practice setting (academic, community) | Practice setting=academic, NGS more likely.
Insurance status (public, private, other) | This relationship is unknown. It is possible that insurance status=public, NGS less likely; however, it is possible that in some cases, insurance status=private only, NGS could be less likely.
MACi region | No direction is known.
MolDXj | While this only applies to Medicare, states may adopt broader policies, and the relationship is uncertain. MolDX may make NGS more likely, but it is largely unknown.
NCCNk guidelines (pre, broad, or NGS) | NCCN guidelines=NGS, NGS more likely.
Drug approval periods (1, 2, 3, 4, 5, 6, 7) | As drug approval periods increase, NGS more likely.
Laboratory values (high, normal, low, not tested) for alkaline phosphatase, alanine transaminase, aspartate aminotransferase, bilirubin, creatinine, lymphocyte count, red blood cell count, hematocrit, platelet count, white blood cell count, hemoglobin | The direction of a single laboratory value is unknown. However, multiple out-of-range values would generally be expected to reflect poor health and may make NGS less likely, but the a priori direction is unknown.

aWHO: World Health Organization.

bECOG: Eastern Cooperative Oncology Group.

cEGFR: epidermal growth factor receptor.

dROS1: c-ros oncogene 1.

eALK: anaplastic lymphoma kinase.

fBRAF: V-Raf murine sarcoma viral oncogene homolog B.

gKRAS: Kirsten rat sarcoma virus.

hPD-L1: programmed death ligand 1.

iMAC: Medicare Administrative Contractor.

jMolDX: Molecular Diagnostics Services.

kNCCN: National Comprehensive Cancer Network.

Statistical Analysis

Descriptive analyses were conducted to summarize the available data and to assess the extent of missingness in the database. Categorical variables were compared between groups using a 1-sided chi-square test or, where expected cell sizes were <5, the Fisher exact test; continuous variables were compared using a 2-sided t test. Missing values were imputed using the random forest missing data algorithm (impute.rfsrc function in the R package randomForestSRC) [].
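As an illustration of the categorical comparison, the sketch below computes a Pearson chi-square statistic for a 2x2 table using only the Python standard library. It is not the software used in the study (SAS and R), it applies no continuity correction, and it does not implement the Fisher exact test used for small expected counts. The P value is the upper tail of the chi-square distribution with 1 degree of freedom, which equals `erfc(sqrt(x / 2))` because such a statistic is the square of a standard normal variable.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table of counts.

    Returns (statistic, p value). With 1 degree of freedom the statistic is the
    square of a standard normal variable, so its upper-tail probability is
    erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Table 2 counts for female vs male (rows: ever, never NGS-tested). The 1 patient
# with unknown sex is dropped, so this need not reproduce the reported P value exactly.
stat, p = chi_square_2x2([[7281, 6144], [9399, 8582]])
```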

Three modeling strategies were used to identify potential predictors of NGS-based testing for 2 outcomes: ever versus never NGS-tested (model 1) and early versus late NGS-tested (model 2). The 3 strategies were logistic regression (LR), penalized logistic regression (PLR) using the least absolute shrinkage and selection operator (LASSO) penalty, and extreme gradient boosting (XGBoost) with trees as base learners. LR was implemented using forward selection on the main effects and predefined interactions (listed earlier), starting with the predefined variables and adding the most significant terms to the model. PLR was implemented using sparse group LASSO on the main effects and predefined interactions, forcing some predefined variables into the model, with the penalty selected by 5-fold cross-validation. XGBoost is a decision tree–based machine learning algorithm []; its model matrix was built from the main effects and predefined interactions. Its hyperparameters, which included the shrinkage (learning rate), the number of trees, and tree depth, were selected by 5-fold cross-validation over a grid search. Table S1 in contains the full list of hyperparameters used in this study. The data extraction approach and modeling process are summarized in .
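To show how a LASSO penalty performs variable selection in a logistic model, the sketch below fits L1-penalized logistic regression by proximal gradient descent (ISTA) in plain Python. It is a simplified stand-in, not the study's method: it uses an ordinary LASSO penalty rather than sparse group LASSO (as fitted with sparsegl), uses a fixed penalty `lam` instead of one chosen by 5-fold cross-validation, does not force any variables into the model, and runs on simulated data from the hypothetical helper `simulate`.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward 0 by t, snapping to exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_logistic(X, y, lam, step=1.0, n_iter=500):
    """LASSO-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|beta_j|); the intercept is not penalized.
    Returns (intercept, coefficient list).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean log-loss with respect to the intercept and coefficients.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        # Gradient step, then soft-thresholding (the L1 proximal step) on penalized terms.
        b0 -= step * g0
        beta = [soft_threshold(beta[j] - step * g[j], step * lam) for j in range(p)]
    return b0, beta


def simulate(n=300, seed=0):
    """Toy data: only the first of 3 standardized features affects the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * xi[0]) else 0 for xi in X]
    return X, y
```

The soft-thresholding step sets small coefficients to exactly zero, which is what makes "features with nonzero coefficients" a usable selection rule.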

In step 1, data were extracted based on the prespecified inclusion and exclusion criteria. Step 2 involved variable recoding, including adding an extra level to each categorical variable with missing information to represent the missing data. Step 3 was a data quality check to identify any unusual observations that needed to be excluded or recoded, along with any imputation that was required. Steps 4 to 6 covered the implementation of the models, the evaluation of their performance, and the interpretation of the final features selected using LR. provides an overview of the model strategy evaluation process for the 2 outcomes mentioned in step 4 of .
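The recoding in step 2 can be sketched as follows. The level name "Unknown or missing" mirrors the label used in the descriptive tables; the helpers `add_missing_level` and `one_hot` are hypothetical, treat `None`, blank strings, and NaN as missing, and assume the observed levels are strings.

```python
import math

MISSING = "Unknown or missing"


def is_missing(v):
    """Treat None, blank strings, and float NaN as missing."""
    if v is None:
        return True
    if isinstance(v, str):
        return v.strip() == ""
    return isinstance(v, float) and math.isnan(v)


def add_missing_level(values, label=MISSING):
    """Recode missing entries of a categorical variable to an explicit extra level."""
    return [label if is_missing(v) else v for v in values]


def one_hot(values):
    """One 0/1 indicator column per observed level, including the missing level."""
    levels = sorted(set(values))
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}
```

Keeping missingness as its own level lets a model learn whether an unrecorded value (for example, an unrecorded ECOG status) is itself associated with testing.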

First, the data were split into a D1 set (training+validation; 80%) and a D2 set (testing; 20%). The 3 strategies were then compared on m=1000 random splits of D1 into training (70%) and validation (30%) data. For each split, all 3 strategies were fit to the training data, and performance measures (eg, area under the receiver operating characteristic curve) were computed on the validation data. Modeling was done with R packages: sparsegl for LASSO, XGBoost for gradient boosting, and PRROC, which computes the areas under the precision-recall and ROC curves, for performance measures. The PLR and XGBoost hyperparameters were tuned using 5-fold cross-validation nested within the training datasets. Prediction models were developed for 2 comparisons: ever versus never and early versus late NGS-tested groups. In total, 146 features (including all levels of all variables) were entered into both the XGBoost and LASSO models, whereas only 36 features (main effects and interactions) were used in the LR model. Preselection of features consisted of excluding variables with little to no association with the outcomes of interest.
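The evaluation loop can be sketched in plain Python as follows: hold out 20% as D2, then repeatedly split D1 70/30 and record each strategy's validation AUC. This is an illustration under assumptions, not the study's R code: the `(fit, predict)` interface and all function names are hypothetical, the nested 5-fold hyperparameter tuning is omitted, and the AUC is computed with the rank-based (Mann-Whitney) formula, which equals the area under the empirical ROC curve.

```python
import random


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count as 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the evaluation data")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def shuffle_split(indices, frac, rng):
    """Randomly split indices into a first part (frac of them) and the remainder."""
    idx = list(indices)
    rng.shuffle(idx)
    k = round(frac * len(idx))
    return idx[:k], idx[k:]


def repeated_validation(X, y, strategies, m=1000, seed=0):
    """Compare strategies on m random 70/30 splits of D1, holding out D2 (20%).

    strategies maps a name to (fit, predict): fit(X_train, y_train) -> model and
    predict(model, x) -> score. Returns per-strategy AUC lists and the D2 indices.
    """
    rng = random.Random(seed)
    d1, d2 = shuffle_split(range(len(y)), 0.8, rng)  # D1: training+validation, D2: test
    aucs = {name: [] for name in strategies}
    for _ in range(m):
        train, valid = shuffle_split(d1, 0.7, rng)
        for name, (fit, predict) in strategies.items():
            model = fit([X[i] for i in train], [y[i] for i in train])
            scores = [predict(model, X[i]) for i in valid]
            aucs[name].append(roc_auc(scores, [y[i] for i in valid]))
    return aucs, d2
```

Because D2 is set aside before the repeated splits, it stays untouched by model comparison and can be used once to re-estimate the chosen model's performance.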

The final model was selected based on performance as described earlier (area under the receiver operating characteristic curve on the validation data) and on simplicity and clinical interpretability. Model performance was re-estimated using the test data D2. For the final model, PLR was fit on the D1 data, and the features with nonzero coefficients were then refitted as an LR model on the test data D2 to calculate model estimates (odds ratios, 95% CIs, and P values). Odds ratios for main effects in the presence of interaction terms were calculated using the analytical formula presented in . All analyses were conducted using SAS (version 9.4; SAS Institute Inc) and R (version 4.0.3; R Foundation for Statistical Computing).
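When a main effect X also appears in an interaction with Z, its odds ratio depends on the value of Z. One standard formulation, which may differ in detail from the analytical formula cited above, is: for logit(p) = b0 + b_x·X + b_z·Z + b_xz·X·Z, the OR for X at Z = z is exp(b_x + b_xz·z), and the variance of its logarithm is Var(b_x) + z²·Var(b_xz) + 2z·Cov(b_x, b_xz). The sketch below computes this with a Wald 95% CI; the function name and the example coefficients are hypothetical.

```python
import math


def conditional_odds_ratio(b_x, b_xz, var_x, var_xz, cov_x_xz, z, z_crit=1.959964):
    """Odds ratio for X at a given value z of the interacting variable Z, with Wald 95% CI.

    Assumes logit(p) = b0 + b_x*X + b_z*Z + b_xz*X*Z, so that
    log OR_X(z) = b_x + b_xz*z, with variance
    var_x + z**2 * var_xz + 2*z*cov_x_xz (variance of a linear combination).
    """
    log_or = b_x + b_xz * z
    se = math.sqrt(var_x + z * z * var_xz + 2 * z * cov_x_xz)
    return (
        math.exp(log_or),
        math.exp(log_or - z_crit * se),
        math.exp(log_or + z_crit * se),
    )


# Hypothetical coefficients and covariance entries, evaluated at Z = 1.
or_x, ci_low, ci_high = conditional_odds_ratio(0.5, -0.3, 0.01, 0.02, -0.005, z=1)
```

The covariance term matters: ignoring it here would overstate the standard error, because the two coefficient estimates are negatively correlated in this example.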

Figure 1. Data extraction and modeling flow. AUC: area under the curve; CI: confidence interval; NSCLC: non–small cell lung cancer; OR: odds ratio; ROC: receiver operating characteristic.

Figure 2. Modeling evaluation flow. EHR: electronic health record; NGS: next-generation sequencing.

Ethical Considerations

The data used for this study are deidentified and subject to obligations to prevent reidentification and protect patient confidentiality, and as such are not considered human subjects research and are exempt from review in accordance with the US Code of Federal Regulations [].


Results

A total of 74,211 patient records were available in the Flatiron Health NSCLC dataset for this analysis. After applying the eligibility criteria, 31,407 patients were included in this analysis. Of all patients, 42.75% (n=13,425) were included in the ever NGS-tested group and 57.25% (n=17,982) in the never NGS-tested group. Among those in the ever NGS-tested group, 84.09% (n=11,289) were early NGS-tested and 15.91% (n=2136) were late NGS-tested. Characteristics of these groups and subgroups used as features in the machine learning models are listed in -.

Most features differed significantly both between the ever and never NGS-tested groups and between the early and late NGS-tested groups. Of note, compared with the never NGS-tested group, the ever NGS-tested group had a lower proportion of patients with a history of smoking (n=10,589, 78.88% vs n=14,987, 83.34%) and of patients in the NCCN prerecommendations period (n=2663, 19.84% vs n=10,734, 59.69%), and a higher proportion with an ECOG performance status of 0 (n=4410, 32.85% vs n=4665, 25.94%). Compared with the late NGS-tested group, the early NGS-tested group had a higher proportion of patients with a history of smoking (n=9025, 79.95% vs n=1564, 73.22%), a lower proportion in the NCCN prerecommendations period (n=1746, 15.47% vs n=917, 42.93%), and a lower proportion with an ECOG performance status of 0 (n=3606, 31.94% vs n=804, 37.64%).

Comparison of performance metrics showed that the AUC was similar across models (80%-84% for ever vs never and 77%-80% for early vs late NGS testing) and marginally better for models fit on the ever versus never NGS-tested groups. Other metrics were also comparable (Table S2 in ). The LASSO model was chosen as the final model because it identified important features, including interactions (those with nonzero coefficients after shrinkage), while its metrics were highly comparable to those of the other models (Table S2 in ). Figures S1 and S2 in show the feature importance plots for both comparisons. The most important factors associated with ever versus never testing included year of diagnosis, observation of a PD-L1 test, Black or African American race, and the number of single-gene tests observed. The most important factors associated with early versus late testing included observation of a PD-L1 test, a positive single-gene test result, year of diagnosis, and geographical region of care. Later year of diagnosis, evidence of PD-L1 testing, patient race, positive single-gene test results, and region were among the top 5 predictors of NGS testing for both ever versus never and early versus late NGS testing.

Table 2. Demographic characteristics of the overall, ever, and never NGSa-tested study cohorts prior to imputation.

Characteristic | Overall (N=31,407) | Ever NGS-testedb (n=13,425) | Never NGS-testedc (n=17,982) | Ever NGS-tested versus never NGS-tested, P valued
Age at initial diagnosis (years), mean (SD) | 67.2 (9.8) | 67.2 (10.1) | 67.3 (9.5) | .66
Sex, n (%) | | | | .0007
Female | 16,680 (53.11) | 7281 (54.23) | 9399 (52.27) |
Male | 14,726 (46.89) | 6144 (45.77) | 8582 (47.73) |
Unknown or missing | 1 (0) | 0 (0) | 1 (0.01) |
Race, n (%) | | | | <.0001
Asian | 1050 (3.34) | 552 (4.11) | 498 (2.77) |
Black or African American | 2845 (9.06) | 1089 (8.11) | 1756 (9.77) |
White | 21,248 (67.65) | 9109 (67.85) | 12,139 (67.51) |
Other | 3269 (10.41) | 1392 (10.37) | 1877 (10.44) |
Unknown or missing | 2995 (9.54) | 1283 (9.56) | 1712 (9.52) |
Smoking status, n (%) | | | | <.0001
History of smoking | 25,576 (81.43) | 10,589 (78.88) | 14,987 (83.34) |
No history of smoking | 5657 (18.01) | 2826 (21.05) | 2831 (15.74) |
Unknown or missing | 174 (0.55) | 10 (0.07) | 164 (0.91) |
ECOGe performance status, n (%) | | | | <.0001
0 | 9075 (28.89) | 4410 (32.85) | 4665 (25.94) |
1 | 11,215 (35.71) | 5275 (39.29) | 5940 (33.03) |
2 | 3401 (10.83) | 1393 (10.38) | 2008 (11.17) |
3 | 762 (2.43) | 306 (2.28) | 456 (2.54) |
4 | 51 (0.16) | 17 (0.13) | 34 (0.19) |
Unknown or missing | 6903 (21.98) | 2024 (15.08) | 4879 (27.13) |

aNGS: next-generation sequencing.

bPatients in the overall study cohort with evidence of NGS-based biomarker testing in the database.

cPatients in the overall study cohort with no evidence of NGS-based biomarker testing.

dTwo-sided t test for continuous variables; chi-square or Fisher exact test (where expected cell size <5) for categorical variables.

eECOG: Eastern Cooperative Oncology Group.

Table 3. Biomarker status of the overall, ever, and never NGSa-tested study cohorts prior to imputation.

Characteristic | Overall (N=31,407) | Ever NGS-testedb (n=13,425) | Never NGS-testedc (n=17,982) | Ever NGS-tested vs never NGS-tested, P valued
Non-NGS–based (single gene) ALKe status, n (%) | | | | <.0001
Positive | 617 (1.96) | 253 (1.88) | 364 (2.02) |
Not positive | 15,626 (49.75) | 6278 (46.76) | 9348 (51.99) |
Not tested | 15,164 (48.28) | 6894 (51.35) | 8270 (45.99) |
Non-NGS–based (single gene) BRAFf status, n (%) | | | | <.0001
Positive | 94 (0.30) | 32 (0.24) | 62 (0.34) |
Not positive | 3775 (12.02) | 1729 (12.88) | 2046 (11.38) |
Not tested | 27,538 (87.68) | 11,664 (86.88) | 15,874 (88.28) |
Non-NGS–based (single gene) EGFRg status, n (%) | | | | <.0001
Positive | 2822 (8.99) | 928 (6.91) | 1894 (10.53) |
Not positive | 12,312 (39.20) | 3427 (25.53) | 8885 (49.41) |
Not tested | 16,273 (51.81) | 9070 (67.56) | 7203 (40.06) |
Non-NGS–based (single gene) KRASh status, n (%) | | | | <.0001
Positive | 1141 (3.63) | 298 (2.22) | 843 (4.69) |
Not positive | 2958 (9.42) | 1082 (8.06) | 1876 (10.43) |
Not tested | 27,308 (86.95) | 12,045 (89.72) | 15,263 (84.88) |
Non-NGS–based (single gene) ROS1i status, n (%) | | | | <.0001
Positive | 128 (0.41) | 58 (0.43) | 70 (0.39) |
Not positive | 9383 (29.88) | 5011 (37.33) | 4372 (24.31) |
Not tested | 21,896 (69.72) | 8356 (62.24) | 13,540 (75.30) |
Non-NGS–based (single gene) METj status, n (%) | | | | <.0001
Positive | 7 (0.02) | 3 (0.02) | 4 (0.02) |
Not positive | 1965 (6.26) | 1517 (11.30) | 448 (2.49) |
Not tested | 29,435 (93.72) | 11,905 (88.68) | 17,530 (97.49) |
Non-NGS–based (single gene) RETk status, n (%) | | | | <.0001
Positive | 34 (0.11) | 27 (0.20) | 7 (0.04) |
Not positive | 2381 (7.58) | 1679 (12.51) | 702 (3.90) |
Not tested | 28,992 (92.31) | 11,719 (87.29) | 17,273 (96.06) |
Non-NGS–based (single gene) NTRKl status, n (%) | | | | <.0001
Positive | 2 (0.01) | 1 (0.01) | 1 (0.01) |
Not positive | 747 (2.38) | 617 (4.60) | 130 (0.72) |
Not tested | 30,658 (97.62) | 12,807 (95.40) | 17,851 (99.27) |
Non-NGS–based (single gene) testingm, n (%) | | | | <.0001
Any positive result observed | 4795 (15.27) | 1576 (11.74) | 3219 (17.90) |
Never tested | 11,968 (38.11) | 5661 (42.17) | 6307 (35.07) |
Tested, but no positive results observed | 14,644 (46.63) | 6188 (46.09) | 8456 (47.02) |
PD-L1n status, n (%) | | | | <.0001
Positive | 1826 (5.81) | 1289 (9.60) | 537 (2.99) |
Not positive | 9988 (31.80) | 6354 (47.33) | 3634 (20.21) |
Not tested | 19,593 (62.38) | 5782 (43.07) | 13,811 (76.80) |
Single-gene tests receivedm, mean (SD) | 2.1 (2.0) | 2.3 (2.0) | 2.0 (1.9) | <.0001

aNGS: next-generation sequencing.

bPatients in the overall study cohort with evidence of NGS-based biomarker testing in the database.

cPatients in the overall study cohort with no evidence of NGS-based biomarker testing.

dTwo-sided t test for continuous variables; chi-square or Fisher exact test (where expected cell size <5) for categorical variables.

eALK: anaplastic lymphoma kinase.

fBRAF: V-Raf Murine Sarcoma Viral Oncogene Homolog B.

gEGFR: epidermal growth factor receptor.

hKRAS: Kirsten rat sarcoma virus.

iROS1: c-ros oncogene 1.

jMET: mesenchymal epithelial transition.

kRET: rearranged during transfection.

lNTRK: neurotrophic tyrosine receptor kinase.

mResults are based on biomarkers ALK, BRAF, EGFR, KRAS, ROS1, MET, RET, and NTRK.

nPD-L1: programmed death ligand 1.

Table 4. Geographic and time characteristics of the overall, ever, and never NGSa-tested study cohorts prior to imputation.

Characteristic | Overall (N=31,407), n (%) | Ever NGS-testedb (n=13,425), n (%) | Never NGS-testedc (n=17,982), n (%) | Ever NGS-tested vs never NGS-tested, P valued
MACe region | | | | <.0001
JE Noridian | 2097 (6.68) | 814 (6.06) | 1283 (7.13) |
JF Noridian | 2476 (7.88) | 1111 (8.28) | 1365 (7.59) |
J6 NGS | 856 (2.73) | 335 (2.50) | 521 (2.90) |
J5 WPS | 603 (1.92) | 235 (1.75) | 368 (2.05) |
J8 WPS | 2025 (6.45) | 1051 (7.83) | 974 (5.42) |
JK NGS | 2459 (7.83) | 1102 (8.21) | 1357 (7.55) |
JL Novitas | 2817 (8.97) | 1283 (9.56) | 1534 (8.53) |
JM Palmetto | 2218 (7.06) | 858 (6.39) | 1360 (7.56) |
J15 CGS | 924 (2.94) | 397 (2.96) | 527 (2.93) |
JJ Cahaba | 4194 (13.35) | 2049 (15.26) | 2145 (11.93) |
JH Novitas | 6093 (19.40) | 2176 (16.21) | 3917 (21.78) |
Unknown or missing | 4645 (14.79) | 2014 (15) | 2631 (14.63) |
MolDXf Program | | | | <.0001
Yes | 14,294 (45.51) | 6399 (47.66) | 7895 (43.91) |
No | 12,468 (39.70) | 5012 (37.33) | 7456 (41.46) |
Unknown or missing | 4645 (14.79) | 2014 (15) | 2631 (14.63) |
NCCNg guideline period | | | | <.0001
Prerecommendations | 13,397 (42.66) | 2663 (19.84) | 10,734 (59.69) |
Broad-based testing recommended | 13,552 (43.15) | 7339 (54.67) | 6213 (34.55) |
NGS-based testing recommended | 4458 (14.19) | 3423 (25.50) | 1035 (5.76) |
Timing of diagnosis by drug approval period | | | | <.0001
Period 1 | 1223 (3.89) | 96 (0.72) | 1127 (6.27) |
Period 2 | 12,850 (40.91) | 2823 (21.03) | 10,027 (55.76) |
Period 3 | 4396 (14) | 1868 (13.91) | 2528 (14.06) |
Period 4 | 4877 (15.53) | 2724 (20.29) | 2153 (11.97) |
Period 5 | 4613 (14.69) | 3224 (24.01) | 1389 (7.72) |
Period 6 | 2858 (9.10) | 2216 (16.51) | 642 (3.57) |
Period 7 | 590 (1.88) | 474 (3.53) | 116 (0.65) |

aNGS: next-generation sequencing.

bPatients in the overall study cohort with evidence of NGS-based biomarker testing in the database.

cPatients in the overall study cohort with no evidence of NGS-based biomarker testing.

dTwo-sided t test for continuous variables; chi-square or Fisher exact test (where expected cell size <5) for categorical variables.

eMAC: Medicare Administrative Contractor.

fMolDX: Molecular Diagnostics Services.

gNCCN: National Comprehensive Cancer Network.

Table 5. Clinical care characteristics of the overall, ever, and never NGSa-tested study cohorts prior to imputation.

Characteristic | Overall (N=31,407) | Ever NGS-testedb (n=13,425) | Never NGS-testedc (n=17,982) | Ever NGS-tested vs never NGS-tested, P valued
Practice setting, n (%) | | | | <.0001
Academic | 3626 (11.55) | 1783 (13.28) | 1843 (10.25) |
Community | 27,781 (88.45) | 11,642 (86.72) | 16,139 (89.75) |
Insurance type, n (%) | | | | <.0001
Private+public | 4301 (13.69) | 1940 (14.45) | 2361 (13.13) |
Private only | 7083 (22.55) | 3601 (26.82) | 3482 (19.36) |
Public only | 4037 (12.85) | 1560 (11.62) | 2477 (13.77) |
Multiple types | 8997 (28.65) | 4066 (30.29) | 4931 (27.42) |
Unknown or missing | 6989 (22.25) | 2258 (16.82) | 4731 (26.31) |
Stage at initial diagnosis, n (%) | | | | <.0001
0-I | 2736 (8.71) | 1208 (9) | 1528 (8.50) |
II | 1453 (4.63) | 671 (5) | 782 (4.35) |
III | 5621 (17.90) | 2227 (16.59) | 3394 (18.87) |
IV | 20,929 (66.64) | 9096 (67.75) | 11,833 (65.80) |
Unknown or missing | 668 (2.13) | 223 (1.66) | 445 (2.47) |
Year of index diagnosis, n (%) | | | | <.0001
2011 | 1896 (6.04) | 158 (1.18) | 1738 (9.67) |
2012 | 2402 (7.65) | 229 (1.71) | 2173 (12.08) |
2013 | 2699 (8.59) | 476 (3.55) | 2223 (12.36) |
2014 | 3054 (9.72) | 664 (4.95) | 2390 (13.29) |
2015 | 3346 (10.65) | 1136 (8.46) | 2210 (12.29) |
2016 | 3397 (10.82) | 1372 (10.22) | 2025 (11.26) |
2017 | 3472 (11.05) | 1708 (12.72) | 1764 (9.81) |
2018 | 3401 (10.83) | 1966 (14.64) | 1435 (7.98) |
2019 | 3282 (10.45) | 2293 (17.08) | 989 (5.50) |
2020 | 2777 (8.84) | 2066 (15.39) | 711 (3.95) |
2021 | 1681 (5.35) | 1357 (10.11) | 324 (1.80) |
Practice volumee, mean (SD) | 154.1 (143.6) | 169.2 (156.0) | 142.8 (132.5) | <.0001
BMI, n (%) | | | | <.0001
Underweight | 1373 (4.37) | 597 (4.45) | 776 (4.32) |
Normal weight | 10,593 (33.73) | 4638 (34.55) | 5955 (33.12) |
Overweight | 8897 (28.33) | 4019 (29.94) | 4878 (27.13) |
