Real-World Survival Comparisons Between Radiotherapy and Surgery for Metachronous Second Primary Lung Cancer and Predictions of Lung Cancer–Specific Outcomes Using Machine Learning: Population-Based Study


Introduction

Lung cancer has become a leading cause of cancer-related deaths worldwide []. With the rapid development of screening tools and therapeutic strategies, survival outcomes of patients with lung cancer have improved encouragingly, especially for early-stage non-small cell lung cancer (NSCLC), for which the 5-year survival rate can be as high as 90% []. As cancer survivors live longer, their probability of developing a second primary cancer may also increase. In recent years, metachronous second primary lung cancer (MSPLC) has been commonly observed among survivors of previously treated lung cancer. Thakur et al [] reported that MSPLC occurred in 2.95% of patients with initial primary lung cancer (IPLC) in the Surveillance, Epidemiology, and End Results (SEER) database. According to the study by Surapaneni et al [], the risk of developing a second lung cancer is highest in the first year and remains high at 10 years. The surveillance and management of patients with MSPLC have therefore become urgent issues.

For patients with an initial, early-stage lung cancer, surgical resection remains the most effective treatment. However, there is still a lack of guidelines for assessing tumor resectability in patients with MSPLC. Several studies have confirmed the feasibility of surgery for MSPLC [-]. Notably, patients with MSPLC whose previous lung cancer was resected may be in poor physical condition and have insufficient pulmonary reserve, so a second surgical procedure may not be appropriate. Thus, an alternative treatment is required for patients with inoperable MSPLC.

Radiation therapy is an effective treatment option for patients with MSPLC and causes fewer complications and functional impairments. Stereotactic body radiotherapy has recently been reported to achieve survival outcomes similar to those of surgery in patients with early-stage lung cancer [,]. Previous studies have shown that radiotherapy is a safe and feasible treatment for MSPLC, but whether it is comparable to surgery in terms of survival outcomes remains debated [,]. Therefore, in this population-based study, we first conducted propensity score matching (PSM) analyses to compare the survival outcomes of patients with MSPLC who underwent surgical resection with those of patients who received radiotherapy. We then specifically compared the outcomes of the common surgical methods, namely lobectomy and wedge resection, with those of radiotherapy for patients with MSPLC. Finally, to enhance the accuracy of the predictions, we applied multiple state-of-the-art machine learning (ML) algorithms to develop robust prediction models.


Methods

Data Source

Data for all patients with MSPLC included in this retrospective study were obtained from the SEER database [], which covers approximately 30% of patients with cancer in the United States. Data were extracted from 9 cancer registries (Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco–Oakland, Seattle–Puget Sound, and Utah), together with additional treatment information. The most recent follow-up information in the data set was updated in November 2018. This study aimed to predict outcomes for patients with MSPLC. We followed the established guidelines for the development and reporting of ML predictive models in biomedical research [].

Preparation of Data for Model Building

Patients aged ≥20 years who were diagnosed with MSPLC were identified from the SEER database. We defined MSPLC according to the criteria set by Martini and Melamed []. We only included patients with 2 primary lung tumors with a diagnostic interval between the tumors ≥4 years, because it is difficult to distinguish a primary lung tumor from relapse or metastasis when the interval is <4 years []. The initial inclusion criteria were as follows: (1) primary sites of the 2 tumors were the lung and bronchus (International Classification of Diseases for Oncology [ICD-O]-3/World Health Organization [WHO] 2008, Third Edition), (2) the time of diagnosis for the IPLC was from January 1988 to December 2012 (to ensure that all enrolled patients had been followed for enough time), and (3) age was ≥20 years. The exclusion criteria included (1) <4 years between the diagnosis of the 2 primary tumors, (2) distant metastasis, (3) histological type of small cell lung cancer for IPLC or MSPLC, and (4) incomplete follow-up information.
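The inclusion and exclusion rules above can be expressed as a single filter. The following Python sketch assumes a hypothetical, already-extracted per-patient record; the field names (`age_years`, `iplc_year`, `interval_months`, and so on) are illustrative and are not SEER variable names, and expressing the 4-year threshold as 48 months is an assumption about how the interval is stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatientRecord:
    # Hypothetical fields for illustration; these are NOT actual SEER variable names.
    patient_id: int
    age_years: int                # age at diagnosis
    iplc_year: int                # year the initial primary lung cancer (IPLC) was diagnosed
    interval_months: int          # months between the IPLC and MSPLC diagnoses
    both_sites_lung: bool         # both primaries coded to lung and bronchus
    distant_metastasis: bool
    small_cell_histology: bool    # small cell lung cancer in either primary
    follow_up_complete: bool


def meets_criteria(r: PatientRecord) -> bool:
    """True if a record passes the inclusion criteria and none of the exclusion criteria."""
    included = r.both_sites_lung and 1988 <= r.iplc_year <= 2012 and r.age_years >= 20
    excluded = (
        r.interval_months < 48        # assumed: < 4 years between the 2 primary tumors
        or r.distant_metastasis
        or r.small_cell_histology
        or not r.follow_up_complete
    )
    return included and not excluded


def select_cohort(records: List[PatientRecord]) -> List[PatientRecord]:
    """Keep only the records that satisfy all criteria."""
    return [r for r in records if meets_criteria(r)]
```

Keeping inclusion and exclusion as two separate boolean expressions mirrors the way the criteria are listed in the text, which makes each rule easy to audit against the protocol.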

We collected the patients’ demographic features and clinical characteristics, such as age at diagnosis, sex, race (White, Black, other [American Indian/Alaska Native, Asian/Pacific Islander], and unknown), location relationship of the 2 primary tumors (ipsilateral and contralateral), diagnostic interval, year of diagnosis, SEER cancer stage (localized and regional), histological type (adenocarcinoma, squamous cell carcinoma, and other NSCLC), grade, surgical procedure, chemotherapy, and radiotherapy (beam radiation). Sublevel resection was defined as any resection less extensive than lobectomy. For patients diagnosed with IPLC after 2004, additional clinical information, such as TNM (tumor [T], extent of spread to the lymph nodes [N], and presence of metastasis [M]) stage (6th edition of the American Joint Committee on Cancer TNM system) and tumor size, was available.

Predictive Models

We used 6 classical ML algorithms, namely extreme gradient boosting (XGB), random forest classifier (RFC), adaptive boosting (ADB), K nearest neighbor (KNN), artificial neural network (ANN), and gradient boosting decision tree (GBDT), to predict long-term cancer-specific survival (CSS). Candidate variables for modeling were selected using least absolute shrinkage and selection operator (LASSO) regression. The optimal combination of variables for each algorithm was then determined iteratively: the predictive performance of each of more than a dozen variables was first assessed individually, using the area under the receiver operating characteristic (ROC) curve (AUC) and decision curve analysis; the most effective variables were retained, and additional variables were added one at a time until the best overall performance was obtained. The optimal modeling approach for each algorithm was selected using 5-fold cross-validation, and the contribution of each variable was calculated. In addition, age-adjusted competing risk regression analysis was performed using the “cmprsk” package in R to examine the cumulative risk of cancer-specific mortality, treating death from other causes as a competing event.
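LASSO selects variables by shrinking some coefficients exactly to zero. As a minimal illustration of that mechanism, the sketch below fits a squared-error LASSO by cyclic coordinate descent with soft-thresholding. This is an assumption for illustration only: a survival or binary-outcome analysis such as this one would more likely use a Cox or logistic LASSO (for example, via the R package glmnet), and the function names here are hypothetical.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Soft-thresholding operator: sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             n_iter: int = 200, tol: float = 1e-8) -> List[float]:
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the outcome y are already centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X @ beta, and beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # correlation of feature j with the residual that excludes feature j itself
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual in sync with beta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved appreciably in a full sweep
    return beta
```

In practice, λ is chosen by cross-validation (as in the 5-fold curve of Figure 3A), and the variables whose coefficients remain nonzero at the chosen λ are carried forward to the downstream models.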

Statistical Analyses

All statistical analyses were performed using SPSS 27.0 (IBM Corp) and R software version 4.3.1 []. SEER*Stat software version 8.4.2 was used to identify the study population from the SEER database. A 2-tailed P value <.05 was considered statistically significant. Continuous parameters such as patients’ age and diagnostic interval are expressed as mean (SD) and were compared between the different treatment groups using Mann-Whitney U tests. For categorical parameters, proportions were compared using Pearson chi-square tests. To balance the baseline characteristics between the different treatment groups, PSM analyses were used. Survival curves were plotted using the Kaplan-Meier method and compared using log-rank tests.
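The Kaplan-Meier curves referred to above use the product-limit estimator, S(t) = Π over event times t_i ≤ t of (1 − d_i/n_i), where d_i is the number of deaths and n_i the number of patients at risk at t_i. The Python sketch below is a plain illustration of that formula, not the SPSS or R routine used in the study; the input format (parallel lists of follow-up times and 0/1 event flags) is an assumption.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate as a list of (event time, S(t)) steps.

    times[i] is follow-up time (e.g., months); events[i] is 1 for death, 0 for censoring.
    Everyone with follow-up >= t counts as at risk at t, so patients censored at a
    death time are still in that risk set (the usual convention).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # collect all subjects (deaths and censorings) tied at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk  # product-limit update
            steps.append((t, surv))
        n_at_risk -= removed  # drop everyone whose follow-up ended at t
    return steps
```

In R, the survival package's `survfit()` computes the same estimator and `survdiff()` performs the log-rank test used to compare curves.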

Ethical Considerations

The data used in this research were extracted from the publicly accessible, anonymized SEER database. Given the nature of the SEER database, which contains deidentified patient information and is widely used for epidemiological and clinical research purposes, our study fell within the category of research that is exempt from formal ethical approval and consent requirements. This exemption is consistent with established institutional and local policies regarding the use of publicly available, deidentified data for research purposes [].


Results

Demographic Characteristics

According to our inclusion and exclusion criteria, a total of 2451 patients diagnosed with MSPLC were included in this study. The baseline characteristics of all patients are summarized in Table 1. There were 1137 men and 1314 women, with a mean age of 63.5 (SD 9.2) years. White patients accounted for 84.1% (2062/2451) of the study population. The mean diagnostic interval between the 2 primary lung tumors was 101.0 (SD 47.6) months. The year of diagnosis of the IPLC ranged from 1988 to 2012. For IPLC, 264 (264/2451, 10.8%) patients did not undergo any surgical procedure, while 2187 underwent surgical resection, including 295 (295/2451, 12%) sublevel resections, 1786 (1786/2451, 72.9%) lobectomies, and 106 (106/2451, 4.3%) pneumonectomies. Additionally, 465 (465/2451, 19%) patients received chemotherapy, and 489 (489/2451, 20%) underwent radiation therapy for IPLC. Based on treatments for MSPLC, patients were divided into the following 4 subgroups: radiotherapy only (864/2451, 35.3%), surgery only (759/2451, 31%), surgery plus radiotherapy (89/2451, 3.6%), and no treatment (739/2451, 30.2%). The median follow-up time after MSPLC diagnosis was 18 (range 1-273) months. For the entire study population, the 5-year overall survival (OS) rate was 34.7%.

Table 1. Demographic and clinical characteristics of 2451 patients diagnosed with second primary lung cancer.

| Characteristic | Results |
|---|---|
| Age (years), mean (SD) | 63.5 (9.2) |
| Race, n (%) | |
| White | 2062 (84.1) |
| Black | 240 (9.8) |
| Other | 149 (6.1) |
| Sex, n (%) | |
| Male | 1137 (46.4) |
| Female | 1314 (53.6) |
| Relative location, n (%) | |
| Ipsilateral | 815 (33.3) |
| Contralateral | 1636 (66.7) |
| Diagnostic interval (months), mean (SD) | 101.0 (47.6) |
| Initial primary lung cancer | |
| Year of diagnosis, n (%) | |
| 1988-1995 | 763 (31.1) |
| 1996-2003 | 919 (37.5) |
| 2004-2012 | 769 (31.4) |
| SEER^a stage, n (%) | |
| Localized | 1538 (62.7) |
| Regional | 913 (37.3) |
| Histology, n (%) | |
| ADC^b | 1399 (57.1) |
| SCC^c | 690 (28.2) |
| Other NSCLC^d | 362 (14.8) |
| Grade, n (%) | |
| Well differentiated | 277 (11.3) |
| Moderately differentiated | 844 (34.4) |
| Poorly differentiated | 792 (32.3) |
| Undifferentiated | 115 (4.7) |
| Unknown | 423 (17.3) |
| Surgery, n (%) | |
| No surgery | 264 (10.8) |
| Sublevel resection | 295 (12) |
| Lobectomy | 1786 (72.9) |
| Pneumonectomy | 106 (4.3) |
| Chemotherapy, n (%) | |
| Yes | 465 (19) |
| No/unknown | 1986 (81) |
| Radiotherapy, n (%) | |
| Yes | 489 (20) |
| No/unknown | 1962 (80) |
| Second primary lung cancer | |
| Surgery, n (%) | |
| No surgery | 1603 (65.4) |
| Wedge resection | 295 (12) |
| Segmentectomy | 61 (2.5) |
| Other/inseparable sublevel resection | 87 (3.5) |
| Lobectomy | 352 (14.4) |
| Pneumonectomy | 53 (2.2) |
| Chemotherapy, n (%) | |
| Yes | 694 (28.3) |
| No/unknown | 1757 (71.7) |
| Radiotherapy, n (%) | |
| Yes | 953 (38.9) |
| No/unknown | 1498 (61.1) |
| Treatment, n (%) | |
| Radiotherapy only | 864 (35.3) |
| Surgery only | 759 (31.0) |
| Surgery + radiotherapy | 89 (3.6) |
| None | 739 (30.2) |

^a SEER: Surveillance, Epidemiology, and End Results.

^b ADC: adenocarcinoma.

^c SCC: squamous cell carcinoma.

^d NSCLC: non-small cell lung cancer.

Radiotherapy Versus Surgery

Before PSM, the distributions of several baseline characteristics differed significantly between the radiotherapy and surgery groups (Table 2). These included age (P<.001), sex (P=.005), relative location of the 2 primary tumors (P<.001), and diagnostic interval (P<.001); IPLC characteristics such as year of diagnosis (P=.004), histology (P<.001), surgical procedure (P<.001), and radiotherapy (P=.04); and chemotherapy for MSPLC (P<.001). Figure 1A shows the survival outcomes of the 4 treatment groups (P<.001). Patients who received radiotherapy only had worse survival than those who underwent surgical resection but better survival than those in the no treatment group.

To evaluate the role of radiotherapy in the treatment of MSPLC, multiple PSM analyses were performed, comparing radiotherapy with no treatment and with surgery, and comparing surgery alone with surgery plus radiotherapy. After PSM (ratio 1:1; caliper=0.01), all baseline characteristics were well balanced between the corresponding comparison groups (Table 2 and Tables S1 and S2 in the supplementary material). As shown in Figure 1, the radiotherapy group had significantly better survival outcomes than the no treatment group (P<.001; Figure 1B) but significantly worse survival outcomes than the surgery group (P<.001; Figure 1C). However, adding radiotherapy did not appear to improve survival among patients who underwent surgery for MSPLC (P=.26; Figure 1D).
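The matching step described above (1:1, caliper 0.01) can be sketched as follows. The source does not say how propensity scores were estimated or in what order matches were made, so this Python sketch assumes that scores were already estimated (commonly by logistic regression of treatment on baseline covariates), applies the caliper on the raw score scale, and uses greedy nearest-neighbor matching without replacement; all three are assumptions, and the patient IDs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def greedy_caliper_match(treated: Dict[str, float],
                         control: Dict[str, float],
                         caliper: float = 0.01) -> List[Tuple[str, str]]:
    """1:1 greedy nearest-neighbor matching on propensity scores, without replacement.

    treated and control map a (hypothetical) patient ID to a propensity score estimated
    beforehand. Each treated patient is paired with the closest unused control whose
    score lies within the caliper; treated patients with no such control are dropped.
    """
    available = dict(control)  # controls not yet used
    pairs: List[Tuple[str, str]] = []
    for t_id in sorted(treated):  # fixed processing order keeps the result reproducible
        ps = treated[t_id]
        best_id: Optional[str] = None
        best_dist: Optional[float] = None
        for c_id, c_ps in available.items():
            dist = abs(ps - c_ps)
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_id, best_dist = c_id, dist
        if best_id is not None:
            pairs.append((t_id, best_id))
            del available[best_id]  # without replacement: each control used once
    return pairs
```

Greedy results depend on the processing order. R's MatchIt package, for example, offers nearest-neighbor matching with a caliper, but it expresses the caliper in standard-deviation units of the distance measure by default, so a reported caliper of 0.01 should be interpreted with the software's convention in mind.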

Table 2. Comparison of baseline characteristics between surgery and radiotherapy for second primary lung cancer before and after propensity score matching (PSM).

| Characteristic | Radiation, before PSM (n=864) | Surgery, before PSM (n=759) | P value, before PSM | Radiation, after PSM (n=470) | Surgery, after PSM (n=470) | P value, after PSM |
|---|---|---|---|---|---|---|
| Age (years), mean (SD) | 63.9 (8.9) | 62.1 (9.0) | <.001 | 63.0 (8.8) | 62.7 (9.1) | .55 |
| Race, n (%) | | | .10 | | | .75 |
| White | 737 (85.3) | 642 (84.6) | | 401 (85.3) | 393 (83.6) | |
| Black | 85 (9.8) | 63 (8.3) | | 39 (8.3) | 45 (9.6) | |
| Other | 42 (4.9) | 54 (7.1) | | 30 (6.4) | 32 (6.8) | |
| Sex, n (%) | | | .005 | | | .95 |
| Male | 417 (48.3) | 313 (41.2) | | 201 (42.8) | 203 (43.2) | |
| Female | 447 (51.7) | 446 (58.8) | | 269 (57.2) | 267 (56.8) | |
| Relative location, n (%) | | | <.001 | | | .73 |
| Ipsilateral | 321 (37.2) | 208 (27.4) | | 152 (32.3) | 146 (31.1) | |
| Contralateral | 543 (62.8) | 551 (72.6) | | 318 (67.7) | 324 (68.9) | |
| Diagnostic interval (months), mean (SD) | 104.4 (48.7) | 95.8 (45.3) | <.001 | 99.1 (43.5) | 100.9 (50.5) | .56 |
| IPLC^a | | | | | | |
| Year of diagnosis, n (%) | | | .004 | | | .93 |
| 1988-1995 | 242 (28) | 256 (33.7) | | 135 (28.7) | 135 (28.7) | |
| 1996-2003 | 313 (36.2) | 286 (37.7) | | 174 (37) | 169 (36) | |
| 2004-2012 | 309 (35.8) | 217 (28.6) | | 161 (34.3) | 166 (35.3) | |
| SEER^b stage, n (%) | | | .499 | | | .73 |
| Localized | 560 (64.8) | 505 (66.5) | | 300 (63.8) | 306 (65.1) | |
| Regional | 304 (35.2) | 254 (33.5) | | 170 (36.2) | 164 (34.9) | |
| Histology, n (%) | | | <.001 | | | .82 |
| ADC^c | 451 (52.2) | 495 (65.2) | | 275 (58.5) | 275 (58.5) | |
| SCC^d | 280 (32.4) | 173 (22.8) | | 125 (26.6) | 131 (27.9) | |
| Other NSCLC^e | 133 (15.4) | 91 (12) | | 70 (14.9) | 64 (13.6) | |
| Grade, n (%) | | | .06 | | | ≥.99 |
| Well differentiated | 82 (9.5) | 106 (14) | | 50 (10.6) | 52 (11.1) | |
| Moderately differentiated | 295 (34.1) | 259 (34.1) | | 167 (35.5) | 168 (35.7) | |
| Poorly differentiated | 295 (34.1) | 231 (30.4) | | 149 (31.7) | 148 (31.5) | |
| Undifferentiated | 42 (4.9) | 33 (4.3) | | 22 (4.7) | 23 (4.9) | |
| Unknown | 150 (17.4) | 130 (17.1) | | 82 (17.4) | 79 (16.8) | |
| Surgery, n (%) | | | <.001 | | | .98 |
| No surgery | 102 (11.8) | 47 (6.2) | | 45 (9.6) | 42 (8.9) | |
| Sublevel resection | 100 (11.6) | 105 (13.8) | | 59 (12.6) | 61 (13) | |
| Lobectomy | 616 (71.3) | 591 (77.9) | | 353 (75.1) | 355 (75.5) | |
| Pneumonectomy | 46 (5.3) | 16 (2.1) | | 13 (2.8) | 12 (2.6) | |
| Chemotherapy, n (%) | | | .77 | | | .87 |
| Yes | 155 (17.9) | 131 (17.3) | | 91 (19.4) | 88 (18.7) | |
| No/unknown | 709 (82.1) | 628 (82.7) | | 379 (80.6) | 382 (81.3) | |
| Radiotherapy, n (%) | | | .04 | | | .93 |
| Yes | 176 (20.4) | 123 (16.2) | | 86 (18.3) | 88 (18.7) | |
| No/unknown | 688 (79.6) | 636 (83.8) | | 384 (81.7) | 382 (81.3) | |
| SPLC^f | | | | | | |
| Chemotherapy, n (%) | | | <.001 | | | ≥.99 |
| Yes | 318 (36.8) | 91 (12) | | 88 (18.7) | 87 (18.5) | |
| No/unknown | 546 (63.2) | 668 (88) | | 382 (81.3) | 383 (81.5) | |

^a IPLC: initial primary lung cancer.

^b SEER: Surveillance, Epidemiology, and End Results.

^c ADC: adenocarcinoma.

^d SCC: squamous cell carcinoma.

^e NSCLC: non-small cell lung cancer.

^f SPLC: second primary lung cancer.

Figure 1. (A) Overall survival of 2451 patients with MSPLC (1988-2012) in the different treatment groups before propensity score matching (PSM). (B) Overall survival of patients who received radiotherapy or no treatment, after PSM. (C) Overall survival of patients who received radiotherapy or underwent surgery, after PSM. (D) Overall survival of patients who underwent surgery or surgery plus radiotherapy, after PSM.

Radiotherapy Versus Wedge Resection or Lobectomy

To further compare survival between radiotherapy and specific surgical procedures, we selected patients with MSPLC whose IPLC was diagnosed after 2004. Patients who underwent unknown or unspecified sublevel resection, segmentectomy (very few patients), or pneumonectomy for MSPLC were excluded, leaving 716 patients for further analyses. Their demographic characteristics are described in Table 3. Before PSM, patients who underwent wedge resection or lobectomy had significantly better OS than those who received radiotherapy, and all 3 groups had significantly better OS than the no treatment group (Figure 2A). Additional clinical parameters, such as T and N stage for IPLC and tumor size for MSPLC, were included in the PSM, and all parameters were well balanced (Tables S3-S5 in the supplementary material). Similarly, after PSM, both wedge resection (P=.004; Figure 2C) and lobectomy (P=.002; Figure 2D) were associated with significantly better OS than radiotherapy, and radiotherapy had a greater survival benefit than no treatment (P<.001; Figure 2B).

Table 3. Demographic and clinical characteristics of 716 patients diagnosed with second primary lung cancer after 2004.

| Characteristic | Results |
|---|---|
| Age (years), mean (SD) | 65.8 (9.0) |
| Race, n (%) | |
| White | 608 (84.9) |
| Black | 65 (9.1) |
| Other | 43 (6) |
| Sex, n (%) | |
| Male | 310 (43.3) |
| Female | 406 (56.7) |
| Relative location, n (%) | |
| Ipsilateral | 279 (39) |
| Contralateral | 437 (61) |
| Diagnostic interval (months), mean (SD) | 74.2 (21.4) |
| Initial primary lung cancer | |
| T stage, n (%) | |
| T1 | 315 (44) |
| T2 | 277 (38.7) |
| T3 | 35 (4.9) |
| T4 | 66 (9.2) |
| Unknown | 23 (3.2) |
| N stage, n (%) | |
| N0 | 528 (73.7) |
| N1 | 80 (11.2) |
| N2 | 97 (13.5) |
| Unknown | 11 (1.5) |
| Histology, n (%) | |
| ADC^a | 406 (56.7) |
| SCC^b | 206 (28.8) |
| Other NSCLC^c | 104 (14.5) |
| Grade, n (%) | |
| Well differentiated | 94 (13.1) |
| Moderately differentiated | 275 (38.4) |
| Poorly differentiated | 211 (29.5) |
| Undifferentiated | 18 (2.5) |
| Unknown | 118 (16.5) |
| Surgery, n (%) | |
| No surgery | 131 (18.3) |
| Sublevel resection | 106 (14.8) |
| Lobectomy | 455 (63.5) |
| Pneumonectomy | 24 (3.4) |
| Chemotherapy, n (%) | |
| Yes | 237 (33.1) |
| No/unknown | 479 (66.9) |
| Radiotherapy, n (%) | |
| Yes | 174 (24.3) |
| No/unknown | 542 (75.7) |
| Second primary lung cancer | |
| Size (cm), n (%) | |
| 0-3 | 385 (53.8) |
| 3-5 | 71 (9.9) |
| >5 | 56 (7.8) |
| Unknown | 204 (28.5) |
| Surgery, n (%) | |
| No surgery | 533 (74.4) |
| Wedge resection | 102 (14.2) |
| Lobectomy | 81 (11.3) |
| Chemotherapy, n (%) | |
| Yes | 201 (28.1) |
| No/unknown | 515 (71.9) |
| Radiotherapy, n (%) | |
| Yes | 309 (43.2) |
| No/unknown | 407 (56.8) |
| Treatment, n (%) | |
| None | 224 (31.3) |
| Radiotherapy only | 309 (43.2) |
| Wedge resection only | 102 (14.2) |
| Lobectomy only | 81 (11.3) |

^a ADC: adenocarcinoma.

^b SCC: squamous cell carcinoma.

^c NSCLC: non-small cell lung cancer.

Figure 2. Overall survival of (A) 716 patients with metachronous second primary lung cancer (MSPLC) after 2004 in different treatment groups before propensity score matching (PSM); (B) patients who received radiotherapy or no treatment, after PSM; (C) patients who received radiotherapy or underwent wedge resection, after PSM; (D) patients who received radiotherapy or underwent lobectomy, after PSM.

ML-Based Cancer-Specific Death Risk Prediction

Using LASSO regression, we identified 9 variables that contributed significantly to CSS (Figure 3): age at diagnosis, sex, year of diagnosis, radiotherapy for IPLC, and the primary site, histology, surgery, chemotherapy, and radiotherapy of MSPLC. The ML models showed good discriminative performance, with AUCs ranging from 0.72 to 0.85 (Figure 4), supporting the potential of ML for prognostic prediction. The decision curve analyses are depicted in Figure 5. Additionally, we assessed the sensitivity and specificity of each ML model at the cutoff that maximized the Youden index (sensitivity + specificity − 1), that is, the cutoff giving the best combined true positive and true negative rates (Table 4). In 5-fold cross-validation, the XGB, RFC, and ADB models performed best. To gain deeper insight into the relationships between patient characteristics and long-term outcomes for patients with MSPLC, we used these ML algorithms to develop predictive models for the 1-year, 3-year, 5-year, and 10-year risks of cumulative cancer-specific mortality based on the aforementioned variables, and we then calculated the contribution of each variable. Notably, the variables associated with CSS differed across time intervals (). Surgery for MSPLC had the dominant influence on 1-year, 3-year, 5-year, and 10-year CSS. Radiotherapy for MSPLC also affected 1-year, 3-year, 5-year, and 10-year survival, but its effect was smaller than that of surgery. The primary site and histology of MSPLC affected 1-year, 3-year, and 5-year CSS but had no impact on 10-year CSS. Additionally, radiotherapy for IPLC affected 1-year and 3-year CSS but had minimal influence on 5-year and 10-year survival.
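The 5-fold cross-validation mentioned above splits the data into 5 parts, trains on 4, and evaluates on the held-out part, rotating through all 5. The Python sketch below only generates the folds. Stratifying by outcome (so that each fold keeps a similar proportion of cancer-specific deaths) and the `seed` parameter are assumptions, since the source does not describe how folds were formed; `stratified_kfold` is a hypothetical helper name.

```python
import random
from typing import List, Tuple


def stratified_kfold(labels: List[int], k: int = 5,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k folds with a similar outcome ratio in each fold.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits
```

A model's cross-validated AUC would then be the mean of the 5 AUCs computed on the held-out folds, each from a model trained only on the corresponding training indices.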

Figure 3. Machine learning model using least absolute shrinkage and selection operator (LASSO) regression analysis for risk prediction of cumulative cancer-specific mortality in patients with metachronous second primary lung cancer (MSPLC): (A) 5-fold cross-validation results and (B) model regression coefficient profile.

Figure 4. Receiver operating characteristic (ROC) curves for machine learning models for risk prediction of cumulative cancer-specific mortality in patients with metachronous second primary lung cancer (MSPLC): (A) 1-year cancer-specific mortality; (B) 3-year cancer-specific mortality; (C) 5-year cancer-specific mortality; (D) 10-year cancer-specific mortality. ADB: adaptive boosting; ANN: artificial neural network; AUC: area under the curve; GBDT: gradient boosting decision tree; KNN: K nearest neighbor; RFC: random forest classifier; ROC: receiver operating characteristic; XGB: extreme gradient boosting.

Figure 5. Decision curve analysis for 6 classical machine learning–based models for risk prediction of cumulative cancer-specific mortality in patients with metachronous second primary lung cancer (MSPLC): (A) 1-year cancer-specific mortality; (B) 3-year cancer-specific mortality; (C) 5-year cancer-specific mortality; (D) 10-year cancer-specific mortality.

Table 4. Performance of machine learning models for risk prediction of long-term cancer-specific survival of patients with second primary lung cancer after 2004.

| Model | Sensitivity, % | Specificity, % | AUC^a (95% CI) |
|---|---|---|---|
| 1-year cancer-specific survival | | | |
| XGB^b | 77 | 60.2 | 0.73 (0.71-0.75) |
| RFC^c | 76.7 | 63 | 0.74 (0.72-0.76) |
| ADB^d | 83.1 | 54.4 | 0.75 (0.73-0.77) |
| KNN^e | 70.9 | 63.6 | 0.72 (0.70-0.74) |
| ANN^f | 88.2 | 41.9 | 0.74 (0.72-0.76) |
| GBDT^g | 90.6 | 36 | 0.74 (0.72-0.76) |
| 3-year cancer-specific survival | | | |
| XGB | 69.9 | 73.8 | 0.77 (0.75-0.79) |
| RFC | 75.6 | 69.2 | 0.77 (0.75-0.79) |
| ADB | 79.3 | 66.4 | 0.76 (0.74-0.78) |
| KNN | 79.6 | 64 | 0.75 (0.73-0.77) |
| ANN | 83.6 | 59.9 | 0.77 (0.75-0.79) |
| GBDT | 84.4 | 57.6 | 0.75 (0.73-0.77) |
| 5-year cancer-specific survival | | | |
| XGB | 79.6 | 71.3 | 0.78 (0.75-0.81) |
| RFC | 79.2 | 71.5 | 0.79 (0.76-0.82) |
| ADB | 75.3 | 74.7 | 0.79 (0.76-0.82) |
| KNN | 74.3 | 73.9 | 0.77 (0.74-0.80) |
| ANN | 79.3 | 71.5 | 0.80 (0.77-0.83) |
| GBDT | 80.1 | 69.5 | 0.78 (0.75-0.81) |
| 10-year cancer-specific survival | | | |
| XGB | 78.8 | 74.7 | 0.84 (0.80-0.88) |
| RFC | 78.3 | 40.7 | 0.83 (0.79-0.87) |
| ADB | 78.4 | 81 | 0.84 (0.80-0.88) |
| KNN | 80.7 | 73.4 | 0.78 (0.72-0.84) |
| ANN | 68.8 | 88.6 | 0.85 (0.81-0.89) |
| GBDT | 79.7 | 78.5 | 0.85 (0.81-0.89) |

^a AUC: area under the curve.

^b XGB: extreme gradient boosting.

^c RFC: random forest classifier.

^d ADB: adaptive boosting.

^e KNN: K nearest neighbor.

^f ANN: artificial neural network.

^g GBDT: gradient boosting decision tree.
