
Development and validation of a risk model with variables related to non-small cell lung cancer in patients with pulmonary nodules: a retrospective study



Background

Lung cancer is a major global threat to public health, and a novel predictive nomogram for it is urgently needed. Non-small cell lung cancer (NSCLC), which accounts for the majority of lung cancer cases, is attracting increasing attention.

Patients and methods

Here, we designed a novel predictive nomogram using a design dataset consisting of 515 pulmonary nodules, with external validation performed using two separate datasets consisting of 140 and 237 nodules, respectively. Significant variables were selected for inclusion in this model using a least absolute shrinkage and selection operator (LASSO) logistic regression model, after which a corresponding nomogram was developed. C-index values, calibration plots, and decision curve analyses were used to gauge the discrimination, calibration, and clinical utility, respectively, of this predictive model. Validation was then performed by internal bootstrapping and in the external cohorts.


Results

A predictive nomogram was successfully constructed incorporating hypertension status, plasma fibrinogen levels, blood urea nitrogen (BUN), density, ground-glass opacity (GGO), and pulmonary nodule size as significant variables associated with nodule status. This model exhibited good discriminative ability, with a C-index value of 0.765 (95% CI: 0.722–0.808), and was well calibrated. In validation analyses, this model yielded C-index values of 0.892 (95% CI: 0.844–0.940) for the external cohort and 0.853 (95% CI: 0.807–0.899) for external cohort 2. In the internal bootstrapping validation, the C-index value remained 0.753. Decision curve analyses supported the clinical value of this predictive nomogram when used at an NSCLC probability threshold of 18%.


Conclusion

The nomogram constructed in this study, which incorporates hypertension status, plasma fibrinogen levels, BUN, density, GGO status, and pulmonary nodule size, was able to reliably predict NSCLC risk in this Chinese cohort of patients presenting with pulmonary nodules.



Introduction

Lung cancer is a form of malignancy arising from the unrestrained growth of bronchial and lung cells [1, 2], and it is one of the leading causes of mortality worldwide [3]. Rates of lung cancer have been rising rapidly in recent years, particularly in more heavily industrialized nations [4]. Currently, lung cancer patients exhibit 5-year survival rates of approximately 16.6% [5], and roughly 1 million individuals in China are predicted to be diagnosed with lung cancer by the year 2025, which would give China the highest lung cancer incidence rate in the world. Because NSCLC accounts for the majority of lung cancer cases, its early treatment is a focus of current research.

Key risk factors associated with lung cancer development include specific genetic mutations, smoking, and environmental exposures such as air pollution. There is also some evidence suggesting that factors such as a poor diet, alcohol intake, estrogen levels, the smoking of marijuana, and infection with human papillomavirus (HPV), human immunodeficiency virus (HIV), and Epstein-Barr virus may increase lung cancer risk, although such evidence remains somewhat inconclusive [6]. Analyses of patient computed tomography (CT) scans often reveal pulmonary nodules, and many models have been developed to gauge the link between such nodules and lung cancer risk, including the Brock model [7] and the Mayo model. These models, however, often do not take epidemiological variables, clinical findings, and CT scan results into consideration at the same time, limiting their value as predictors of the relative risk of a given pulmonary nodule being malignant. The development of more reliable and accurate predictive tools has the potential to enable early intervention and treatment for NSCLC patients, maximizing their odds of positive outcomes. Herein, we analyzed 28 variables that, based on previous studies, were considered potentially relevant to the diagnosis of a given pulmonary nodule as benign or malignant [1, 7, 8, 9].

By analyzing epidemiological, clinical, and CT-related factors for patients with pulmonary nodules that had undergone surgical treatment, we sought to develop a simple but robust predictive model that would enable the relative assessment of NSCLC risk based only upon characteristics that can be readily assessed prior to surgery or other therapeutic interventions.

Materials and methods


The Ethics Committee of the Affiliated Lihuili Hospital of Ningbo University approved this study (approval no. KY2020PJ141). Patients in the design cohort were individuals from China recruited at the Xingning campus of Lihuili Hospital between October 2020 and February 2022, with the external validation cohort being recruited from April 2022 to June 2022. External validation cohort 2 was recruited from March 2023 to June 2022 at the Eastern campus of Lihuili Hospital. Eligible patients were individuals who had undergone surgical resection following pulmonary nodule identification. Small cell lung cancer cases were excluded to avoid bias arising from their small sample size. Patients provided written informed consent to participate in this study. The inclusion criteria were as follows: (1) pulmonary nodules that were detected through CT scanning; (2) patients who were asymptomatic at the time of diagnosis; (3) patients physically able to undergo surgery. Any patients diagnosed with serious cognitive or physical impairments or other serious diseases were excluded from the study cohort. Data including patient clinical, demographic, and disease-related characteristics were retrieved from patient medical records.

Statistical analysis

Data are given as numbers (percentages) and were analyzed using R (v 4.2.1) and IBM SPSS Statistics 23.0. The LASSO method, which enables the reduction of high-dimensional datasets, was used to select the optimal predictors of NSCLC risk among the included pulmonary nodule patients. Features that yielded non-zero coefficient values in this LASSO regression analysis were retained for nomogram incorporation. The final predictive model was constructed via univariate logistic regression analysis followed by multivariate logistic regression analysis, with all significance levels being two-sided. The design cohort was used to develop the predictive model, with calibration curves being used to assess nomogram calibration. Significant calibration curve results were indicative of a model that was not perfectly calibrated. Model discrimination performance was assessed based on the value of Harrell’s C-index. The nomogram was additionally validated by internal bootstrapping (1,000 bootstrap resamples) and by external validation, with a C-index value calculated for each. Decision curve analyses were used to assess the clinical utility of this NSCLC risk nomogram by quantifying the net benefit at different probability thresholds in the cohorts. The net benefit was calculated by subtracting the proportion of patients with false-positive results from the proportion of patients with true-positive results, with the false-positive proportion weighted by the relative harm of failing to intervene as compared to the potential negative outcomes associated with an unnecessary intervention. Receiver operating characteristic (ROC) curves were also used to assess the precision of this predictive risk model. The net reclassification improvement index (NRI) and integrated discrimination improvement index (IDI) were calculated to quantify the improvement offered by the new model.
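The net-benefit calculation used in decision curve analysis can be sketched in Python. This is a minimal illustration, assuming the commonly used weighting of false positives by pt/(1 − pt), where pt is the threshold probability; the function names and the toy data are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold.

    y_true: list of 0/1 outcomes (1 = malignant), y_prob: predicted risks.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold: the assumed harm of a false positive
    # relative to the benefit of a true positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that intervenes on everyone."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks, for illustration only.
    outcomes = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
    risks = [0.9, 0.8, 0.3, 0.1, 0.2, 0.6, 0.05, 0.7, 0.4, 0.15]
    for pt in (0.10, 0.18, 0.50):
        print(pt, net_benefit(outcomes, risks, pt), net_benefit_treat_all(outcomes, pt))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the net benefit of treating no one.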


Results

Patient characteristics

In total, data from 515 patients with pulmonary nodules who visited our clinic between October 2020 and February 2022 were included in the design cohort for this study, while data from 140 patients collected from April 2022 to June 2022 were designated as the external validation cohort, and patients from the Eastern campus were designated as external validation cohort 2. Patients in the design cohort (aged 21–86 years; mean age: 58.97 ± 12.02 years), the external cohort (aged 26–78 years; mean age: 57.17 ± 11.18 years), and external cohort 2 (aged 22–85 years; mean age: 56.52 ± 12.20 years) were separated into groups with benign nodules and malignant lesions. For details regarding the demographic and clinical characteristics of patients in these groups, see Table 1.

Table 1 Baseline characteristics

Feature selection and predictive model development

In total, 28 potentially relevant features were evaluated for inclusion in a predictive model. Of these features, 14 were ultimately selected through a LASSO regression analysis of the 515 patients in the design cohort (Fig. 1A and B). These features were a clear border, vessels passing through the nodule, hypertension status, smoking history, drinking history, blood glucose, BUN, serum uric acid (SUA), triglyceride (TG) levels, plasma fibrinogen levels, density, ground-glass opacities (GGOs), spicule sign, and pulmonary nodule size. Univariate and multivariate logistic regression analyses were then performed (Table 2). The P-value of 0.624 in the Hosmer–Lemeshow test was non-significant, indicating no evidence of poor fit. CT characteristics and the corresponding pathological results of representative nodules are shown in Fig. 2. A predictive model incorporating the significant variables was developed using the design cohort (Fig. 3). Nodule density was defined as “low” when the nodule exhibited a CT value that was higher than that of pulmonary tissue but lower than that of pulmonary vessels, “intermediate” for nodules with solid and GGO components, and “high” when CT values were greater for the nodule than for pulmonary vessels.

For comparison, the features of the Mayo model were smoking history, age, nodule diameter, cancer history, upper-lobe location, and spicule sign. This model is given by the formula $P = e^{x}/(1 + e^{x})$, where $x = -6.8272 + (0.7917 \times \text{smoking history}) + (0.0391 \times \text{age}) + (0.1274 \times \text{nodule diameter}) + (1.3388 \times \text{cancer history}) + (0.7838 \times \text{upper lobe}) + (1.0407 \times \text{spicule sign})$. A recent study published in Chest [10] showed that the parsimonious Brock model (including gender, size, upper-lobe location, and spicule sign) could predict cancer risk well, and we therefore also calculated the performance of that model in our cohorts.
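The Mayo model formula quoted above can be evaluated directly. The following is a minimal Python sketch using the coefficients given in the text; the function name, the argument names, and the coding of the inputs (binary indicators as 0/1, age in years, diameter in millimetres) are assumptions made for illustration.

```python
import math

# Intercept and coefficients as quoted in the text for the Mayo model.
MAYO_INTERCEPT = -6.8272
MAYO_COEFFICIENTS = {
    "smoking_history": 0.7917,  # assumed coding: 1 = yes, 0 = no
    "age": 0.0391,              # assumed unit: years
    "nodule_diameter": 0.1274,  # assumed unit: millimetres
    "cancer_history": 1.3388,   # assumed coding: 1 = yes, 0 = no
    "upper_lobe": 0.7838,       # assumed coding: 1 = upper lobe, 0 = other
    "spicule_sign": 1.0407,     # assumed coding: 1 = present, 0 = absent
}


def mayo_probability(**features):
    """Return P = e^x / (1 + e^x) for the linear predictor x."""
    x = MAYO_INTERCEPT + sum(
        coefficient * features[name]
        for name, coefficient in MAYO_COEFFICIENTS.items()
    )
    # 1 / (1 + e^-x) is algebraically identical to e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    print(mayo_probability(smoking_history=1, age=60, nodule_diameter=10,
                           cancer_history=0, upper_lobe=1, spicule_sign=0))
```

Passing the features as keywords makes a missing variable fail loudly with a `KeyError` rather than silently defaulting to zero.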

Fig. 1

LASSO binary logistic regression model-based clinicopathological feature selection. Notes: A Five-fold cross-validation was used for optimal parameter (λ) selection in the LASSO model, with a partial likelihood deviance (binomial deviance) curve being plotted against log(λ). Dashed vertical lines mark the optimal values according to the minimum criterion and the 1-SE criterion. The selected optimal λ value was 0.021. B LASSO coefficient profiles for the 28 potential features were plotted against the log(λ) sequence. Vertical lines were drawn at the values selected by five-fold cross-validation, with the optimal λ yielding fourteen features with non-zero coefficient values. SE: standard error
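The two selection criteria named in this caption can be sketched as follows. This is a minimal Python illustration that assumes the cross-validated mean deviance and its standard error are already available for each candidate λ; the function name and the numbers in the example are hypothetical and do not come from the study.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambda_min minimises the mean cross-validated deviance.
    lambda_1se is the largest lambda whose mean deviance lies within one
    standard error of that minimum, which gives a sparser model.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    cutoff = cv_mean[best] + cv_se[best]
    lambda_min = lambdas[best]
    lambda_1se = max(
        lam for lam, mean in zip(lambdas, cv_mean) if mean <= cutoff
    )
    return lambda_min, lambda_1se


if __name__ == "__main__":
    # Hypothetical cross-validation summary, for illustration only.
    candidate_lambdas = [0.001, 0.01, 0.02, 0.05, 0.1]
    mean_deviance = [1.20, 1.15, 1.10, 1.13, 1.30]
    se_deviance = [0.04, 0.04, 0.04, 0.05, 0.06]
    print(select_lambda(candidate_lambdas, mean_deviance, se_deviance))
```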

Table 2 The features for patients with pulmonary nodules in the design cohort using multivariate logistic regression analyses
Fig. 2

CT characteristics (A) and the corresponding hematoxylin and eosin-stained pathology (B) of representative lung nodules. Row 1: a 16-mm high-density nodule in a 52-year-old woman was benign (×100); Row 2: a 5.7-mm low-density nodule in a 34-year-old woman was a carcinoma in situ (×200); Row 3: an 8-mm low-density nodule in a 56-year-old man was an MIA (×100); Row 4: an 11-mm low-density nodule in a 50-year-old woman was an IAC (×100); Row 5: a 7-mm partly solid-density nodule in a 63-year-old woman was an MIA (×100); Row 6: an 11-mm partly solid-density nodule in an 81-year-old woman was an IAC (×100). MIA: minimally invasive adenocarcinoma. IAC: invasive adenocarcinoma

Fig. 3

NSCLC risk nomogram. Note: The design cohort was used to develop this nomogram, which incorporated hypertension, BUN, plasma fibrinogen, pulmonary nodule size, GGO status, and density. BUN: blood urea nitrogen. GGO: ground-glass opacity

We classified GGOs as pure GGOs (pGGOs, n = 208) and mixed GGOs (mGGOs, n = 81), and the relationship between GGO type and lung cancer was further analyzed. A P-value of 0.460 was obtained in univariate analysis, and NSCLC status was not retained in the forward likelihood-ratio logistic regression analysis (Table 3). Further, mGGOs were positively correlated with nodule size when compared with pGGOs.

Table 3 Significance of variables in the two cohorts, assessed using multivariate logistic regression with forward stepwise (likelihood ratio) selection

Assessment of predictive risk model performance

Calibration curves for this predictive nomogram when used to analyze the design cohort revealed it to be well-calibrated, with a C-index value of 0.765 (95% CI: 0.722–0.808) (Fig. 4A). Similarly, the C-index values for the external validation cohort, external validation cohort 2 and internal bootstrapping validation were 0.892 (95% CI: 0.844–0.940) (Fig. 4B), 0.853 (95% CI: 0.807–0.899) (Fig. 4C) and 0.753, respectively, consistent with the discriminative value of this model, suggesting that it exhibits good predictive utility.
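For a binary outcome, the C-index is the proportion of malignant-benign pairs in which the malignant nodule receives the higher predicted risk, which equals the area under the ROC curve. The Python sketch below computes it and adds a percentile bootstrap interval. It is a simplified illustration that resamples fixed predictions and does not refit the model in each resample, so it should not be read as the internal validation procedure used in the study; the function names and the toy data are hypothetical.

```python
import random


def c_index(y_true, y_prob):
    """Concordance index for binary outcomes (ties count as one half)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def bootstrap_c_index(y_true, y_prob, n_resamples=1000, seed=0):
    """Return (mean, lower, upper) of the C-index over bootstrap resamples."""
    c_index(y_true, y_prob)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    while len(values) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples that happen to contain a single class.
        if 0 < sum(ys) < n:
            values.append(c_index(ys, [y_prob[i] for i in idx]))
    values.sort()
    lower = values[int(0.025 * n_resamples)]
    upper = values[int(0.975 * n_resamples) - 1]
    return sum(values) / n_resamples, lower, upper


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks, for illustration only.
    outcomes = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
    risks = [0.9, 0.8, 0.3, 0.1, 0.2, 0.6, 0.05, 0.7, 0.4, 0.15]
    print(c_index(outcomes, risks))
    print(bootstrap_c_index(outcomes, risks))
```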

Fig. 4

Calibration curves for NSCLC nomogram predictions in the design cohort (A), external validation cohort (B) and external validation cohort 2 (C). Note: Predicted risk of NSCLC and actual NSCLC diagnoses are shown on the x-axis and y-axis, respectively, with the dotted line corresponding to a diagnostic model with perfect predictive accuracy and the solid line corresponding to actual nomogram performance. The closer these lines are to one another, the better the predictive performance of this nomogram

Ten-fold cross-validation analyses

Ten-fold cross-validation analyses were performed in the three cohorts (Tables 4, 5 and 6). As the sample sizes of the validation cohorts (Tables 5 and 6) were small, we resampled these cohorts 50 times. Our model showed good stability, as reflected by its kappa values, and predicted accurately, as reflected by its AUC and accuracy values (design cohort [AUC = 0.747 ± 0.081; Accuracy = 0.732 ± 0.064; Kappa = 0.376 ± 0.155] vs. validation cohort [AUC = 0.849 ± 0.104; Accuracy = 0.761 ± 0.092; Kappa = 0.321 ± 0.259] vs. validation cohort 2 [AUC = 0.833 ± 0.079; Accuracy = 0.737 ± 0.087; Kappa = 0.461 ± 0.179]).
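Accuracy and Cohen's kappa, two of the metrics reported for each fold, can be computed from predicted and observed classes as sketched below. This is a minimal Python illustration for binary labels; the function name and the toy labels are hypothetical, and how predicted probabilities were converted to classes in the study is not stated in the text.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for binary 0/1 labels."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # Agreement expected by chance from the marginal class frequencies.
    true_pos_rate = sum(y_true) / n
    pred_pos_rate = sum(y_pred) / n
    expected = (true_pos_rate * pred_pos_rate
                + (1.0 - true_pos_rate) * (1.0 - pred_pos_rate))
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class occurs")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical observed and predicted classes, for illustration only.
    observed_classes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted_classes = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(accuracy_and_kappa(observed_classes, predicted_classes))
```

Kappa discounts the agreement that would be expected by chance, so it is lower than accuracy whenever one class dominates.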

Table 4 Ten-fold cross-validation analysis for the design cohort
Table 5 Ten-fold cross-validation analysis for the validation cohort
Table 6 Ten-fold cross-validation of the model in the validation cohort 2

Different types of NSCLC compared with normal cases by multinomial logistic analyses

We classified NSCLC into carcinoma in situ, minimally invasive adenocarcinoma (MIA), invasive adenocarcinoma (IAC), and other types according to pathological results. MIA and IAC accounted for 80.32% of cases, and the remaining types of NSCLC accounted for only 19.68%. The degree of invasion increases from MIA to IAC, so an assessment of each factor in the model across the various types of NSCLC was necessary. We excluded carcinoma in situ because of its small sample size and concentrated mainly on evaluating the associations of the different factors with MIA and IAC using binomial and multinomial logistic regression analyses. The features in the model were related to these types (Table 7): GGO (odds ratio [OR] 7.13 [95% CI, 3.25–19.63] and 3.03 [95% CI, 1.47–6.25]) in MIA and IAC, density (OR 5.79 [95% CI, 2.46–13.65] and 2.53 [95% CI, 1.28–5.02]) in MIA and IAC, and nodule size (OR 2.58 [95% CI, 1.20–5.54]; 6.51 [95% CI, 2.71–15.61] and 5.98 [95% CI, 1.38–25.83]) in MIA, IAC and other types. The risk of MIA and IAC in intermediate-density lung nodules was significant when compared with high-density nodules; moreover, the risk of IAC was only about half that of MIA. The analysis of GGO showed a similar trend. Nodule size was significant in all of the different types of NSCLC (P = 0.015, P < 0.001, and P = 0.017, respectively); furthermore, when the pulmonary nodule size was ≥ 8 mm, the degree of infiltration might be deeper in MIA and IAC. In addition to the above features, IAC and other types shared hypertension as a risk factor in the multinomial models, while BUN and plasma fibrinogen levels appeared to be risk factors for MIA.

Table 7 Odds ratios of model variables in different types of NSCLC compared with normal cases using binomial and multinomial logistic regression analyses

Analysis of model clinical utility

Decision curve analyses for this predictive nomogram were next performed (Fig. 5). These analyses revealed that when the threshold probability of a patient and a doctor is > 18% and < 90% in the design cohort and > 3% in the external validation cohort, this nomogram exhibits value as a means of predicting NSCLC risk. Net benefit was comparable with some overlap within this range when assessing NSCLC risk based on this nomogram. Our model (the blue line) showed a higher overall net benefit (Fig. 5A, B, C) when compared with the Mayo model (the red line) and the simplified Brock model (the green line) across the cohorts.

Fig. 5

Decision curve analysis. Notes: Net benefit is shown on the y-axis, with the blue line corresponding to the NSCLC risk nomogram. The thin and thick lines respectively correspond to the assumptions that all patients or no patients had NSCLC. The decision curves demonstrate that if the threshold probability of a patient and a doctor is > 18% and < 90% (A), > 3% (B), and > 7% and < 90% (C) in our model for the three cohorts, respectively, then the use of this nomogram to predict the risk of NSCLC is more beneficial than a treat-all or treat-none interventional scheme for these patients. The red line stands for the Mayo model, the blue line stands for our model, and the green line stands for the parsimonious version of the Brock model

ROC curve analysis

ROC curve analyses of the three cohorts included in this study confirmed the predictive value of our model relative to the two comparison models, with area under the ROC curve (AUC) values of 0.765 vs. 0.548 vs. 0.565 for the design cohort (Fig. 6A), 0.892 vs. 0.741 vs. 0.672 for the external validation cohort (Fig. 6B), and 0.853 vs. 0.715 vs. 0.728 for external validation cohort 2 (Fig. 6C). The AUC values of our model (the blue line) were all higher than those of the Mayo model (the red line) and the parsimonious version of the Brock model (the green line).

Fig. 6

Receiver operating characteristic curve analyses for the design cohort (A), external validation cohort (B) and external validation cohort 2 (C). The red line stands for Mayo model, the blue line stands for our model and the green line stands for parsimonious version of the Brock model

NRI and IDI analysis of the three models

As a supplement to the comparison of AUC values, we calculated the net reclassification improvement index (NRI) and integrated discrimination improvement index (IDI) of our model relative to the two comparison models to quantify its improvement (Table 8). In the design cohort, the NRI was 37.41% (95% CI: 29–46%, P < 0.001) relative to the Mayo model and 32.49% (95% CI: 23–42%, P < 0.001) relative to the Brock model, and the IDI was 18.53% (95% CI: 15–22%, P < 0.001) and 17.49% (95% CI: 14–21%, P < 0.001), respectively. In the external validation cohort, the NRI was 34.15% (95% CI: 15–53%, P < 0.001) relative to the Mayo model and 25.23% (95% CI: 7–43%, P = 0.006) relative to the Brock model, and the IDI was 26.86% (95% CI: 18–35%, P < 0.001) and 32.25% (95% CI: 24–41%, P < 0.001), respectively. In external validation cohort 2, the NRI was 20.28% (95% CI: 3–38%, P = 0.021) relative to the Mayo model and 24.08% (95% CI: 8–40%, P = 0.003) relative to the Brock model, and the IDI was 19.61% (95% CI: 13–26%, P < 0.001) and 19.72% (95% CI: 13–26%, P < 0.001), respectively. All of these P values were significant, indicating that our model distinguished benign from malignant nodules more accurately.
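The two indices can be illustrated with a short Python sketch. It assumes the category-free (continuous) form of the NRI, because the text does not state whether risk categories were used, and it omits confidence intervals and P values; the function name and the toy data are hypothetical.

```python
def nri_idi(y_true, p_old, p_new):
    """Return (continuous NRI, IDI) of a new model relative to an old one.

    y_true: 0/1 outcomes; p_old, p_new: predicted risks from each model.
    """
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    def share(pairs, condition):
        return sum(1 for o, n in pairs if condition(o, n)) / len(pairs)

    def moved_up(o, n):
        return n > o

    def moved_down(o, n):
        return n < o

    # NRI: net upward movement among events plus net downward movement
    # among non-events.
    nri = ((share(events, moved_up) - share(events, moved_down))
           + (share(nonevents, moved_down) - share(nonevents, moved_up)))

    def mean(values):
        return sum(values) / len(values)

    # IDI: change in the gap between the mean predicted risk of events
    # and that of non-events.
    idi = ((mean([n for _, n in events]) - mean([o for o, _ in events]))
           - (mean([n for _, n in nonevents]) - mean([o for o, _ in nonevents])))
    return nri, idi


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks, for illustration only.
    print(nri_idi([1, 1, 0, 0], [0.5, 0.6, 0.5, 0.4], [0.8, 0.7, 0.2, 0.5]))
```

Positive values of either index indicate that the new model moves predicted risks in the correct direction more often, or by a larger margin, than the old one.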

Table 8 NRI and IDI analyses for the design cohort, external cohort, and external cohort 2, used to assess the reclassification performance and improvement in discrimination of our model


Discussion

Nomograms are valuable predictive tools that have been widely utilized in oncology and other clinical and research fields, offering a user-friendly approach to intuitively assessing the odds of a given diagnosis or outcome based on a set of specific variables, thereby aiding in clinical decision-making [11]. Many models for the management of pulmonary nodules have been established based upon certain epidemiological variables and CT scan results. However, clinical findings such as hematological biomarkers are also very important for the diagnosis of lung cancer [1]. Moreover, for some of these variables, such as GGO, the surgical criteria are not well defined, such that treatments are often conducted according to the experience of the operating surgeons [12, 13]. As such, we herein sought to develop a new nomogram capable of predicting the relative risk of malignancy when evaluating patients with pulmonary nodules.

We designed and validated a novel predictive model capable of assessing the risk of a given lung nodule being benign or malignant based on analysis of data from patients that had undergone pulmonary nodule resection. The resultant model incorporated demographic, disease-, and treatment-related features to easily predict the odds of a given pulmonary nodule corresponding to a NSCLC diagnosis. The model developed herein was accurate, and exhibited good calibration and discrimination in our validation cohort. The C-index value in this validation cohort was also high, indicating that the nomogram can be accurately used to gauge patient risk of pulmonary nodule malignancy.

Prior studies have confirmed that hypertension is a common comorbidity in cancer patients [14]. Several mechanisms may explain this observation, including the fact that hypertension can increase vascular endothelial growth factor (VEGF) levels in the plasma [15]. We identified hypertension as a risk factor for lung nodule malignancy. Fibrinogen has also been significantly linked to the risk of lung cancer in the past [16], with Kuang et al. having demonstrated that a combination of the beta and gamma chains of fibrinogen may offer value as a sensitive biomarker for differentiating between lung nodules that are benign and malignant [17], potentially explaining the significance of plasma fibrinogen levels in our model. One study indicated that the ratio of BUN to serum albumin might have predictive value in patients with severe lung cancer [18]. BUN has also been positively associated with lung tumor risk and was therefore included in a risk prediction model [19]. Other studies have shown that a maximum nodule diameter > 8 mm was an independent risk factor for malignancy [20] and that the presence of a solid component in GGO nodules might lead to lymph node metastasis [21]. GGO findings have been reported to be associated with cancer rates as high as 63%, with many surgeons believing that GGO nodules should be resected, particularly if they grow in size. Persistent GGO nodules may be indicative of a greater risk of malignancy when solid components are evident [12]. Tu et al. found CT density to be a valuable feature when differentiating between nodules that were malignant and benign [22]. Qiu et al. further determined that solitary ground-glass opacity nodule size and density upon high-resolution CT evaluation were associated with invasive adenocarcinoma risk [23]. Nodule size may be the most important variable included in our predictive model, given that nodule diameter is a key determinant of treatment under the British Thoracic Society guidelines [24] and Fleischner Society Guidelines [25]. For nodules ≥ 10 mm in diameter, the odds of malignancy in the NELSON screening study were 15.2% [26]. As such, we included nodule diameter as the size variable in the present study. Because comparing AUC values between different models has certain limitations, we also calculated the NRI and IDI relative to the two comparison models to quantify the improvement offered by our model.

Herein, we thus designed a risk nomogram that may aid clinicians in differentiating between patients with benign or malignant lung nodules. It may also aid in the optimal selection of pulmonary nodules in the context of clinical research. For example, this model might be used to aid investigators in selecting patients with larger nodules and other risk-related findings when identifying candidates for surgical procedures or other interventions. Early interventions including CT scans, biochemical analyses of blood samples, and family support can better benefit low-risk patients, while regular clinical examination can ensure the appropriate monitoring of lung nodules to better guide the appropriate assessment of patient diagnosis.

Previous classical models based on large-scale screening experiments have been widely used for clinical evaluation. However, patients presenting to different hospitals for treatment are inevitably subject to selection by human factors. For example, as a tertiary hospital, our hospital serves many patients referred from lower-level hospitals, which may explain the high proportion of patients with nodules ≥ 8 mm and with malignant nodules in our cohorts. Therefore, it is necessary to develop clinical assessment models for pulmonary nodules based on different groups of patients. Accurate predictive evaluation can aid surgeons in predicting lung cancer risk in individual patients, ensuring timely intervention for high-risk patients while reducing the need for interventional treatment in low-risk patients. Accurately predicting the risk of lung cancer in a given patient is very challenging, and appropriate measurements together with multifaceted interventional approaches are thus the most reliable approach to detecting and evaluating patients with pulmonary nodules. Further research on this topic is warranted, as the accurate detection of pulmonary nodules alone is necessary but insufficient for treating affected patients.

Although our model showed good accuracy and stability in the different validation cohorts, among the variables included in the model, BUN demonstrated statistical significance solely within the training cohort and was not significant in either the external validation cohort or external validation cohort 2. This suggests potential instability of this index and highlights room for improvement within the model, which is currently limited by its inclusion of a restricted number of variables. With the continuous advancement of artificial intelligence technology, we believe that future research endeavors will benefit from larger training cohorts encompassing more diverse variables, thereby facilitating the establishment of more precise and straightforward prediction models.


Limitations

There are multiple limitations to this study. For one, all patients in our study were enrolled from a single center over a relatively limited study period. Additionally, risk factor analyses did not incorporate all possible risk factors that may be relevant to the differentiation between benign and malignant nodules. Other relevant factors not included in this analysis included the number of nodules and specific comorbidity incidence rates. In addition, the selection of variables was made by taking previous studies into account, and the patients were from a tertiary referral center, potentially contributing to significant bias affecting these statistical analyses. Also, the comparison of AUC values between different models had certain limitations. Lastly, while a bootstrap testing approach was used to validate our nomogram, the patients used for this validation approach may not be sufficient to ensure the generalizability of these data to patients from other countries or regions. As such, further external validation in a wider pulmonary nodule patient population will be essential in the future.


Conclusion

In summary, we herein designed a novel nomogram with good accuracy that offers value as a means of differentiating between benign and malignant pulmonary nodules, enabling clinicians to better plan patient treatment. Such individualized risk analyses offer clinicians an opportunity to appropriately monitor and treat patients. However, further work will be needed to validate this nomogram in larger patient populations and to establish whether the treatment decisions made based on this nomogram will reduce rates of incorrect diagnosis and treatment planning for patients with pulmonary nodules.

Availability of data and materials

All data are fully available from the corresponding author upon reasonable request.



Abbreviations

LASSO: Least absolute shrinkage and selection operator
NSCLC: Non-small cell lung cancer
BUN: Blood urea nitrogen
GGO: Ground-glass opacity
HPV: Human papillomavirus
HIV: Human immunodeficiency virus
CT: Computed tomography
ROC: Receiver operating characteristic
AUC: Area under the ROC curve
NRI: Net reclassification improvement index
IDI: Integrated discrimination improvement index
VEGF: Vascular endothelial growth factor
SE: Standard error


  1. Duffy MJ, O’Byrne K. Tissue and blood biomarkers in lung cancer: a review. Adv Clin Chem. 2018;86:1–21.

    Article  CAS  PubMed  Google Scholar 

  2. Chapman AM, Sun KY, Ruestow P, Cowan DM, Madl AK. Lung cancer mutation profile of EGFR, ALK, and KRAS: meta-analysis and comparison of never and ever smokers. Lung Cancer. 2016;102:122–34.

    Article  PubMed  Google Scholar 

  3. Hong QY, Wu GM, Qian GS, et al. Prevention and management of lung cancer in China. Cancer. 2015;121:3080–8.

    Article  PubMed  Google Scholar 

  4. Pakzad R, Mohammadian-Hafshejani A, Ghoncheh M, Pakzad I, Salehiniya H. The incidence and mortality of lung cancer and their relationship to development in Asia. Transl Lung Cancer Res. 2015;4:763–74.

    PubMed  PubMed Central  Google Scholar 

  5. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;68:7–30.

    Article  PubMed  Google Scholar 

  6. Akhtar N, Bansal JG. Risk factors of lung cancer in nonsmoker. Curr Probl Cancer. 2017;41(5):328–39.

    Article  PubMed  Google Scholar 

  7. Tammemägi MC, Katki HA, Hocking WG, et al. Selection criteria for lung-cancer screening. N Engl J Med. 2013;368(8):728–36.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Zuber V, Marconett CN, Shi J, et al. Pleiotropic analysis of lung cancer and blood triglycerides. J Natl Cancer Inst. 2016;108(12):djw167.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Argirion I, Weinstein SJ, Männistö S, Albanes D, Mondul AM. Serum insulin, glucose, indices of insulin resistance, and risk of lung cancer. Cancer Epidemiol Biomarkers Prev. 2017;26(10):1519–24.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Vachani A, Zheng C, Amy Liu IL, Huang BZ, Osuji TA, Gould MK. The probability of lung cancer in patients with incidentally detected pulmonary nodules: clinical characteristics and accuracy of prediction models. Chest. 2022;161(2):562–71.

    Article  PubMed  Google Scholar 

  11. Wei L, Champman S, Li X, et al. Beliefs about medicines and non-adherence in patients with stroke, diabetes mellitus and rheumatoid arthritis: a cross-sectional study in China. BMJ Open. 2017;7(10):e017293.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Migliore M, Fornito M, Palazzolo M, Criscione A, Gangemi M, Borrata F, Vigneri P, Nardini M, Dunning J. Ground glass opacities management in the lung cancer screening era. Ann Transl Med. 2018;6(5):90.

  13. Lococo F, Cusumano G, Cardillo G, SICT PNR-Working Group. It’s unnecessary to perform N1–N2 sampling/dissection in predominantly-GGO cStage-I lung cancer? Ann Thorac Surg. 2021;111(4):1405–6.

  14. Wong BS, Chiu LY, Tu DG, Sheu GT, Chan TT. Anticancer effects of antihypertensive L-type calcium channel blockers on chemoresistant lung cancer cells via autophagy and apoptosis. Cancer Manag Res. 2020;13(12):1913–27.

  15. Yang P, Deng W, Han Y, et al. Analysis of the correlation among hypertension, the intake of β-blockers, and overall survival outcome in patients undergoing chemoradiotherapy with inoperable stage III non-small cell lung cancer. Am J Cancer Res. 2017;7(4):946–54.

  16. Grafetstätter M, Hüsing A, González Maldonado S, et al. Plasma fibrinogen and sP-selectin are associated with the risk of lung cancer in a prospective study. Cancer Epidemiol Biomarkers Prev. 2019;28(7):1221–7.

  17. Kuang M, Peng Y, Tao X, Zhou Z, Mao H, Zhuge L, Sun Y, Zhang H. FGB and FGG derived from plasma exosomes as potential biomarkers to distinguish benign from malignant pulmonary nodules. Clin Exp Med. 2019;19(4):557–64.

  18. Peng X, Huang Y, Fu H, Zhang Z, He A, Luo R. Prognostic value of blood urea nitrogen to serum albumin ratio in intensive care unit patients with lung cancer. Int J Gen Med. 2021;28(14):7349–59.

  19. Chang HT, Wang PH, Chen WF, Lin CJ. Risk assessment of early lung cancer with LDCT and health examinations. Int J Environ Res Public Health. 2022;19(8):4633.

  20. Heo EY, Lee KW, Jheon S, Lee JH, Lee CT, Yoon HI. Surgical resection of highly suspicious pulmonary nodules without a tissue diagnosis. Jpn J Clin Oncol. 2011;41(8):1017–22.

  21. Matsuguma H, Yokoi K, Anraku M, et al. Proportion of ground-glass opacity on high-resolution computed tomography in clinical T1 N0 M0 adenocarcinoma of the lung: a predictor of lymph node metastasis. J Thorac Cardiovasc Surg. 2002;124(2):278–84.

  22. Tu SJ, Wang CW, Pan KT, Wu YC, Wu CT. Localized thin-section CT with radiomics feature extraction and machine learning to classify early-detected pulmonary nodules from lung cancer screening. Phys Med Biol. 2018;63(6):065005.

  23. Qiu ZX, Cheng Y, Liu D, et al. Clinical, pathological, and radiological characteristics of solitary ground-glass opacity lung nodules on high-resolution computed tomography. Ther Clin Risk Manag. 2016;20(12):1445–53.

  24. Callister ME, Baldwin DR, Akram AR, et al. British Thoracic Society guidelines for the investigation and management of pulmonary nodules. Thorax. 2015;70(Suppl 2):ii1–54.

  25. Bueno J, Landeras L, Chung JH. Updated Fleischner Society guidelines for managing incidental pulmonary nodules: common questions and challenging scenarios. Radiographics. 2018;38(5):1337–50.

  26. Horeweg N, van Rosmalen J, Heuvelmans MA, et al. Lung cancer probability in patients with CT-detected pulmonary nodules: a prespecified analysis of data from the NELSON trial of low-dose CT screening. Lancet Oncol. 2014;15:1332–41.


Special thanks to all participants who took part in this study.


This work was supported by the Natural Science Foundation of Ningbo Municipality (202003N4269, 2019C50069), the Basic Public Welfare Projects of Zhejiang Province (LGF19H020004), the Zhejiang Province Medical and Health Project (2017ZD026, 2020KY273), the Ningbo Health Branding Subject Fund (PPXK2018-01), the Ningbo “Technology Innovation 2025” Major Special Project (No. 2022Z150), and the Major Key Project of the Ningbo Medical and Health Team (2022030107).

Author information

Authors and Affiliations



Z.F.L. and G.F.S. were responsible for study design; R.J.Z. and N.L. performed data acquisition; Z.F.L. and G.F.S. wrote the manuscript.

Corresponding author

Correspondence to Guofeng Shao.

Ethics declarations

Ethics approval and consent to participate

The Ethics Committee of the Affiliated Lihuili Hospital of Ningbo University approved this study (approval no. KY2020PJ141). All patients provided written informed consent to participate. All methods were carried out in accordance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liao, Z., Zheng, R., Li, N. et al. Development and validation of a risk model with variables related to non-small cell lung cancer in patients with pulmonary nodules: a retrospective study. BMC Cancer 23, 872 (2023).


  • Pulmonary nodules
  • Logistic
  • Variables
  • Model