The performance of the proposed network is evaluated by using the optimized GAN to generate CT images of COVID-19 and non-COVID-19 patients; experiments are performed to assess the classification performance using different DL networks.
Dataset
Chest CT images of COVID-19 and non-COVID-19 patients were collected from the SARS-CoV-2 CT-Scan dataset [29]. The dataset consists of 2482 CT scan images: 1252 COVID-19 and 1230 non-COVID-19 images. The CT scan images of the dataset were collected from Sao Paulo Hospital, Brazil. The dataset is publicly available at https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset. The details of the images are shown in Table 3.
Table 3 Details of the dataset
Implementation Details
All experiments are performed in MATLAB 2020a on a computer with 64 GB of RAM and an Nvidia GPU, running the Windows 10 Pro operating system. Training the GAN to generate new CT images takes 8 h on a single GPU. The proposed algorithm is evaluated using the publicly available SARS-CoV-2 CT-Scan dataset for both COVID-19 and non-COVID-19 images.
Training
The GAN architecture used for the experiments, including the generator and discriminator networks, is given in Table 1. All the true images are resized to 224 × 224 × 3, and the images produced by the generator are of the same size. The WOA is used to optimize the hyperparameters of the generator in order to improve the performance of the discriminator.
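As an illustration of the kind of search the WOA performs, the following is a minimal sketch of the standard whale optimization algorithm applied to a toy two-parameter objective. It is not the implementation used in this work: the function name `woa_minimize`, the quadratic objective, the bounds, and the population and iteration counts are all assumptions chosen for the example, and the coefficients A and C are drawn once per whale rather than per dimension for brevity. In the setting described above, the objective would instead be a measure of discriminator performance for a candidate set of generator hyperparameters.

```python
import math
import random


def woa_minimize(objective, bounds, n_whales=20, n_iter=100, seed=0):
    """Minimize `objective` over box `bounds` with a basic whale optimization loop.

    Illustrative sketch only; all defaults are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    b = 1.0  # shape constant of the logarithmic spiral

    def clip(value, d):
        lo, hi = bounds[d]
        return min(max(value, lo), hi)

    # Random initial population inside the bounds.
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_val = objective(best)
    history = [best_val]

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[d] - A * abs(C * ref[d] - x[d]) for d in range(dim)]
            else:
                # Spiral update around the best whale found so far.
                l = rng.uniform(-1.0, 1.0)
                new = [
                    abs(best[d] - x[d]) * math.exp(b * l) * math.cos(2.0 * math.pi * l)
                    + best[d]
                    for d in range(dim)
                ]
            whales[i] = [clip(v, d) for d, v in enumerate(new)]

        # Keep track of the best position seen so far.
        for x in whales:
            val = objective(x)
            if val < best_val:
                best, best_val = x[:], val
        history.append(best_val)

    return best, best_val, history


def toy_objective(p):
    """Hypothetical stand-in for a validation loss over two hyperparameters."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best, best_val, history = woa_minimize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

Because the best position is only replaced when a strictly better one is found, the recorded best value never increases from one iteration to the next.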
The Adam training algorithm is used in this work because it is an efficient adaptive stochastic optimization algorithm used in many computer vision and natural language processing applications [30, 31]. Adam is computationally efficient, has low memory requirements, and is well suited to large amounts of input data and to noisy inputs; it also requires little tuning of its hyperparameters [32]. The optimized hyperparameters obtained using the WOA are given in Table 4, and sample training images of COVID-19 and non-COVID-19 patients are provided in Figs. 4 and 5, respectively.
Table 4 Training parameters for optimized GAN
Fig. 3 Workflow of the proposed methodology
Fig. 4 Sample COVID-19 training images
Additionally, 518 new CT images are generated using the GAN for both COVID-19 and non-COVID-19 cases. Seventy percent of the data are used to train the InceptionV3 network, and the remaining 30% of the data are used for testing. The training progress of the optimized CNN is shown in Fig. 6a, and the results are shown in Fig. 6b. The proposed optimized InceptionV3 network achieves good accuracy and minimum loss in all epochs.
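The 70/30 partition described above can be sketched as follows. This is an illustrative example rather than the procedure used in the experiments: the function name `stratified_split` is hypothetical, and splitting each class separately (stratification) is an assumption, since the text does not state how the images were assigned to the two sets.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately.

    Stratification is an assumption made for this sketch.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels only; the real dataset contains 3000 images.
labels = ["covid"] * 10 + ["non-covid"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Applied to the full set of 3000 images, a 70/30 split corresponds to 2100 training images and 900 testing images.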
Fig. 5 Sample non-COVID-19 training images
Testing
After data augmentation and training of the proposed network, the test images are first resized to 224 × 224 × 3. The test images are then given as input to the trained CNN, in which all the parameters of the CLs and FCLs are already optimized. The CNN extracts the image features and then classifies them into the appropriate class using the FCL and softmax classifiers. Sample COVID-19 and non-COVID-19 testing images are shown in Figs. 7 and 8, respectively.
Fig. 6 Training progress of the InceptionV3 network (a) accuracy and loss (b) results
Fig. 7 Sample COVID-19 testing images
Performance Indicators and Evaluation Metrics
The main aim of the present work is to diagnose COVID-19 automatically from queried CT images. Therefore, to analyze the performance of the proposed algorithm, sensitivity, specificity, accuracy, F1-score, positive predictive value (PPV), and negative predictive value (NPV) are used as performance metrics; performance is also assessed based on the confusion matrix and the receiver operating characteristic (ROC) curve. Sensitivity evaluates the ability of the algorithm to correctly identify true positive cases of COVID-19, while specificity evaluates the ability to correctly identify true negative cases. Accuracy evaluates the ability of the classifier to differentiate between COVID-19 and non-COVID-19 cases. PPV is the proportion of true positive cases among all cases classified as positive, that is, among all true positive and false positive cases. NPV is the proportion of true negative cases among all cases classified as negative, that is, among all true negative and false negative cases. The F1-score provides a single score that combines the precision and recall of the algorithm. The confusion matrix summarizes the predictions in tabular form, and the ROC curve shows the performance of the classifier in graphical form. The equations for accuracy, sensitivity, specificity, F1-score, PPV, and NPV are given in Eqs. (6), (7), (8), (9), (10), and (11), respectively, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.
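$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$
(6)
$$\mathrm{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(7)
$$\mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$
(8)
$$\mathrm{F}1\text{-}\mathrm{score}=2\times \frac{\mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$
(9)
$$\mathrm{PPV}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$
(10)
$$\mathrm{NPV}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FN}}$$
(11)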
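The six metrics of Eqs. (6) to (11) can all be computed from the four confusion-matrix counts, as in the minimal sketch below. The function name `classification_metrics` and the example counts are hypothetical and are not results from this study; precision is taken to be the PPV and recall to be the sensitivity, which is the usual convention.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (6)-(11) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (6)
    sensitivity = tp / (tp + fn)                 # Eq. (7), also called recall
    specificity = tn / (tn + fp)                 # Eq. (8)
    ppv = tp / (tp + fp)                         # Eq. (10), also called precision
    npv = tn / (tn + fn)                         # Eq. (11)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # Eq. (9)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "ppv": ppv,
        "npv": npv,
    }


# Hypothetical counts chosen only to illustrate the calculation.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these example counts, the accuracy is 175/200 = 0.875 and the F1-score equals 2TP/(2TP + FP + FN) = 180/205, which is an equivalent form of Eq. (9).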
Experimental Results
The efficiency of the proposed network is validated for the automatic diagnosis of COVID-19 from CT scan images; a comparative analysis is performed for three different scenarios, described as follows.
Optimized GAN
The proposed network helps diagnose COVID-19 from CT images automatically using DL networks with high sensitivity and specificity. It can therefore help radiologists perform initial screenings for a large population. Additionally, it is a noninvasive method of diagnosis that produces results on the order of seconds. However, the main disadvantage of CT scans is the limited number of images available, which is often not sufficient to train the DL networks. Therefore, a GAN is used to generate new CT images for both COVID-19 and non-COVID-19 cases. Initially, 2482 CT images were available, encompassing both COVID-19 and non-COVID-19 images; using the GAN, 518 additional images are generated, so that a total of 3000 CT images are available to train the DL networks. Additionally, the hyperparameters of the generator network of the GAN are optimized using the WOA to improve the performance of the discriminator. Figure 9 shows the COVID-19 (Fig. 9a) and non-COVID-19 (Fig. 9b) images generated using the optimized GAN. Table 5 compares the performance of the DL network with optimized and non-optimized data augmentation and shows the advantage of the proposed network. Figures 10 and 11 compare the non-optimized and optimized GAN in terms of ROC curves and confusion matrices, respectively; in both figures, the optimized GAN produces better results than the non-optimized GAN. The total number of testing images is not equal in Fig. 11 because Fig. 11b shows the confusion matrix of the optimized GAN, which includes a greater number of generated images, whereas Fig. 11a shows the confusion matrix of the non-optimized GAN, which includes fewer generated images.
Fig. 8 Sample non-COVID-19 testing images
Table 5 Comparison of performance metrics of optimized and non-optimized GAN
Fig. 9 CT images generated using the optimized GAN (a) COVID-19 and (b) non-COVID-19
Fig. 10 Comparison of ROC curves (a) non-optimized GAN and (b) optimized GAN
Performance Analysis with Other DL Networks
The newly generated CT images are included in the classification. In this work, six standard DL networks are used for image classification: AlexNet, GoogleNet, SqueezeNet, VGG19, ResNet-50, and InceptionV3. Seventy percent of the data are used for training the DL networks, and the remaining 30% of the data are used for testing. The networks are compared in terms of accuracy, specificity, sensitivity, F1-score, PPV, and NPV in Table 6. Comparisons in terms of the confusion matrices and ROC curves are illustrated in Figs. 12 and 13, respectively. These comparisons indicate that the optimized GAN-based InceptionV3 gives the best accuracy. Therefore, this network can be used to diagnose COVID-19 in real-time applications.
Table 6 Performance analysis with other DL networks
Fig. 11 Comparison of confusion matrices (a) non-optimized GAN and (b) optimized GAN
Fig. 12 Comparison of the confusion matrices
Performance Analysis with Other Meta-Heuristic Algorithms
In this study, five standard optimization techniques are evaluated in addition to the proposed WOA: genetic algorithm (GA) [33], pattern search (PS) [34], particle swarm optimization (PSO) [35], simulated annealing (SA) [36], and Grey Wolf Optimization (GWO) [37]. For training the DL networks, 70% of the data are used; the remaining 30% of the data are used for testing. A comparison of all these networks in terms of the accuracy, specificity, sensitivity, F1-score, PPV, and NPV is given in Table 7. The results of these comparisons indicate that the CNN optimized by the WOA provides the best results. Therefore, the proposed network can be reliably used to diagnose COVID-19 in real-time applications.
Table 7 Comparative analysis with other meta-heuristic algorithms
Comparative Analysis
For further analysis of the results, the performance metrics of our pretrained deep-learning network, including the accuracy, sensitivity, specificity, precision, and F1-score, are compared with those of other state-of-the-art methods. The performance metrics obtained with our proposed method are better than those of the other approaches, as summarized in Table 8. The main advantage of the proposed approach is that no tuning is required for different databases, unlike the model-based methodologies described in [8, 12, 19, 22, 39]. The proposed approach can therefore deal with unseen databases without any particular parameter tuning. The authors of [24, 27] and [38] achieved marginally acceptable accuracy, but their sensitivity and specificity values were lower than those of the proposed method because of the absence of basic imaging data. The accuracy and F1-score of the proposed method are likewise compared with those of state-of-the-art methods. Overall, the proposed method yields better results than the other state-of-the-art techniques.
Table 8 Comparison of the results with state-of-the-art DL networks with CT images
Cross-Validation
Cross-validation (CV) is an essential tool for estimating network performance: the data are split into k subsets, and training and testing are repeated k times. In the present work, the value of k is set to 10, so the performance of the proposed work is validated over 10 iterations and the CT data for both classes are split into ten subsets. In every iteration, one of the k subsets is used for testing, and the remaining k-1 subsets are used for training the network. The error rate is thus calculated k times for the proposed optimized GAN ResNet-50 network. The cross-validation results are shown in Fig. 14.
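The fold construction described above can be sketched as follows. This is an illustrative example, not the code used in the experiments: the function name `k_fold_indices` is hypothetical, the shuffling seed is an assumption, and the training and evaluation of the network on each fold is left as a placeholder comment.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # Deal the shuffled indices into k nearly equal subsets.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        test = folds[i]
        # The remaining k-1 subsets form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold, (train_idx, test_idx) in enumerate(k_fold_indices(3000, k=10), start=1):
    # A network would be trained on train_idx and its error rate measured on test_idx.
    print(fold, len(train_idx), len(test_idx))
```

With 3000 images and k = 10, each iteration uses 300 images for testing and 2700 for training, and every image is tested exactly once.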
Fig. 13 Comparison of the ROC curves (a) AlexNet, (b) GoogleNet, (c) VGG19, (d) SqueezeNet, (e) ResNet-50, (f) InceptionV3
Fig. 14
Discussion
The primary goal of this paper was to generate additional CT images for COVID-19 screening in order to overcome the limitations of a small dataset for training a DL network; a total of 3000 CT images are ultimately used in this study. The parameters used to train the GAN are given in Table 4, and the hyperparameters of the GAN are optimized by using the WOA. Table 5 shows a comparison of the performance results using both the optimized and non-optimized GAN. The performance metrics of the proposed method, including accuracy, sensitivity, specificity, F1-score, PPV, and NPV, are compared with those of other pretrained networks, such as AlexNet, GoogleNet, VGG19, SqueezeNet, ResNet-50, and InceptionV3. Accuracy is an essential metric for assessing the overall classification performance. Sensitivity assesses the ability to correctly identify COVID-19 cases; it therefore reflects how well patients who actually have COVID-19 are recognized. Specificity assesses the ability to correctly identify non-COVID-19 cases; it therefore reflects how well patients who do not have COVID-19 are recognized. Precision is the proportion of correctly identified COVID-19 cases out of all cases classified as COVID-19. F1-score, PPV, and NPV are essential measures that provide additional details of the classification, particularly when the data consist of imbalanced classes. The F1-score is the harmonic mean of precision and recall. Tables 7 and 8 show an analysis of the performance metrics of the proposed method and other competitive methods. The proposed approach achieves better performance than the other methods.