J. Imaging, Vol. 8, Pages 320: How Well Do Self-Supervised Models Transfer to Medical Imaging?

Conceptualization, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; methodology, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; software, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; validation, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; formal analysis, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; investigation, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; resources, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; data curation, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; writing—original draft preparation, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; writing—review and editing, J.A., L.C., M.F.C., M.O., W.H.T., V.C. and R.W.; visualization, J.A., L.C., M.F.C., M.O., W.H.T. and V.C.; supervision, P.S., R.W. and K.K.; project administration, P.S., R.W. and K.K. All authors have read and agreed to the published version of the manuscript.

Figure 1. Example images from the (a) CheXpert, (b) ChestX-ray14, (c) Shenzhen-CXR, (d) EyePACS, (e) BACH datasets.


Figure 2. Bar chart of model transfer performance on different downstream tasks. SSL (blue) refers to the average over the self-supervised models (SimCLR, MoCo, SwAV, BYOL, PIRL) and Supervised (orange) refers to the average over the supervised models (ResNet-50, ResNet-18, DenseNet-121). For each downstream task, all results are scaled between 0 and 1 (across the SSL and Supervised models, not including the domain-specific models).

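The 0-to-1 scaling used in Figure 2 is a per-task min-max normalisation across the compared models. A minimal sketch of that procedure, with hypothetical accuracies rather than values from the paper:

```python
def minmax_scale(scores):
    """Scale one task's per-model scores to [0, 1] across those models."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical accuracies on one downstream task, ordered as
# [five SSL models..., three supervised models...]:
task_scores = [59.7, 57.4, 57.6, 58.4, 58.5, 56.1, 57.7, 57.4]
scaled = minmax_scale(task_scores)
ssl_avg = sum(scaled[:5]) / 5  # blue bar for this task
sup_avg = sum(scaled[5:]) / 3  # orange bar for this task
```

Because the best model on a task maps to 1 and the worst to 0, the bars compare relative rather than absolute performance across tasks.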

Figure 3. Box-and-whisker plot of model type (SSL models pretrained on ImageNet) against performance for few-shot recognition. Performance values are scaled between 0 and 1 for each dataset across all models. The performance on each dataset is plotted as a dot for each model.


Figure 4. Performance (few-shot, linear, and finetune) on in-domain and out-of-domain datasets for the different model types: self-supervised (pretrained on ImageNet), supervised, and MIMIC-CheXpert. The in-domain datasets comprise CheXpert, ChestX, Montgomery and Shenzhen, while the out-of-domain datasets comprise BACH, EyePACS, iChallenge-AMD and iChallenge-PM. Results are scaled between 0 and 1 (across all models), averaged over each model type, and then averaged over each dataset. Error bars correspond to 1σ variations across the individual models that are averaged over.

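The aggregation in Figure 4 (scale, average over the models of a type per dataset, then average over datasets, with 1σ error bars across the individual models) can be sketched as follows. The dataset names and scores below are placeholders, and pooling the σ across all models of a type is an assumption about the exact estimator:

```python
import statistics

def type_summary(per_dataset):
    """per_dataset maps dataset name -> scaled scores, one per model of a type.
    Returns the mean over datasets of the per-dataset model average, plus a
    1-sigma spread pooled across all individual model scores."""
    dataset_means = [statistics.mean(scores) for scores in per_dataset.values()]
    pooled = [s for scores in per_dataset.values() for s in scores]
    return statistics.mean(dataset_means), statistics.stdev(pooled)

# Placeholder scaled scores for an "in-domain" group of datasets:
in_domain = {"CheXpert": [0.9, 0.7], "Shenzhen": [0.8, 0.6]}
mean, sigma = type_summary(in_domain)  # bar height and error-bar half-width
```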

Figure 5. Bar chart of model transfer performance on different downstream tasks for the MIMIC-CheXpert (green) and MoCo-CXR (red) models. Results are scaled between 0 and 1. All in-domain datasets are located to the left of the dotted line.


Figure 6. Deep image prior reconstructions for an image from the CheXpert dataset (left) for PIRL (centre) and Supervised DenseNet-121 (right). The original image is shown on the left for comparison.


Figure 7. Few-shot accuracy on the CheXpert dataset plotted against attentive diffusion values for the reconstructed CheXpert image for all models. Shown overlaid are the Pearson’s r correlation coefficient and associated p value. A negative Pearson’s r close to −1 (with a low associated p value close to 0) implies a strong negative correlation.

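The Pearson's r overlaid on Figure 7 measures the linear relationship between attentive diffusion and few-shot accuracy. A self-contained sketch of the coefficient follows; the data points are placeholders, and in practice the associated p value would come from a statistics library such as scipy.stats.pearsonr:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder (attentive diffusion, few-shot accuracy) pairs:
diffusion = [0.33, 0.47, 0.50, 0.55]
accuracy = [60.3, 58.4, 57.4, 56.1]
r = pearson_r(diffusion, accuracy)  # r near -1 indicates a strong negative correlation
```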

Figure 8. Saliency map for an image from the CheXpert dataset for the MoCo-CXR (left), BYOL (centre) and Supervised ResNet-50 (right) models. The three saliency maps have attentive diffusion values 0.33 (MoCo-CXR), 0.50 (BYOL), and 0.47 (Supervised r50).


Figure 9. Box-and-whisker plot of model type (SSL, Supervised, and Domain-Specific SSL (MIMIC-CheXpert and MoCo-CXR)) against attentive diffusion values, split between in-domain (CheXpert, ChestX, Montgomery, Shenzhen) and out-of-domain (BACH, EyePACS, iChallenge-AMD, iChallenge-PM) datasets.


Table 1. The type, number of images, and number of classes for each medical dataset considered. Type CXR corresponds to chest X-rays and BHM to breast histology microscopy slides. Note that here we provide the total number of classes available for the datasets. This does not mean, however, that all class labels are used. For example, CheXpert is treated as Pleural Effusion many-to-one. More details can be found in Section 4.2.

Dataset          Type    # Images  # Classes
CheXpert         CXR     224,316   14
Shenzhen-CXR     CXR     662       2
Montgomery-CXR   CXR     138       2
ChestX-ray14     CXR     112,120   14
BACH             BHM     400       4
EyePACS          Fundus  35,126    5
iChallenge-AMD   Fundus  400       2
iChallenge-PM    Fundus  400       2

Table 2. Summary of terminology used for datasets and models.

Models
  Supervised: supervised models pretrained on ImageNet (ResNet-50, ResNet-18, DenseNet-121)
  Self-supervised: self-supervised models pretrained on ImageNet (SimCLR-v1, MoCo-v2, PIRL, SwAV, BYOL)
  Domain-specific self-supervised: self-supervised models pretrained on chest X-rays (MIMIC-CheXpert, MoCo-CXR)

Datasets
  In-domain: all chest X-ray datasets (CheXpert, Shenzhen-CXR, Montgomery-CXR, ChestX-ray14)
  Out-of-domain: all non-chest-X-ray datasets (BACH, EyePACS, iChallenge-AMD, iChallenge-PM)

Table 3. Few-shot transfer performance of the pretrained models on the different medical datasets. All are evaluated as 2-way 20-shot, except ChestX and EyePACS, which are 5-way 20-shot. Results are reported as average accuracy over 600 episodes with 95% CI. Key: best, second best.


Model              CheXpert       Shenzhen       Montgomery     ChestX         BACH           EyePACS        iC-AMD         iC-PM
SimCLR-v1          59.73 ± 0.72   74.46 ± 0.66   63.20 ± 0.80   29.86 ± 0.46   80.61 ± 0.74   32.78 ± 0.42   47.90 ± 0.62   94.92 ± 0.32
MoCo-v2            57.39 ± 0.75   73.76 ± 0.66   63.54 ± 0.74   28.69 ± 0.44   82.53 ± 0.71   34.07 ± 0.43   74.91 ± 0.64   94.21 ± 0.33
SwAV               57.61 ± 0.77   75.22 ± 0.65   67.38 ± 0.71   27.76 ± 0.44   82.78 ± 0.65   34.47 ± 0.43   70.94 ± 0.66   94.69 ± 0.31
BYOL               58.44 ± 0.74   76.29 ± 0.65   70.98 ± 0.67   30.28 ± 0.46   83.28 ± 0.66   33.66 ± 0.41   74.58 ± 0.61   95.83 ± 0.28
PIRL               58.51 ± 0.76   77.48 ± 0.60   63.58 ± 0.76   28.52 ± 0.44   81.02 ± 0.69   34.19 ± 0.41   75.26 ± 0.60   93.49 ± 0.35
Supervised (r50)   56.14 ± 0.76   70.86 ± 0.72   62.31 ± 0.75   27.71 ± 0.46   80.49 ± 0.68   31.32 ± 0.43   75.70 ± 0.64   94.80 ± 0.33
Supervised (r18)   57.69 ± 0.80   74.16 ± 0.66   62.94 ± 0.69   28.58 ± 0.40   80.78 ± 0.71   32.96 ± 0.41   74.59 ± 0.62   93.68 ± 0.35
Supervised (d121)  57.41 ± 0.78   73.43 ± 0.65   65.31 ± 0.67   27.88 ± 0.44   81.21 ± 0.70   33.49 ± 0.42   77.12 ± 0.60   94.86 ± 0.30
MIMIC-CheXpert     62.45 ± 0.75   73.22 ± 0.64   69.15 ± 0.66   34.82 ± 0.48   71.60 ± 0.98   25.71 ± 0.40   65.05 ± 0.72   83.66 ± 0.54
MoCo-CXR           60.33 ± 0.74   73.89 ± 0.64   65.02 ± 0.70   29.01 ± 0.46   69.07 ± 0.82   27.78 ± 0.41   68.17 ± 0.72   87.59 ± 0.47
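The "mean ± 95% CI over 600 episodes" entries in Table 3 follow the usual few-shot reporting convention: per-episode accuracies are averaged and a normal-approximation confidence half-width is attached. A sketch with placeholder episode accuracies (the authors' exact estimator is not shown in this excerpt):

```python
import math
import statistics

def mean_ci95(episode_accs):
    """Mean accuracy over few-shot episodes with a 95% confidence half-width
    (normal approximation; reasonable for ~600 episodes)."""
    n = len(episode_accs)
    mean = statistics.mean(episode_accs)
    half_width = 1.96 * statistics.stdev(episode_accs) / math.sqrt(n)
    return mean, half_width

# Placeholder per-episode accuracies for one model/dataset pair:
episodes = [55.0, 61.0, 58.5, 62.5, 57.0, 59.0]
m, h = mean_ci95(episodes)  # reported in the table as "m ± h"
```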

Table 4. Many-shot transfer performance of the pretrained models on the different medical datasets. For EyePACS we do not perform many-shot recognition (linear or finetune) with the domain-specific models (MIMIC-CheXpert, MoCo-CXR), as each run takes over 24 h and we anticipate very poor results, as with few-shot. Key: best, second best.


Model              Linear CheXpert  Linear EyePACS  Finetune CheXpert  Finetune EyePACS
SimCLR             75.01            31.51           75.82              36.48
MoCo               74.94            32.18           79.96              45.64
SwAV               74.97            37.61           77.84              44.47
BYOL               74.61            34.27           78.97              41.17
PIRL               74.20            31.51           78.30              40.89
Supervised (r50)   73.47            31.46           79.43              47.08
Supervised (r18)   71.43            30.52           79.31              42.66
Supervised (d121)  72.50            33.96           79.62              40.18
MIMIC-CheXpert     77.28            n/a             78.80              n/a
MoCo-CXR           74.76            n/a             74.98              n/a
