Figure 1. The various pairwise relationships present in information retrieval datasets. (a) 1-1 Paired, (b) 1-Many Paired, (c) 1-1 Aligned Paired, (d) 1-Many Aligned Paired, and (e) Unpaired.
Figure 4. Unpaired Multi-Modal Learning (UMML) framework workflow. The diagram shows an example in which 50% of the images are unpaired, i.e., the text Bag of Words (BoW) binary vectors of 50% of the samples are emptied. Similarly, when text is unpaired, the corresponding image feature matrices are emptied (CNN: convolutional neural network).
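The unpairing step described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that "emptied" means the BoW vector is replaced by an all-zero vector of the same length; the function name `unpair_images`, the `fraction` and `seed` parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random

def unpair_images(image_feats, text_bows, fraction, seed=0):
    """Mark a random `fraction` of samples as unpaired images.

    Assumption: an 'emptied' BoW vector is an all-zero vector of the
    same length. Image features are left untouched.
    """
    n = len(image_feats)
    rng = random.Random(seed)
    unpaired = set(rng.sample(range(n), round(n * fraction)))
    new_bows = [
        [0] * len(bow) if i in unpaired else list(bow)
        for i, bow in enumerate(text_bows)
    ]
    new_feats = [list(f) for f in image_feats]
    return new_feats, new_bows, unpaired

# Toy data: 10 samples, 2-dimensional image features, 3-word vocabulary.
imgs = [[float(i), float(i) + 0.5] for i in range(10)]
bows = [[1, 0, 1] for _ in range(10)]
new_imgs, new_bows, unpaired = unpair_images(imgs, bows, fraction=0.5)
```

The unpaired-text case would be symmetric under the same assumption: the image feature rows are zeroed and the BoW vectors are kept.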
Figure 5. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired images, i.e., images with no corresponding text. The ‘Paired’ points show results when training with a fully paired training set. Subsequent points show results with increasing amounts of unpaired images in the training set in increments of 20%.
Figure 6. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired text, i.e., text with no corresponding images. The ‘Paired’ points show results when training with a fully paired training set. Subsequent points show results with increasing amounts of unpaired text in the training set in increments of 20%.
Figure 7. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired images and text, i.e., images with no corresponding text and vice versa. The ‘Paired’ points show results when training with a fully paired training set. Subsequent points show results with increasing amounts of unpaired images and text in the training set. For example, ‘10%/10%’ refers to 10% of the training set being unpaired images and another 10% being unpaired text, for a total of 20% of the dataset being unpaired samples.
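The ‘10%/10%’ notation can be read as a three-way partition of the training indices. The sketch below is illustrative only: it assumes a training set of 10,000 samples (the Train size listed in Table 1) and assumes the unpaired-image and unpaired-text subsets are disjoint and drawn at random; `partition_unpaired` is a hypothetical helper.

```python
import random

def partition_unpaired(n_samples, ui_fraction, ut_fraction, seed=0):
    """Split sample indices into paired, unpaired-image (UI) and
    unpaired-text (UT) subsets that do not overlap."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    n_ui = round(n_samples * ui_fraction)
    n_ut = round(n_samples * ut_fraction)
    ui = order[:n_ui]
    ut = order[n_ui:n_ui + n_ut]
    paired = order[n_ui + n_ut:]
    return paired, ui, ut

# '10%/10%': 1000 unpaired images, 1000 unpaired texts, 8000 paired samples.
paired, ui, ut = partition_unpaired(10_000, 0.10, 0.10)
```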
Figure 8. In (a), 20% of the training set was discarded. In (b), 20% of the training set was unpaired. In this example, for both (a,b), the model will be trained on 8000 paired samples. However, (b) will also train with its additional 2000 unpaired samples. This way, the effect of training with or without the additional unpaired samples can be investigated.
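The comparison in Figure 8 can be sketched as two training sets built from the same random split. This is a hedged illustration assuming 10,000 training samples and a 20% fraction, as in the caption's example; `split_indices` and the dictionary layout are hypothetical names chosen for the sketch.

```python
import random

def split_indices(n_samples, fraction, seed=0):
    """Return (kept, held_out) index lists for a random split."""
    rng = random.Random(seed)
    held_out = set(rng.sample(range(n_samples), round(n_samples * fraction)))
    kept = [i for i in range(n_samples) if i not in held_out]
    return kept, sorted(held_out)

kept, held_out = split_indices(10_000, 0.20)

# (a) Sample discarding: the held-out 20% is dropped entirely.
discard_train = {"paired": kept, "unpaired": []}

# (b) Unpairing: the same 20% stays in training, but as unpaired samples.
unpair_train = {"paired": kept, "unpaired": held_out}
```

Both configurations share the same 8000 paired samples, so any difference in retrieval performance can be attributed to the additional 2000 unpaired samples in (b).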
Figure 9. Results (mAP) on MIR-Flickr25K and NUS-WIDE with sample discarding, i.e., training set being reduced. The ‘Full’ points show results when training with the full unaltered training set. Subsequent points show results with decreasing amounts of samples, where the given percentage denotes the percentage of samples in the training set which have been discarded. The ‘Random’ points hold the baseline random performance values.
Table 1. MIR-Flickr25K and NUS-WIDE dataset characteristics.
| Dataset | Train | Query | Retrieval |
|---|---|---|---|
| MIRFlickr-25K | 10,000 | 2000 | 18,015 |
| NUS-Wide | 10,000 | 2100 | 193,734 |

Table 2. Example of images, paired tags, and labels from the MIR-Flickr25K and NUS-WIDE datasets. Example images (1) and (2) reprinted under Creative Commons attribution, (1) Author: Martin P. Szymczak, Source, CC BY-NC-ND 2.0 (2) Title: Squirrel, Author: likeaduck, Source, CC BY 2.0.

| Image | Tag | Label/Class |
|---|---|---|
| MIR-Flickr25K example (1) | | |
Table 3. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired images, i.e., images with no corresponding text. Column ‘Paired’ shows results when training with a fully paired training set. Subsequent columns show results with increasing amounts of unpaired images in the training set.
| | | MIR-Flickr25K | | | | | | NUS-WIDE | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Task | Method | Paired | 20% | 40% | 60% | 80% | 100% | Paired | 20% | 40% | 60% | 80% | 100% |
| i→t | DADH | 0.836 | 0.807 | 0.789 | 0.750 | 0.702 | 0.562 | 0.701 | 0.690 | 0.683 | 0.656 | 0.646 | 0.297 |
| i→t | AGAH | 0.803 | 0.752 | 0.729 | 0.695 | 0.637 | 0.535 | 0.633 | 0.621 | 0.583 | 0.587 | 0.503 | 0.267 |
| i→t | JDSH | 0.672 | 0.653 | 0.648 | 0.643 | 0.619 | 0.555 | 0.546 | 0.534 | 0.510 | 0.457 | 0.402 | 0.253 |
| t→i | DADH | 0.823 | 0.824 | 0.814 | 0.812 | 0.796 | 0.552 | 0.707 | 0.706 | 0.702 | 0.670 | 0.634 | 0.261 |
| t→i | AGAH | 0.790 | 0.790 | 0.786 | 0.779 | 0.742 | 0.540 | 0.646 | 0.595 | 0.591 | 0.596 | 0.401 | 0.277 |
| t→i | JDSH | 0.660 | 0.672 | 0.666 | 0.652 | 0.632 | 0.564 | 0.566 | 0.499 | 0.476 | 0.452 | 0.412 | 0.256 |
Table 4. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired text, i.e., text with no corresponding images. Column ‘Paired’ shows results when training with a fully paired training set. Subsequent columns show results with increasing amounts of unpaired text in the training set.
| | | MIR-Flickr25K | | | | | | NUS-WIDE | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Task | Method | Paired | 20% | 40% | 60% | 80% | 100% | Paired | 20% | 40% | 60% | 80% | 100% |
| i→t | DADH | 0.836 | 0.831 | 0.831 | 0.826 | 0.820 | 0.525 | 0.701 | 0.700 | 0.696 | 0.683 | 0.674 | 0.282 |
| i→t | AGAH | 0.803 | 0.755 | 0.740 | 0.720 | 0.682 | 0.541 | 0.633 | 0.597 | 0.566 | 0.500 | 0.356 | 0.267 |
| i→t | JDSH | 0.672 | 0.646 | 0.621 | 0.608 | 0.580 | 0.553 | 0.546 | 0.515 | 0.478 | 0.393 | 0.342 | 0.254 |
| t→i | DADH | 0.823 | 0.803 | 0.783 | 0.756 | 0.711 | 0.545 | 0.707 | 0.705 | 0.724 | 0.697 | 0.698 | 0.274 |
| t→i | AGAH | 0.790 | 0.760 | 0.744 | 0.698 | 0.642 | 0.535 | 0.646 | 0.645 | 0.653 | 0.651 | 0.464 | 0.267 |
| t→i | JDSH | 0.660 | 0.653 | 0.622 | 0.631 | 0.601 | 0.545 | 0.566 | 0.520 | 0.506 | 0.468 | 0.420 | 0.249 |
Table 5. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired images and text, i.e., images with no corresponding text and vice versa. Column ‘Paired’ shows results when training with a fully paired training set. Subsequent columns show results with increasing amounts of unpaired images and text in the training set. For example, ‘10% 10%’ refers to 10% of the training set being unpaired images (UI) and another 10% being unpaired text (UT), for a total of 20% of the dataset being unpaired samples.
MIR-Flickr25K | NUS-WIDE
Task | Method | Paired | UI:
Table 6. Results (mAP) on MIR-Flickr25K and NUS-WIDE with sample discarding, i.e., training set being reduced. Column ‘Full’ shows results when training with full training set without any sample discarding. Subsequent columns show results with decreasing amounts of samples, where the given percentage denotes the percentage of samples in the training set which have been discarded. The ‘Random’ column holds the baseline random performance values.
| | | MIR-Flickr25K | | | | | | NUS-WIDE | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Task | Method | Full | 20% | 40% | 60% | 80% | Random | Full | 20% | 40% | 60% | 80% | Random |
| i→t | DADH | 0.836 | 0.824 | 0.799 | 0.779 | 0.744 | 0.543 | 0.701 | 0.683 | 0.648 | 0.610 | 0.575 | 0.260 |
| i→t | AGAH | 0.803 | 0.763 | 0.737 | 0.714 | 0.678 | 0.548 | 0.633 | 0.633 | 0.588 | 0.440 | 0.366 | 0.267 |
| i→t | JDSH | 0.672 | 0.657 | 0.655 | 0.640 | 0.634 | 0.551 | 0.546 | 0.543 | 0.523 | 0.469 | 0.457 | 0.256 |
| t→i | DADH | 0.823 | 0.807 | 0.797 | 0.781 | 0.754 | 0.537 | 0.707 | 0.672 | 0.663 | 0.630 | 0.549 | 0.258 |
| t→i | AGAH | 0.790 | 0.779 | 0.778 | 0.756 | 0.730 | 0.538 | 0.646 | 0.567 | 0.547 | 0.488 | 0.377 | 0.267 |
| t→i | JDSH | 0.660 | 0.669 | 0.654 | 0.654 | 0.644 | 0.559 | 0.566 | 0.514 | 0.517 | 0.487 | 0.424 | 0.245 |

Table 7. The sampling cases that produced the best retrieval results are indicated by UI: Unpaired Image, UT: Unpaired Text, UIT: Unpaired Image and Text, and SD: Sample discarding. The percentage shown in the brackets is the performance difference by which a given unpaired sample case (shown in Table 3, Table 4 and Table 5) outperformed sample discarding (SD) (shown in Table 6). The first row shows the percentage of training samples being unpaired (UI, UT, UIT), or discarded (SD) depending on the cell value.

MIR-Flickr25K

| Task | Method | 20% | 40% | 60% | 80% | 100% |
|---|---|---|---|---|---|---|
| i→t | DADH | UT (+0.86%) | UT (+3.97%) | UT (+6.02%) | UT (+10.16%) | UIT (+39.93%) |
| i→t | AGAH | SD | UT (+0.35%) | UT (+0.87%) | UT (+0.58%) | UIT (+26.43%) |
| i→t | JDSH | SD | SD | SD | SD | UIT (+7.26%) |
| t→i | DADH | UI (+2.02%) | UI (+2.16%) | UI (+4.04%) | UI (+5.57%) | UIT (+41.93%) |
| t→i | AGAH | UI (+1.52%) | UI (+0.95%) | UI (+2.98%) | UI (+1.67%) | UIT (+35.5%) |
| t→i | JDSH | UI (+0.45%) | UI (+1.83%) | SD | SD | UIT (+6.26%) |
| Both | | | | | | |