Cutout, a recently proposed data augmentation method, has been shown to improve the performance of deep learning models on several image classification tasks. However, it cuts out randomly chosen sections of the input images, without any consideration of the model being trained on them. This can lead to suboptimal model performance, which can be partially mitigated by including images without cutouts in the training procedure. That remedy, however, requires extra hyperparameter tuning and does not address the problem directly. In this work, we propose an improved version of Cutout that uses per-pixel importance scores produced by a model interpretability method, Grad-CAM. We call this Grad-CAM-based Cutout method Gutout.
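To make the idea concrete, the following is a minimal sketch of how a per-pixel importance map could be turned into a cutout. It assumes the Grad-CAM heatmap has already been computed and resized to the image resolution. The function names, the min-max normalization, the `threshold` value, and the constant `fill` are illustrative assumptions, not details specified in the text.

```python
import numpy as np

def gutout_mask(heatmap, threshold=0.7):
    """Binary keep-mask: 0 where normalized importance exceeds `threshold`.

    `heatmap` is an (H, W) array of importance scores (assumed to come
    from Grad-CAM). The thresholding rule is an assumption of this sketch.
    """
    h = np.asarray(heatmap, dtype=np.float64)
    span = h.max() - h.min()
    if span == 0:
        # Flat heatmap: no pixel stands out, so nothing is removed.
        return np.ones_like(h)
    norm = (h - h.min()) / span  # min-max normalize to [0, 1]
    return (norm <= threshold).astype(np.float64)

def apply_gutout(image, heatmap, threshold=0.7, fill=0.0):
    """Replace the most important pixels of `image` with `fill`.

    `image` is (H, W) or (H, W, C); `heatmap` is (H, W).
    """
    image = np.asarray(image, dtype=np.float64)
    mask = gutout_mask(heatmap, threshold)
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels
    return image * mask + fill * (1.0 - mask)

# Usage: a 2x2 RGB image whose top-right pixel has the highest importance.
image = np.ones((2, 2, 3))
heatmap = np.array([[0.0, 1.0],
                    [0.5, 0.2]])
augmented = apply_gutout(image, heatmap)
```

In contrast to Cutout, the removed region here depends on the heatmap, and therefore on the model that produced it, rather than on a random location.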
Since Gutout is designed to remove the sections of the input that carry high importance with respect to a given model (Grad-CAM identifies these sections), it has the potential to enable model-dependent data augmentation. This can resolve the issue of adapting the data augmentation technique to the model that is being trained. Furthermore, it has the potential to improve the training of ensembles of deep neural networks by explicitly enforcing diversity among ensemble members, which is vital for obtaining performance gains from deep ensembles. This can be done by training each network in the ensemble on the Gutout images generated by the other ensemble networks. This way, if one network "focuses" on certain regions of the input image and gives them high importance scores, the other network (which will be trained on a Gutout version of that image) will not have any useful information in those regions and will need to "focus" on other regions of the image if it is to succeed at the learning task. This training procedure will drive these two (or more) networks to "focus" on different regions of the input images and will thus explicitly encourage diversity among the trained ensemble members.
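The cross-network training scheme described above can be sketched as follows. This is an illustrative sketch only: it assumes each ensemble member has already produced an importance map for the image, and it uses a hypothetical round-robin assignment in which member i trains on the image masked by the heatmap of member (i + 1) mod n. The text does not specify how the other networks' maps are assigned or combined for more than two members, so that rule, the threshold, and all names are assumptions.

```python
import numpy as np

def mask_from_heatmap(heatmap, threshold=0.7):
    # Hypothetical rule: drop pixels whose min-max normalized
    # importance exceeds `threshold`; keep everything otherwise.
    h = np.asarray(heatmap, dtype=np.float64)
    span = h.max() - h.min()
    norm = (h - h.min()) / span if span > 0 else np.zeros_like(h)
    return (norm <= threshold).astype(np.float64)

def cross_gutout_inputs(image, heatmaps, threshold=0.7):
    """Build one training input per ensemble member.

    `heatmaps[i]` is member i's (H, W) importance map for `image`.
    Member i receives `image` with the regions that member
    (i + 1) % n considers important removed (assumed assignment).
    """
    image = np.asarray(image, dtype=np.float64)
    n = len(heatmaps)
    inputs = []
    for i in range(n):
        donor = (i + 1) % n  # the "other" network whose focus is hidden
        keep = mask_from_heatmap(heatmaps[donor], threshold)
        if image.ndim == 3:
            keep = keep[..., None]  # broadcast over channels
        inputs.append(image * keep)
    return inputs

# Usage: two networks that focus on opposite corners of a 2x2 image.
image = np.ones((2, 2))
heatmaps = [
    np.array([[1.0, 0.0], [0.0, 0.0]]),  # network 0 focuses top-left
    np.array([[0.0, 0.0], [0.0, 1.0]]),  # network 1 focuses bottom-right
]
inputs = cross_gutout_inputs(image, heatmaps)
```

With two members this reduces to a simple swap: each network sees the image with the other network's most important region removed, so it cannot rely on the same evidence.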