3. Regular Data Augmentation - SummerBigData/Iceberg GitHub Wiki

Denoising

The next thing I tried was to denoise and center the images. I quickly gave up on centering as an all-around bad idea, but I did get a nice application of denoising for data augmentation working. The first method I tried was simply to use OpenCV's RGB denoising function. I soon realized that it mixes the channels together to remove color distortion, and I certainly did not want to mix my horizontal and vertical polarization data. So I used grayscale (black and white) denoising on the two polarization channels independently, leaving the third channel alone. This gives me one knob to play with: the denoising strength, shared by both channels.
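A minimal sketch of the per-channel approach described above. Several things here are my assumptions and not taken from the original code: that the grayscale denoiser is OpenCV's `cv2.fastNlMeansDenoising`, that the strength knob maps to its `h` parameter, that the two polarization bands sit in channels 0 and 1, and that each band is rescaled to 8 bits before denoising. The function names are hypothetical.

```python
import numpy as np


def nl_means_gray(band8, strength):
    """Grayscale non-local means on one uint8 band (needs opencv-python)."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV
    # Assumption: the "denoising strength" knob corresponds to the h parameter.
    return cv2.fastNlMeansDenoising(band8, None, h=float(strength))


def denoise_bands(image, strength, denoise=nl_means_gray):
    """Denoise channels 0 and 1 independently; leave channel 2 untouched.

    image: float array of shape (H, W, 3).
    denoise: callable taking (uint8 band, strength) and returning a uint8 band.
    """
    out = image.astype(np.float32).copy()
    if strength <= 0:
        return out  # strength 0 is treated as "no denoising"
    for c in (0, 1):  # assumption: the polarization bands are channels 0 and 1
        band = out[:, :, c]
        lo, hi = float(band.min()), float(band.max())
        span = (hi - lo) or 1.0  # avoid dividing by zero on a flat band
        # Assumption: rescale to 0..255 because the 8-bit denoiser expects uint8.
        band8 = np.round((band - lo) / span * 255.0).astype(np.uint8)
        clean8 = denoise(band8, strength)
        # Map the denoised band back to its original value range.
        out[:, :, c] = clean8.astype(np.float32) / 255.0 * span + lo
    return out
```

Because each band is passed to the denoiser on its own, nothing from one polarization can leak into the other, which is the point of avoiding the color version.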

Here is a plot of what it can do. The first column shows the first 10 images in the dataset with no changes made to them. Moving to the right, each column adds more denoising in steps of 10, so the last column has the denoising strength set to 90.

As we can see, denoising helps a lot to isolate the object of interest and remove the background. However, it naturally loses information about the border of the object. Because of this, using denoising as a preprocessing step actually performed worse than using no denoising at all. Instead, I found that using denoising for data augmentation was the way to go. By combining the original 1000 training images with the same 1000 images after different levels of denoising, I created 2000 training images that contain both the border information of the objects and versions with the noise removed.
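The combination step can be sketched roughly as follows. This is only an illustration under my own assumptions: the function name, the array layout (images stacked along the first axis), and the optional shuffle are hypothetical, and the denoised copies are assumed to have been computed beforehand.

```python
import numpy as np


def augment_with_denoised(images, labels, denoised_images, rng=None):
    """Stack originals and their denoised copies, repeating labels to match.

    images, denoised_images: arrays of shape (N, H, W, C).
    labels: array of shape (N,).
    rng: optional np.random.Generator; if given, the result is shuffled.
    """
    images = np.asarray(images)
    denoised_images = np.asarray(denoised_images)
    labels = np.asarray(labels)
    if images.shape != denoised_images.shape:
        raise ValueError("denoised copies must match the originals in shape")
    # Originals first, denoised copies second; each copy keeps its label.
    x = np.concatenate([images, denoised_images], axis=0)
    y = np.concatenate([labels, labels], axis=0)
    if rng is not None:
        order = rng.permutation(len(x))  # same permutation for x and y
        x, y = x[order], y[order]
    return x, y
```

With 1000 originals this returns 2000 images. The split into training and validation sets should happen before this step, so that a denoised copy of a training image never ends up in the validation set.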

Translations

I wanted to try translations as another way of augmenting the data. Given a trimsize, I removed that many rows and columns of pixels at one corner of the image, which moves the center of the image toward that corner (and shrinks the image in the process). For each image, four corners to trim from were chosen randomly, quadrupling the dataset size to 4000. For the validation set, I center-cropped each image to the same dimensions as the training images.
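Here is a rough sketch of the trimming and center cropping described above. The function names and corner labels are hypothetical, and I am assuming the four corners per image are drawn at random with replacement, which the description does not pin down.

```python
import numpy as np

CORNERS = ("top_left", "top_right", "bottom_left", "bottom_right")


def trim_corner(image, trimsize, corner):
    """Remove `trimsize` rows and columns at one corner of an (H, W, C) image."""
    h, w = image.shape[:2]
    t = trimsize
    # Trimming at the top drops the first t rows, at the bottom the last t rows.
    rows = slice(t, h) if corner.startswith("top") else slice(0, h - t)
    # Trimming at the left drops the first t columns, at the right the last t.
    cols = slice(t, w) if corner.endswith("left") else slice(0, w - t)
    return image[rows, cols]


def center_crop(image, trimsize):
    """Crop to (H - trimsize, W - trimsize) around the center (validation set)."""
    h, w = image.shape[:2]
    off = trimsize // 2
    return image[off:off + h - trimsize, off:off + w - trimsize]


def translate_augment(images, trimsize, rng):
    """Return four corner-trimmed copies per image.

    Assumption: corners are sampled with replacement, so one image may be
    trimmed from the same corner more than once.
    """
    out = []
    for img in images:
        for corner in rng.choice(CORNERS, size=4):
            out.append(trim_corner(img, trimsize, str(corner)))
    return np.stack(out)
```

Both the trimmed training images and the center-cropped validation images come out `trimsize` pixels smaller in each dimension, so they can be fed to the same network.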

Denoising and Translational augmentation results

By applying denoising and translations with various strengths, I created this plot. The x-axis shows the trimsize used for translational augmentation, and the y-axis shows the denoising strength. As we can see, denoising actually helps the CNN, but translational augmentation (or at least my approach to it) certainly didn't. A denoising strength of around 15 seems to work best, although the error here seems to be around a percent.