Introduction

You might be thinking, "Grid Mix? Never heard of it." That is to be expected: it is an augmentation **I came up with myself**, inspired by GridMask and CutMix, as shown below. I ran a few quick experiments to see whether it works, so I am leaving the results here as a memo.

Overview

Purpose: what I did

- **Checked the effect of the Grid Mix described above on CIFAR-10.**
- **Compared it with CutMix, an augmentation of the same family.**

Conclusion: how it went

- **Accuracy: the proposed method (Grid Mix) is slightly better**
- **Convergence: the existing method (CutMix) is better**
- **Tuning: the proposed method (Grid Mix) may be more troublesome**

This was only a casual experiment, so I can't be certain, but it showed at least some potential.

Background

Introducing Grid Mask

GridMask is one of the recently published data augmentation methods. As shown in the figure below, it masks the image in a grid pattern, and it is reported to outperform conventional methods such as Cutout.

This article is not an introduction to GridMask itself, but a proposal for extending it into a Mix-style augmentation. Source paper: https://arxiv.org/abs/2001.04086
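
As a rough illustration of the idea only, a grid-shaped mask can be sketched as below. This is a simplified stand-in, not the paper's exact algorithm; the function name `grid_mask` and its parameters `period` and `keep` are made up for this sketch.

```python
import numpy as np

def grid_mask(img, period=8, keep=4):
    """Zero out a square of side (period - keep) in every period x period cell.

    Simplified sketch: `period` and `keep` are illustrative parameters,
    not the names used in the GridMask paper.
    """
    h, w = img.shape[:2]
    # True where a row / column lies in the "kept" band of its cell
    rows = (np.arange(h) % period) < keep
    cols = (np.arange(w) % period) < keep
    # A pixel is dropped only when both its row and its column are in the masked band
    mask = rows[:, None] | cols[None, :]
    return img * mask[:, :, None]

img = np.ones((32, 32, 3), dtype=np.float32)
masked = grid_mask(img)
kept_fraction = masked[:, :, 0].mean()  # share of pixels that survive the mask
```

With these toy settings, a quarter of every cell is masked, so three quarters of the pixels are kept.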

Introducing CutMix

Many people have already introduced this on Qiita and elsewhere, so I will skip the details. In short, it randomly cuts out part of one image, pastes it onto another image, and assigns the label according to the area ratio. Source paper: https://arxiv.org/abs/1905.04899
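
For reference, the cut, paste, and label-by-area steps can be sketched roughly as follows. This is a minimal sketch for a single pair of images, assuming one-hot labels; the function name `cutmix_pair` and its arguments are chosen for illustration and are not taken from any particular library.

```python
import numpy as np

def cutmix_pair(img_1, img_2, y_1, y_2, alpha=1.0, rng=None):
    """Paste a random box from img_2 into img_1 and mix the labels by area."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_1.shape[:2]
    lam = rng.beta(alpha, alpha)             # target area share of img_1
    cut_h = int(h * np.sqrt(1.0 - lam))      # box sized so its area is about (1 - lam)
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)   # random box center
    top, bottom = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    left, right = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mixed = img_1.copy()
    mixed[top:bottom, left:right] = img_2[top:bottom, left:right]
    # Recompute the ratio from the clipped box, since clipping can shrink it
    lam = 1.0 - (bottom - top) * (right - left) / (h * w)
    return mixed, lam * y_1 + (1.0 - lam) * y_2, lam

rng = np.random.default_rng(0)
cat, dog = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
y_cat, y_dog = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed, y_mixed, lam = cutmix_pair(cat, dog, y_cat, y_dog, rng=rng)
```

Note that the label depends only on the box area, not on where the box lands in the image.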

CutMix ⇒ Motivation for GridMix

I have had doubts about CutMix for some time. The center of an image seems to carry more information than the edges, so I wonder whether it is really appropriate to decide the label by area ratio alone.

For example, in the figure below, half of the area is a cat and half is a dog, but splitting the label 50/50 feels wrong to me. It looks like a dog.

⇒ So I thought: all right, let's mix the images in a mesh pattern instead.

Approach

Using a common model, I compare the accuracy obtained by training on the CIFAR-10 dataset in the following three cases.

No Augmentation
CutMix Augmentation (existing method)
GridMix (Proposed method)

Model to use

A shallow CNN with 8 convolutional layers (not pretrained). Input shape: 32x32x3

GridMix Augmentation

The proposed method is like a child of CutMix and GridMask: it mixes two images using a grid of suitable size. **The mask is basically a checkered pattern, but mesh patterns and unmixed images are also generated stochastically.**

The figure below shows the checkered pattern, mesh pattern, and no mix in order from the left.

**With the checkered pattern alone, the mix ratio was fixed at about 0.5 and convergence was poor**, so I made some of the samples easier. Adding the mesh pattern also makes it possible to reproduce something similar to the existing CutMix.
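
To make the three patterns concrete, here is a small deterministic sketch on a toy 8x8 grid with 2-pixel cells. The sizes are chosen only for illustration; 1 means the pixel comes from the first image and 0 from the second.

```python
import numpy as np

# Alternating stripes of width 2 along each axis: [0, 0, 1, 1, 0, 0, 1, 1]
idx = np.arange(8) // 2 % 2
h_grid = idx.reshape(-1, 1)
w_grid = idx.reshape(1, -1)

checkered = np.abs(h_grid - w_grid)          # XOR of the row and column stripes
mesh = checkered + h_grid * w_grid           # also fill cells where both stripes are 1
no_mix = mesh + (1 - h_grid) * (1 - w_grid)  # fill the remaining cells: all ones

for name, mask in [("checkered", checkered), ("mesh", mesh), ("no mix", no_mix)]:
    print(name, mask.mean())
# checkered 0.5
# mesh 0.75
# no mix 1.0
```

The other mesh variant, filling only the cells where both stripes are 0, covers the same 0.75 share. So the mix ratio moves from about 0.5 toward 1.0 as more cells are filled in.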

import numpy as np

def grid_mixer(img_1, img_2, interval_h, interval_w, thresh=0.3):
    # Make the checkerboard (1 = take the pixel from img_1, 0 = from img_2)
    h, w, _ = img_1.shape
    interval_h, interval_w = int(interval_h), int(interval_w)  # accept float intervals
    h_start = np.random.randint(0, 2*interval_h)  # random phase of the grid
    w_start = np.random.randint(0, 2*interval_w)
    h_grid = ((np.arange(h_start, h_start+h)//interval_h)%2).reshape(-1,1)
    w_grid = ((np.arange(w_start, w_start+w)//interval_w)%2).reshape(1,-1)
    checkerboard = np.abs(h_grid-w_grid)

    # With probability thresh each, fill in the remaining cells:
    # one fill gives a mesh pattern, both fills give an all-ones mask (no mix)
    if np.random.rand()<thresh:
        checkerboard += h_grid*w_grid
    if np.random.rand()<thresh:
        checkerboard += (1-h_grid)*(1-w_grid)

    # Mix the images and report the area share taken from img_1
    mask = checkerboard[:, :, np.newaxis]
    mixed_img = img_1*mask+img_2*(1-mask)
    mix_rate = np.sum(checkerboard)/(h*w)
    return mixed_img, mix_rate

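# img_1 and img_2 are two training images of the same shape
h,w,_=img_1.shape
interval_h = int(h//np.random.uniform(2, 4))  # divide the height into 2 to 4 parts
interval_w = int(w//np.random.uniform(2, 4))  # divide the width into 2 to 4 parts
img, mix_rate = grid_mixer(img_1, img_2, interval_h, interval_w, 0.3)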
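
The code above returns `mix_rate` but does not show how it is applied to the labels. Assuming it is used the same way as the area ratio in CutMix (this is my assumption, not something stated above), the label mixing could look like this. The helper `mix_labels` and the class indices are hypothetical.

```python
import numpy as np

def mix_labels(y_1, y_2, mix_rate):
    """Blend two one-hot labels; mix_rate is the area share taken from image 1."""
    return mix_rate * y_1 + (1.0 - mix_rate) * y_2

num_classes = 10                 # CIFAR-10 has 10 classes
y_1 = np.eye(num_classes)[3]     # one-hot label of image 1 (class index 3, illustrative)
y_2 = np.eye(num_classes)[5]     # one-hot label of image 2 (class index 5, illustrative)
y_mixed = mix_labels(y_1, y_2, mix_rate=0.75)
```

With a mesh mask that takes 75% of the pixels from image 1, the soft label puts 0.75 on class 3 and 0.25 on class 5.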

The drawback, as described below, is that there are a few parameters to tune.

**Grid spacing:** If the grid is too fine, the pattern can probably only be picked up in shallow layers (CIFAR-10 images are only 32x32 by default), so I set the spacing so that the image is divided into 2 to 4 parts vertically and horizontally. I suspect the best setting also depends on the model. The aspect ratio of the grid cells is random as well, but I have not verified its effect.

**Checkered/mesh switching threshold:** The horizontal mask is removed with 30% probability, and the vertical mask is likewise removed with 30% probability. As a result, 49% of the samples get the checkered pattern, 42% get a mesh pattern, and the remaining 9% are not mixed at all. In the end, this plays the same role as adjusting the Beta distribution used in CutMix and similar methods.
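
The numbers in the two parameter descriptions above can be checked with a little arithmetic. This sketch assumes a 32-pixel image side and `thresh = 0.3`. The per-pattern area shares (0.5, 0.75, 1.0) are nominal values that ignore edge effects from the random grid offset.

```python
import numpy as np

# Grid spacing: h // uniform(2, 4) on a 32-pixel side
h = 32
divisors = np.array([2.0, 2.5, 3.0, 3.5, 3.999])  # sample points from [2, 4)
intervals = (h // divisors).astype(int).tolist()
print(intervals)  # [16, 12, 10, 9, 8]

# Pattern frequencies for thresh = 0.3 (two independent random draws)
thresh = 0.3
p_checkered = (1 - thresh) ** 2       # neither fill applied
p_mesh = 2 * thresh * (1 - thresh)    # exactly one fill applied
p_no_mix = thresh ** 2                # both fills applied
print(round(p_checkered, 2), round(p_mesh, 2), round(p_no_mix, 2))  # 0.49 0.42 0.09

# Nominal average area share of image 1 over all three patterns
expected_mix_rate = 0.5 * p_checkered + 0.75 * p_mesh + 1.0 * p_no_mix
print(round(expected_mix_rate, 2))  # 0.65
```

So the grid cells are 8 to 16 pixels wide, and raising `thresh` shifts the average mix ratio toward 1.0, which makes the samples easier on average.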

Learning conditions

Initial Learning Rate: 0.005
Epochs (LR schedule): tuned per case
Optimizer: Adam (beta_1=0.9, beta_2=0.999, decay=0.)
Batch Size: 128

Result evaluation

The table below shows the averages over three runs, after tuning the learning rate and schedule parameters.

Case	Epochs	Val_Accuracy	Val_Loss
No Augmentation	25	0.805	0.710
CutMix (beta=alpha=0.7)	32	0.841	0.505
GridMix	45	0.852	0.463

The number of epochs for each case was set to the point where the best performance was obtained.

Grid Mix is slow to converge, so it may be worth leaving it off for the first few epochs. The accuracy is a little better, though. This is only a single experiment, but I see some potential in it.

Summary

In conclusion, **mixing in a grid pattern may work better than regular CutMix**. The verification is insufficient, so this is only a possibility; I can't say more without further experiments. If anyone feels like trying it out, I would be moved to tears. And if it doesn't work at all, I will tearfully apologize.

[PYTHON] New Data Augmentation? [Grid Mix]