You might think, "GridMix? I've never heard of it." That's right: it is a data augmentation **I made up myself**, inspired by GridMask and CutMix, as described below. I ran a few quick experiments to see whether it works, so I'm leaving the results here as a memo.
- **I checked the effect of GridMix (described above) on CIFAR-10.**
- **I compared it with CutMix, an augmentation from the same family.**
- **Accuracy: the proposed method (GridMix) is slightly better**
- **Convergence: the existing method (CutMix) is better**
- **Tuning: the proposed method (GridMix) may be more troublesome**
This was only a quick experiment for fun, so I can't say much, but I did see at least some potential.
GridMask is one of the recently published data augmentations. As shown in the figure below, it masks the image in a grid pattern, and it is reported to outperform conventional methods such as Cutout.
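For readers who haven't seen GridMask, here is a minimal NumPy sketch of the idea: drop a square from every cell of a regular grid. It is a simplified stand-in, not the paper's exact parameterization; the function name `grid_mask` and its parameters `d` (grid period in pixels) and `mask_size` (side of the dropped square) are hypothetical.

```python
import numpy as np

def grid_mask(img, d, mask_size, rng=None):
    """Simplified GridMask-style mask (hypothetical helper, not the paper's code).

    Zeroes a mask_size x mask_size square inside every d x d cell of an HxWxC image.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    # random offsets so the grid is not always aligned with the image corner
    dy, dx = rng.integers(0, d, size=2)
    rows = ((np.arange(h) + dy) % d) < mask_size
    cols = ((np.arange(w) + dx) % d) < mask_size
    # a pixel is dropped only when both its row and its column fall in a masked band
    keep = ~(rows[:, None] & cols[None, :])
    return img * keep[..., None].astype(img.dtype)

img = np.ones((32, 32, 3))
masked = grid_mask(img, d=8, mask_size=4, rng=np.random.default_rng(0))
```

With `d=8` and `mask_size=4` on a 32x32 image, a quarter of the pixels are dropped regardless of the random offset.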
CutMix has already been introduced by many people on Qiita and elsewhere, so I'll skip the details: it randomly cuts out a patch of one image, pastes it onto another image, and mixes the labels in proportion to the patch area. Source paper: https://arxiv.org/abs/1905.04899
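A minimal NumPy sketch of CutMix as summarized above, assuming one-hot label vectors and a single image pair: the mix ratio λ is drawn from Beta(α, α), a box covering roughly 1 - λ of the image is pasted in, and λ is recomputed from the box that was actually pasted. The function `cutmix` and its signature are hypothetical, written for illustration rather than taken from the paper's code.

```python
import numpy as np

def cutmix(img_1, label_1, img_2, label_2, alpha=1.0, rng=None):
    """Paste a random box of img_2 onto img_1 and mix labels by area (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_1.shape[:2]
    lam = rng.beta(alpha, alpha)
    # box side lengths chosen so the box covers about (1 - lam) of the image
    cut_h = int(h * np.sqrt(1 - lam))
    cut_w = int(w * np.sqrt(1 - lam))
    # random box centre; the box is clipped at the image border
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mixed = img_1.copy()
    mixed[y1:y2, x1:x2] = img_2[y1:y2, x1:x2]
    # recompute the ratio from the pasted area, since clipping can shrink the box
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    label = lam * label_1 + (1 - lam) * label_2
    return mixed, label
```

The label weight of `img_2` is exactly the fraction of pixels it occupies, which is the "label by area ratio" that the next paragraph questions.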
I have had a nagging doubt about CutMix for a while. The region near the center of an image seems to carry more information, so is it really fine to set the label purely by area ratio?
For example, in the figure below, half of the area is a cat and half is a dog, but splitting the label 50/50 feels wrong to me. It looks like a dog.
Using the same model throughout, I trained on the CIFAR-10 dataset in the following three cases and compared the accuracy: no augmentation, CutMix, and GridMix.
Model: a shallow CNN with 8 conv layers (not pretrained), input shape 32x32x3.
The proposed GridMix augmentation is something like a child of CutMix and GridMask: it mixes two images along a grid of suitable size. **The mask is basically a checkerboard pattern, but a mesh pattern, or no mixing at all, is produced with some probability.**
The figure below shows the checkerboard pattern, the mesh pattern, and no mix, from left to right.
**With the checkerboard pattern alone, the mix ratio stayed at about 0.5 every time and convergence was poor**, so I made some of the samples easier. Adding the mesh pattern moves the mix ratio away from 0.5 so that one image dominates, which reproduces something closer to the existing method, CutMix.
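```python
import numpy as np

def grid_mixer(img_1, img_2, interval_h, interval_w, thresh=0.3):
    # make checkerboard: 1 = take the pixel from img_1, 0 = take it from img_2
    h, w, _ = img_1.shape
    # random phase so the grid does not always start at the top-left corner
    h_start = np.random.randint(0, 2 * interval_h)
    w_start = np.random.randint(0, 2 * interval_w)
    # 0/1 bands along each axis, alternating every `interval` pixels
    h_grid = ((np.arange(h_start, h_start + h) // interval_h) % 2).reshape(-1, 1)
    w_grid = ((np.arange(w_start, w_start + w) // interval_w) % 2).reshape(1, -1)
    checkerboard = np.abs(h_grid - w_grid)  # XOR of the row and column bands
    # reverse vertical and/or horizontal: each flip fills one set of 0-cells with 1,
    # giving a mesh pattern (one flip) or no mix at all (both flips)
    if np.random.rand() < thresh:
        checkerboard += h_grid * w_grid
    if np.random.rand() < thresh:
        checkerboard += (1 - h_grid) * (1 - w_grid)
    # mix images
    mask = checkerboard[:, :, np.newaxis]
    mixed_img = img_1 * mask + img_2 * (1 - mask)
    # fraction of pixels taken from img_1
    mix_rate = np.sum(checkerboard) / (h * w)
    return mixed_img, mix_rate
```

```python
# img_1, img_2: HxWxC arrays of the same shape
h, w, _ = img_1.shape
# split the image into roughly 2 to 4 bands per axis;
# int() keeps the intervals integral, since randint expects integer bounds
interval_h = int(h // np.random.uniform(2, 4))
interval_w = int(w // np.random.uniform(2, 4))
img, mix_rate = grid_mixer(img_1, img_2, interval_h, interval_w, 0.3)
```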
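To see what the two random flips in `grid_mixer` produce, the sketch below builds the three masks deterministically (grid offset fixed at 0, a 32x32 image and 8-pixel bands, all assumptions for illustration) and mixes one-hot labels with the resulting mix rate. The label weighting is my assumption about how `mix_rate` is meant to be used, since the snippet above only returns it; `grid_patterns` and `mix_labels` are hypothetical helpers. Only one of the two single-flip meshes is shown; the other (checkerboard plus the `(1 - h_grid) * (1 - w_grid)` term) also covers 75% of the pixels under these settings.

```python
import numpy as np

def grid_patterns(h=32, w=32, interval=8):
    """Build the three GridMix masks with the grid offset fixed at 0 (hypothetical helper)."""
    h_grid = ((np.arange(h) // interval) % 2).reshape(-1, 1)
    w_grid = ((np.arange(w) // interval) % 2).reshape(1, -1)
    checkerboard = np.abs(h_grid - w_grid)        # no flip: XOR
    mesh = checkerboard + h_grid * w_grid         # one flip: XOR + AND = OR
    no_mix = mesh + (1 - h_grid) * (1 - w_grid)   # both flips: all ones
    return {"checkerboard": checkerboard, "mesh": mesh, "no mix": no_mix}

def mix_labels(label_1, label_2, mix_rate):
    # mix_rate is the share of pixels taken from img_1, so it weights label_1
    return mix_rate * label_1 + (1 - mix_rate) * label_2

for name, mask in grid_patterns().items():
    rate = mask.mean()  # checkerboard 0.5, mesh 0.75, no mix 1.0
    print(name, rate, mix_labels(np.array([1.0, 0.0]), np.array([0.0, 1.0]), rate))
```

The mesh pattern shifts the ratio to 3:1, which is why it behaves more like a typical CutMix sample than the 1:1 checkerboard.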
The drawback is that there are several parameters to tune, described below.
- **Grid spacing:** If the grid is too fine, it seems that only the shallow layers can pick it up (CIFAR-10 images are only 32x32 to begin with), so I set it so that the image is divided into 2 to 4 parts both vertically and horizontally. I suspect the best setting also depends on the model. The aspect ratio of the grid cells is randomized as well, but I haven't confirmed whether that helps.
- **Checkerboard/mesh switching threshold:** The horizontal and vertical masks are each dropped independently with 30% probability (the two `thresh` checks in the code). As a result, 49% of samples get a checkerboard pattern, 42% a mesh pattern, and the remaining 9% no mix at all. In the end, this threshold does the same job as tuning the Beta distribution in CutMix and similar methods.
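The 49/42/9 split follows from the two flips firing independently. The sketch below enumerates the four flip outcomes for a given `thresh` and, under the idealizing assumption that the grid splits rows and columns into equally many 0- and 1-bands, also gives the expected share of pixels taken from `img_1`. That expected share is the sense in which `thresh` plays a role similar to CutMix's Beta parameter: it shapes the distribution of mix ratios. The helper names and the `MIX_RATE` values are hypothetical.

```python
from itertools import product

# Approximate share of img_1 pixels per pattern, assuming equally many 0- and 1-bands
# along each axis (an idealization; real grid_mixer masks vary with interval and offset).
MIX_RATE = {"checkerboard": 0.5, "mesh": 0.75, "no mix": 1.0}

def pattern_distribution(thresh=0.3):
    """Probability of each mask pattern when each flip fires independently with prob `thresh`."""
    probs = {"checkerboard": 0.0, "mesh": 0.0, "no mix": 0.0}
    for flip_a, flip_b in product([False, True], repeat=2):
        p = (thresh if flip_a else 1 - thresh) * (thresh if flip_b else 1 - thresh)
        # 0 flips -> checkerboard, 1 flip -> mesh, 2 flips -> no mix
        pattern = ["checkerboard", "mesh", "no mix"][flip_a + flip_b]
        probs[pattern] += p
    return probs

def expected_mix_rate(thresh=0.3):
    return sum(p * MIX_RATE[k] for k, p in pattern_distribution(thresh).items())

print(pattern_distribution(0.3), expected_mix_rate(0.3))
```

With `thresh=0.3` this gives about 0.49, 0.42 and 0.09, and an expected mix rate of about 0.65. Under the same idealization the expected rate works out to 0.5 + 0.5 × `thresh`, so raising the threshold pushes samples toward less mixing.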
The table below shows averages over three runs, after tuning the learning rate and learning-rate schedule parameters.
| Case | Epochs | Val_Accuracy | Val_Loss |
|---|---|---|---|
| No Augmentation | 25 | 0.805 | 0.710 |
| CutMix (beta=alpha=0.7) | 32 | 0.841 | 0.505 |
| GridMix | 45 | 0.852 | 0.463 |
GridMix is slow to converge... It may be worth turning it off for the first few epochs. Still, the accuracy is a little better. This is only a single setting, but I think it shows a bit of potential.
In conclusion, **doing CutMix in a grid pattern may work better than regular CutMix**. The verification is far from sufficient, so this is only a possibility; I can't say anything definite without more experiments. If anyone feels like trying it, I'd be overjoyed. And if it doesn't work at all, my tearful apologies.