Data augmentation, which counters overfitting by transforming the input data, is widely used in machine learning. Recently, two new data augmentation methods were proposed for image recognition: Random Erasing and Cutout.
Both mask a random rectangular region of each training image. The difference is that Random Erasing randomizes the size and aspect ratio of the rectangle, while Cutout uses a fixed size. (The Cutout paper also experimented with selectively masking part of the target object, but claims that a fixed-size mask is just as effective and therefore uses one for simplicity.) Beyond image classification, Random Erasing has also been shown to be effective for object detection and person re-identification.
(The image shown here is not from the dataset used in this experiment.)
Before image processing | After image processing |
---|---|
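As a rough illustration of the fixed-size masking that Cutout uses, here is a minimal NumPy sketch. It is not code from the Cutout paper or from this article's repository: the function name `cutout`, the 16-pixel default mask size, the zero fill value, and the choice to let the square be clipped at the image border are all assumptions made for the example.

```python
import numpy as np


def cutout(image, mask_size=16, fill=0.0, rng=None):
    """Mask one fixed-size square of a (channels, height, width) image.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, height, width = image.shape
    # Pick the mask centre anywhere in the image. The square may extend
    # past the border, in which case only the visible part is masked.
    cy = int(rng.integers(0, height))
    cx = int(rng.integers(0, width))
    half = mask_size // 2
    top, bottom = max(0, cy - half), min(height, cy + half)
    left, right = max(0, cx - half), min(width, cx + half)
    out = image.copy()  # leave the caller's array untouched
    out[:, top:bottom, left:right] = fill
    return out
```

Unlike Random Erasing, the only random quantity here is the position of the mask; its size and shape never change.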
I decided to try Random Erasing. I chose it over Cutout because randomizing the rectangle size seemed likely to be more effective.
The task is classification of the CIFAR-10 dataset, implemented with Chainer. The source code is below.
After cloning the source code, you can train with the following commands. Change the value of the final `-p` option for every training run, because reusing the same value overwrites the previously saved data.
```
$ python src/download.py
$ python src/dataset.py
$ python src/train.py -g 0 -m vgg_no_fc -p remove_aug --iter 300 -b 128 --lr 0.1 --lr_decay_iter 150,225
```
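Random Erasing has the following hyperparameters. This time I chose values close to those in the paper:
Hyperparameter | Value |
---|---|
p (probability of applying Random Erasing) | 0.5 |
s_l (minimum ratio of erased area to image area) | 0.02 |
s_h (maximum ratio of erased area to image area) | 0.4 |
r_1 (minimum aspect ratio of the erased rectangle) | 1/3 |
r_2 (maximum aspect ratio of the erased rectangle) | 3 |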
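To show how these five hyperparameters fit together, here is a standalone sketch of Random Erasing written as a plain NumPy function. It is an illustration under assumptions, not the article's implementation: the function name `random_erasing` and the parameters `fill_range`, `max_tries`, and `rng` are hypothetical, the aspect ratio is sampled log-uniformly as in the article's code, and the image is assumed to be a float array of shape (channels, height, width).

```python
import numpy as np


def random_erasing(image, p=0.5, s_l=0.02, s_h=0.4, r_1=1 / 3, r_2=3.0,
                   fill_range=(-128, 128), max_tries=100, rng=None):
    """Return a copy of `image` with one random rectangle overwritten.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    if rng.random() >= p:
        return out  # unchanged with probability 1 - p
    _, height, width = image.shape
    for _ in range(max_tries):
        # Target area: a fraction in [s_l, s_h] of the image area.
        area = rng.uniform(s_l, s_h) * height * width
        # Aspect ratio h / w, log-uniform so that r and 1 / r are equally likely.
        ratio = np.exp(rng.uniform(np.log(r_1), np.log(r_2)))
        h = int(np.sqrt(area * ratio))
        w = int(np.sqrt(area / ratio))
        top = int(rng.integers(0, height))
        left = int(rng.integers(0, width))
        if top + h <= height and left + w <= width:
            # One random fill value shared by every pixel and channel.
            out[:, top:top + h, left:left + w] = rng.integers(*fill_range)
            break
    return out
```

Setting `p=0.0` disables the augmentation, which is convenient when the same function is reused at test time. The `max_tries` cap is only a safeguard against an endless loop; a sample that never fits is returned unmasked.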
The code actually used is shown below.
It is implemented as a method of a class that inherits from `chainer.datasets.TupleDataset`.
The part from "# Remove erasing start" to "# Remove erasing end" is the Random Erasing step, which fills a random rectangular region with a random value. (I think it is better to match the range of fill values to the range of the data being used.)
The argument `x` of `_transform` is the array for a single input image and has the shape [number of channels, height, width], as the three-axis indexing in the code shows.
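```python
import numpy as np


def _transform(self, x):
    image = np.zeros_like(x)
    size = x.shape[2]  # image side length (32 for CIFAR-10)
    offset = np.random.randint(-4, 5, size=(2,))  # random shift in [-4, 4]
    mirror = np.random.randint(2)  # 1: flip horizontally
    remove = np.random.randint(2)  # 1: apply Random Erasing (p = 0.5)
    top, left = offset
    # Note: negative offsets are clamped to 0 here
    left = max(0, left)
    top = max(0, top)
    right = min(size, left + size)
    bottom = min(size, top + size)
    if mirror > 0:
        x = x[:,:,::-1]
    # Copy the shifted crop; the uncovered border stays zero
    image[:,size-bottom:size-top,size-right:size-left] = x[:,top:bottom,left:right]
    # Remove erasing start
    if remove > 0:
        while True:
            # s: rectangle area, 2% to 40% of the image area
            s = np.random.uniform(0.02, 0.4) * size * size
            # r: aspect ratio h / w, log-uniform in [1/3, 3]
            r = np.random.uniform(-np.log(3.0), np.log(3.0))
            r = np.exp(r)
            w = int(np.sqrt(s / r))
            h = int(np.sqrt(s * r))
            left = np.random.randint(0, size)
            top = np.random.randint(0, size)
            # Resample until the rectangle fits inside the image
            if left + w < size and top + h < size:
                break
        # One random fill value for the whole rectangle and all channels
        c = np.random.randint(-128, 128)
        image[:, top:top + h, left:left + w] = c
    # Remove erasing end
    return image
```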
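The width and height formulas in the code follow from two conditions: the rectangle has area `s = w * h` and aspect ratio `r = h / w`, which gives `w = sqrt(s / r)` and `h = sqrt(s * r)`. The short example below works through a few values for a 32-pixel image. The helper name `rect_shape` is made up for this illustration and does not appear in the repository.

```python
import math


def rect_shape(area_ratio, r, size=32):
    """Return (w, h) of the erased rectangle; r is the aspect ratio h / w."""
    s = area_ratio * size * size  # target area in pixels
    w = int(math.sqrt(s / r))     # from w * h = s and h / w = r
    h = int(math.sqrt(s * r))
    return w, h


print(rect_shape(0.1, 2.0))   # (7, 14)
print(rect_shape(0.02, 1.0))  # (4, 4)
print(rect_shape(0.4, 3.0))   # (11, 35)
```

The last case is 35 pixels tall and can never fit in a 32-pixel image, which is why the sampling loop has to retry until it draws a rectangle that fits.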
The network code is shown below. Like VGG, it stacks convolution and max pooling layers. However, it omits VGG's large fully connected layers: global average pooling is applied before a single final linear layer, which reduces the number of parameters.
```python
import chainer
import chainer.functions as F
import chainer.links as L


class BatchConv2D(chainer.Chain):
    """Convolution followed by batch normalization and an optional activation."""

    def __init__(self, ch_in, ch_out, ksize, stride=1, pad=0, activation=F.relu):
        super(BatchConv2D, self).__init__(
            conv=L.Convolution2D(ch_in, ch_out, ksize, stride, pad),
            bn=L.BatchNormalization(ch_out),
        )
        self.activation = activation

    def __call__(self, x):
        h = self.bn(self.conv(x))
        if self.activation is None:
            return h
        return self.activation(h)


class VGGNoFC(chainer.Chain):
    """VGG-like network that uses global average pooling instead of hidden FC layers."""

    def __init__(self):
        super(VGGNoFC, self).__init__(
            bconv1_1=BatchConv2D(3, 64, 3, stride=1, pad=1),
            bconv1_2=BatchConv2D(64, 64, 3, stride=1, pad=1),
            bconv2_1=BatchConv2D(64, 128, 3, stride=1, pad=1),
            bconv2_2=BatchConv2D(128, 128, 3, stride=1, pad=1),
            bconv3_1=BatchConv2D(128, 256, 3, stride=1, pad=1),
            bconv3_2=BatchConv2D(256, 256, 3, stride=1, pad=1),
            bconv3_3=BatchConv2D(256, 256, 3, stride=1, pad=1),
            bconv3_4=BatchConv2D(256, 256, 3, stride=1, pad=1),
            fc=L.Linear(256, 10),
        )

    def __call__(self, x):
        h = self.bconv1_1(x)
        h = self.bconv1_2(h)
        h = F.dropout(F.max_pooling_2d(h, 2), 0.25)  # 32x32 -> 16x16
        h = self.bconv2_1(h)
        h = self.bconv2_2(h)
        h = F.dropout(F.max_pooling_2d(h, 2), 0.25)  # 16x16 -> 8x8
        h = self.bconv3_1(h)
        h = self.bconv3_2(h)
        h = self.bconv3_3(h)
        h = self.bconv3_4(h)
        h = F.dropout(F.max_pooling_2d(h, 2), 0.25)  # 8x8 -> 4x4
        h = F.average_pooling_2d(h, 4, 1, 0)  # global average pooling: 4x4 -> 1x1
        h = self.fc(F.dropout(h))
        return h
```
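To give a feel for how much global pooling saves, the following back-of-the-envelope calculation compares the size of the classifier head with and without it. The layer sizes come from the network code above, assuming 32x32 CIFAR-10 inputs; the 512-unit hidden layer is a hypothetical VGG-style alternative chosen only for comparison, and the helper `linear_params` is made up for this example.

```python
def linear_params(n_in, n_out):
    """Parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# Three 2x2 max poolings shrink a 32x32 input to 4x4.
spatial = 32
for _ in range(3):
    spatial //= 2

channels, classes = 256, 10

# Head used in the article: global average pooling, then Linear(256, 10).
with_global_pooling = linear_params(channels, classes)

# Alternative 1: flatten the 256 x 4 x 4 feature map straight into the classifier.
with_flatten = linear_params(channels * spatial * spatial, classes)

# Alternative 2 (hypothetical): one 512-unit hidden FC layer before the classifier.
with_hidden_fc = (linear_params(channels * spatial * spatial, 512)
                  + linear_params(512, classes))

print(with_global_pooling)  # 2570
print(with_flatten)         # 40970
print(with_hidden_fc)       # 2102794
```

Under these assumptions the pooled head is about 16 times smaller than the flattened one, and several hundred times smaller than a head with a hidden layer.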
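The training conditions are those given by the options of the training command above: `--iter 300`, batch size 128 (`-b 128`), learning rate 0.1 (`--lr 0.1`), and learning rate decay at 150 and 225 (`--lr_decay_iter 150,225`).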
Using Random Erasing improved accuracy, as shown below.
Method | Test Error (%) |
---|---|
Without Random Erasing | 6.68 |
With Random Erasing | 5.67 |
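To put the two numbers in perspective, the snippet below computes the absolute and relative reduction in test error, assuming the table values are percentages. It only restates the figures above and says nothing about run-to-run variance.

```python
err_without = 6.68  # test error (%) without Random Erasing, from the table
err_with = 5.67     # test error (%) with Random Erasing, from the table

absolute = err_without - err_with
relative = absolute / err_without

print(f"{absolute:.2f} points, {relative:.1%} relative")  # 1.01 points, 15.1% relative
```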
The training error and test error curves are shown below. With Random Erasing, the gap between training error and test error is smaller, which suggests that overfitting is suppressed.
Without Random Erasing:
With Random Erasing:
Because the method simply masks the input image, I was able to try it right away. It was effective this time, but I think it still needs to be verified under a variety of conditions. If it proves effective there too, it may become a standard technique.
It is such a simple method that I personally wonder whether it was already proposed in the past.