[PYTHON] How to collect face images relatively easily


While collecting face images for machine learning, I realized that this trick alone was worth an article.

Collecting face images of a specific person

One of the first things people tend to try in machine learning is a classifier that decides whether a face belongs to a specific person. To do this, you need many images of that person as training data. A quick search suggests that most sites collect them with a crawler, but you can also collect face images by following the steps below. As an example, suppose you want to collect many images of Manatsu Akimoto.

  1. Open a web browser and maximize the window.
  2. Search for "Akimoto Manatsu" in Google Image Search.
  3. Capture the screen and save it as an image.
  4. Scroll down to show the images that have not been displayed yet.
  5. Repeat steps 3 and 4 until there are no more search results.
  6. Cut out only the faces from the images saved in step 3, using a short OpenCV script (described later).
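
Steps 3 to 5 can also be automated. The following is only a minimal sketch, not part of the original procedure: capture_pages, grab, and scroll are hypothetical names, and the pyautogui helpers assume that the third-party pyautogui package is installed and that the browser window has keyboard focus. The loop stops when two consecutive screenshots are byte-identical, which is taken to mean that the end of the results has been reached. Anything that changes on screen (a clock, an animation) defeats that check, so max_pages acts as an upper bound.

```python
import hashlib
import os
import time


def capture_pages(grab, scroll, outdir, max_pages=50, pause=1.0):
    """Save one screenshot per page until the screen stops changing.

    grab   -- hypothetical callable returning the current screen as PNG bytes
    scroll -- hypothetical callable that scrolls the browser down one page
    """
    os.makedirs(outdir, exist_ok=True)
    saved = []
    last_digest = None
    for page in range(max_pages):
        data = grab()
        digest = hashlib.sha256(data).hexdigest()
        if digest == last_digest:
            # Same screen as before: assume there are no more results
            break
        path = os.path.join(outdir, "screen_%03d.png" % page)
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
        last_digest = digest
        scroll()
        time.sleep(pause)  # give the page time to load more images
    return saved


def grab_with_pyautogui():
    # Assumption: pyautogui is installed; screenshot() returns a PIL image
    import io
    import pyautogui
    buf = io.BytesIO()
    pyautogui.screenshot().save(buf, format="PNG")
    return buf.getvalue()


def scroll_with_pyautogui():
    # Assumption: the browser window has focus, so Page Down scrolls it
    import pyautogui
    pyautogui.press("pagedown")
```

With the browser already showing the search results, calling capture_pages(grab_with_pyautogui, scroll_with_pyautogui, "./imgs") would fill ./imgs with screenshots that the face-cropping script can then process.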

Collecting face images of arbitrary people

One of the first things people tend to try in machine learning is a classifier that decides whether a face belongs to a specific person. For this you also want many face images of other people to use as negative (incorrect) examples. A quick search suggests that most sites collect them with a crawler, but you can also collect face images by following the steps below.

  1. Open a web browser and maximize the window.
  2. Search for "group photo" in Google Image Search.
  3. Manually download images that contain many people until you have enough.
  4. Cut out only the faces from the images saved in step 3, using a short OpenCV script (described later).

A script that cuts out only the face part from the image

The path in the variable cascade_path will probably differ depending on your environment, so search your system for haarcascade_frontalface_alt.xml and adjust it. OpenCV and Python must be installed in advance.

import cv2
import glob
import sys
import os
import datetime

def main(srcdir, destdir, cascade_path='/home/pi/opencv-3.1.0/data/haarcascades/haarcascade_frontalface_alt.xml'):

  winname = 'searching..'
  cv2.namedWindow(winname, cv2.WINDOW_AUTOSIZE)

  # Create the output directory if it does not exist yet
  if not os.path.exists(destdir):
    os.makedirs(destdir)

  # Output files are named <start time>_<serial number>.jpg
  lastsaved = datetime.datetime.now()
  prefix = lastsaved.strftime('%Y%m%d-%H%M%S_')
  counter = 0
  cascade = cv2.CascadeClassifier(cascade_path)
  if cascade.empty():
    sys.exit("could not load cascade: " + cascade_path)

  for filename in glob.glob(srcdir + "/*"):

    # Skip subdirectories
    if os.path.isdir(filename):
      continue

    print("load " + filename)
    img = cv2.imread(filename)
    # imread returns None for files it cannot decode as an image
    if img is None:
      continue

    frect = cascade.detectMultiScale(img, minSize=(64, 64))
    pos = []
    for r in frect:
      x, y, w, h = r[0], r[1], r[2], r[3]
      face = img[y:y+h, x:x+w]
      if face.size != 0:
        # Use a separate name so the loop variable filename is not overwritten
        outname = destdir + "/" + prefix + str(counter) + ".jpg"
        cv2.imwrite(outname, face)
        print("save " + outname)
        counter += 1
        pos.append((x, y, w, h))
    # Show the source image with the saved faces outlined in red
    for p in pos:
      cv2.rectangle(img, (p[0],p[1]),(p[0]+p[2],p[1]+p[3]),(0,0,255), 8)
    if len(pos) > 0:
      cv2.imshow(winname, img)
      cv2.waitKey(1)  # let the window actually redraw

  cv2.destroyAllWindows()


if __name__ == '__main__':
  main(sys.argv[1], sys.argv[2])

If you save this as img2face.py, for example, you can run it like this:

python ./img2face.py ./imgs ./face

Put the image files collected earlier under ./imgs and run the command. The cropped face images are written under ./face.
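
As for the cascade file whose location differs between environments, the search can be done in code as well. The following is a sketch under assumptions: find_cascade and search_roots are hypothetical names, and the cv2.data.haarcascades attribute is provided by the pip-installed opencv-python packages but may be missing from a source build, so the helper falls back to walking the directories you pass in.

```python
import os


def find_cascade(name="haarcascade_frontalface_alt.xml", search_roots=()):
    """Return the full path of the cascade file `name`, or None if not found."""
    candidates = []
    try:
        import cv2
        # Present in the pip wheels; may be absent in a source build
        data = getattr(cv2, "data", None)
        if data is not None:
            candidates.append(data.haarcascades)
    except ImportError:
        pass
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    # Fallback: walk the given directories (hypothetical parameter)
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                return os.path.join(dirpath, name)
    return None
```

The result can be passed to the script's main function as its cascade_path argument, for example find_cascade(search_roots=["/usr/share", "/usr/local/share"]), where the two directories are only illustrative guesses. Check for None before using it.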

In conclusion

Preparing the data is the tedious part of machine learning. I hope more people will publish methods that make it easier.

