[PYTHON] I tried building an environment for scheduled checks using Selenium on AWS Fargate

TL;DR (5 lines)

--I want to automatically check web pages that are frequently updated by hand (this time I use ZOZOTOWN as an example)
--Create a Selenium execution environment on Fargate
--Push the local container to ECR and deploy it to Fargate
--Schedule the Selenium browser operations and run them as a batch job
--Check the results of the batch job in CloudWatch

Caution

This article is intended as an introduction to Fargate + Selenium. I am a prospective employee with an informal job offer and have permission, so I use ZOZOTOWN, our own service, as the subject. If you adapt the content of this article to other sites, please follow their etiquette and rules!

Description of the service to use

What is Fargate

Normally, running containers on EC2 means managing the instances yourself. With Fargate, instance management is left to AWS, so you can run containers serverlessly just by registering them.

Lambda is the best-known serverless service, but at the time of writing it lacked flexibility because of restrictions such as not being able to run containers and its execution timeout.

Fargate, on the other hand, can run a container that works locally as it is, so it can be used for a wide variety of workloads.

What is Selenium

A browser-driven test tool for automating web application testing.

It supports various languages such as Python, Ruby, and Java, and you can easily create test scripts.

Architecture

This time, we will build the following architecture on AWS.

[Image: architecture diagram (Screenshot 2020-01-03 11.00.20.png)]

Creating a Dockerfile and main code

Create a test script with Selenium + Python.

Creating a Dockerfile

./Dockerfile


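# Base image that bundles Python, ChromeDriver, and Selenium
FROM joyzoursky/python-chromedriver:3.8-alpine3.10-selenium

WORKDIR /usr/src
# Copy the test script into the image
COPY main.py /usr/src/

# Run the script when the container starts
CMD ["python", "main.py"]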

To use Selenium + Headless Chrome, I use the following base image.

joyzoursky/python-chromedriver:3.8-alpine3.10-selenium https://hub.docker.com/r/joyzoursky/python-chromedriver/

Creating main code

./main.py


# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.common.exceptions import TimeoutException, ElementClickInterceptedException, NoSuchElementException
from selenium.webdriver.common.by import By


def check_coupon(driver, my_favorite_brand):
    """Return True if my_favorite_brand is listed on the ZOZO coupon page."""
    # Open the ZOZO coupon page
    driver.get("https://zozo.jp/coupon/")
    i = 1
    while True:
        try:
            # Read the brand name of the i-th coupon in the list
            coupon_brand = driver.find_element(By.XPATH, f'//*[@id="body"]/div[3]/ul/li[{i}]/a/figure/div[2]').text
            if coupon_brand == my_favorite_brand:
                return True
            i += 1
        except NoSuchElementException:
            # There is no i-th coupon, so the end of the list was reached
            return False


if __name__ == '__main__':
    # Keep a reference so the finally block works even if Chrome fails to start
    driver = None
    try:
        # Headless Chrome settings
        options = webdriver.ChromeOptions()
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-setuid-sandbox')
        options.add_argument('--window-size=1420,1080')
        options.add_argument('--headless')
        options.add_argument('--disable-gpu')
        # Connect to the Headless Chrome browser
        driver = webdriver.Chrome(options=options)
        # Wait up to 15 seconds for an element to appear before giving up
        driver.implicitly_wait(15)

        # Favorite brand
        my_favorite_brand = "Carlie e felice"
        # Check coupon
        if check_coupon(driver, my_favorite_brand):
            print("I found it!", my_favorite_brand)
        else:
            print("I couldn't find it today ...")

    # Exception handling
    except ElementClickInterceptedException as ecie:
        print(f"exception!\n{ecie}")
    except TimeoutException as te:
        print(f"timeout!\n{te}")
    finally:
        # Quit the browser only if it was actually started
        if driver is not None:
            driver.quit()

The script checks whether the brand **Carlie e felice** appears on the coupon page.

Scraping with Requests + Beautiful Soup 4 would have been enough, but this time I wanted to build a Selenium environment, so please go easy on me ;;
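The script above compares the brand name with exact string equality, so a difference in letter case or spacing would make it miss a coupon. As a minimal sketch (the helper names `normalize` and `has_brand` and the sample brand list are my own assumptions, not part of the original script), a more tolerant comparison could look like this:

```python
def normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore case differences."""
    return " ".join(name.split()).casefold()


def has_brand(coupon_brands, favorite_brand: str) -> bool:
    """Return True if favorite_brand appears among the scraped brand names."""
    target = normalize(favorite_brand)
    return any(normalize(brand) == target for brand in coupon_brands)


# Made-up brand names standing in for the text scraped from the coupon page
sample_brands = ["Some Brand", "  carlie  e felice ", "Another Brand"]
print(has_brand(sample_brands, "Carlie e felice"))
print(has_brand(sample_brands, "Missing Brand"))
```

Keeping the comparison in a plain function like this also means it can be tested without starting a browser.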

Running container in local environment

# Build the container image
$ docker build -t zozo_check_coupons .

# Run the container
$ docker run -it --rm zozo_check_coupons
I found it! Carlie e felice

After confirming that it runs successfully in the local environment, the next step is to push this container to Amazon ECR.

Think of ECR as a private Docker Hub on AWS.

Building the required environment for AWS

Build an ECR environment

Create ECR repository

Create a repository dedicated to the container you want to manage this time in ECR.

--From Services, select **ECR**, then **Create Repository** [Image: 2.png]

--Enter the repository name "zozo_check_coupons" and create the repository [Image: Screenshot 2020-01-02 21.21.16.png]

--The repository has been created [Image: 4.png]

The repository URI is needed when pushing the container, so make a note of it.

Log in to ECR

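# Note: `aws ecr get-login` exists only in AWS CLI v1. With AWS CLI v2, use instead:
#   aws ecr get-login-password --region ap-northeast-1 | docker login --username AWS --password-stdin xxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com
$ aws ecr get-login --region ap-northeast-1 --no-include-email
docker login -u AWS -p ...
.
.
. .dkr.ecr.ap-northeast-1.amazonaws.com

# Copy the returned `docker login ...` command and run it
$ docker login -u AWS -p ...
Login Succeeded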

If Login Succeeded is displayed, it's OK.
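For reference, the `docker login -u AWS -p ...` line comes from an ECR authorization token, which is a base64-encoded string in the form `user:password` with the user name `AWS`. The sketch below decodes such a token with the standard library only. The function name `split_ecr_token` and the token value are made up for illustration; a real token is issued by the ECR API.

```python
import base64


def split_ecr_token(authorization_token: str):
    """Decode a base64 ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# Made-up token for illustration only; a real one is issued by ECR
fake_token = base64.b64encode(b"AWS:example-password").decode("ascii")
username, password = split_ecr_token(fake_token)
print(username)
```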

Push to ECR

Using the repository URI you noted earlier, tag the image and push it to the repository you created.

# Build the image, tagged with the repository URI
$ docker build -t xxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/zozo_check_coupons .
# Push the tagged image to ECR
$ docker push xxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/zozo_check_coupons

The container image was pushed to the repository successfully [Image: 1.png]

Make a note of the image URI as it will be used in the task definition.
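The image URI follows a fixed pattern made of the account ID, the region, the repository name, and a tag. As a small sketch (the helper name `ecr_image_uri` and the account ID `123456789012` are placeholders I made up), it can be assembled like this:

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the image URI to reference from the ECS task definition."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# 123456789012 is a placeholder account ID, not a real one
print(ecr_image_uri("123456789012", "ap-northeast-1", "zozo_check_coupons"))
```

When no tag is given to `docker build` and `docker push`, Docker uses `latest`, which is why the default here is `latest` as well.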

Cluster creation with ECS

Create a cluster, which is the environment the container runs in.

--Select **ECS** from Services, then **Create Cluster** [Image: 3.png]

--Select the "Networking only" cluster template [Image: Screenshot 2020-01-02 22.31.59.png]

--Enter the cluster name and check Create VPC [Image: Screenshot 2020-01-02 22.33.54.png]

--Finally, press the Create button to create the cluster [Image: 5.png]

Defining tasks in ECS

Next, define the task.

--Select **Create new task definition** [Image: 6.png]

--Select **Fargate** as the launch type compatibility [Image: Screenshot 2020-01-02 22.56.44.png]

--Define the task as follows [Image: Screenshot 2020-01-02 22.57.26.png]

If there is no task execution role, refer to the digression below and create it.

--Select Add Container, then enter the container name and the URI of the container image you pushed earlier [Image: Screenshot 2020-01-02 22.59.54.png]

--Set memory and CPU [Image: Screenshot 2020-01-03 01.07.10.png]

--Finally, select **Create** to complete the task definition [Image: 7.png]

Digression: Task definition in CLI

--Define the roles required to execute the task

./task-execution-assume-role.json


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

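--Create a role using the definition file

$ aws iam --region ap-northeast-1 create-role --role-name ecsTaskExecutionRole --assume-role-policy-document file://task-execution-assume-role.json

--Attach the AWS managed task execution policy to the role (this step follows the AWS tutorial linked below; the role needs it to pull the image from ECR and write logs to CloudWatch)

$ aws iam --region ap-northeast-1 attach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy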

--Create a task definition file

./task-config.json


{
  "family": "zozo-check-coupons-task",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "zozo-check-coupons-task",
      "image": "xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/zozo_check_coupons:latest",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-region": "ap-northeast-1",
          "awslogs-group": "/ecs/zozo_check_coupons-task",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::xxxxxxxx:role/ecsTaskExecutionRole"
}

--Register the task definition from the definition file

$ aws ecs register-task-definition --cli-input-json file://task-config.json

This way, you can define the task without mistakes or omissions.

For details, see https://docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/ecs-cli-tutorial-fargate.html
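Before registering the file, a quick sanity check can catch typical slips. Fargate accepts only certain CPU and memory combinations, and it requires the `awsvpc` network mode. The sketch below is my own hypothetical helper (`check_task_definition`), and its table covers only the two smallest CPU sizes, so treat it as an illustration rather than a complete validator.

```python
import json

# Valid Fargate memory sizes (MiB) for the two smallest CPU sizes only
FARGATE_MEMORY_BY_CPU = {
    "256": {"512", "1024", "2048"},
    "512": {"1024", "2048", "3072", "4096"},
}


def check_task_definition(task_def: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if "FARGATE" not in task_def.get("requiresCompatibilities", []):
        problems.append("requiresCompatibilities should include FARGATE")
    if task_def.get("networkMode") != "awsvpc":
        problems.append("networkMode should be awsvpc for Fargate")
    cpu = task_def.get("cpu")
    memory = task_def.get("memory")
    allowed_memory = FARGATE_MEMORY_BY_CPU.get(cpu)
    if allowed_memory is None:
        problems.append(f"cpu {cpu!r} is not covered by this sketch")
    elif memory not in allowed_memory:
        problems.append(f"memory {memory!r} is not valid for cpu {cpu!r}")
    if not task_def.get("containerDefinitions"):
        problems.append("at least one container definition is needed")
    return problems


# Abbreviated copy of task-config.json (container details omitted)
task_config = json.loads("""
{
  "family": "zozo-check-coupons-task",
  "networkMode": "awsvpc",
  "containerDefinitions": [{"name": "zozo-check-coupons-task"}],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}
""")
print(check_task_definition(task_config))
```

In practice you would load the real file with `json.load(open("task-config.json"))` instead of the inline string.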

Creating a scheduled run

Finally, run the defined task on a schedule.

--Select the cluster you created, open **Scheduled Tasks**, and press **Create**

[Image: 8.png]

--Configure it as follows. The fixed interval is set to **24** hours because the coupons are renewed every 24 hours. [Image: Screenshot 2020-01-02 23.05.25.png]

--Select the VPC that was created together with the cluster [Image: 9.png]

--Finally, select Create to create the scheduled task [Image: 10.png]
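As far as I understand, a fixed-interval schedule like this corresponds to a CloudWatch Events rate expression such as `rate(24 hours)`. The unit must be singular when the value is 1 and plural otherwise. The helper below (`rate_expression` is a name I made up) is a small sketch of building such an expression.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build a rate expression such as 'rate(24 hours)'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for 1 and plural for anything larger
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(24, "hour"))
print(rate_expression(1, "day"))
```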

Check logs with CloudWatch

When the task finishes, its log is sent to CloudWatch.

When I checked, I found the following log!

[Image: 11.png]

It looks like it's not there today...
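To locate the log of a particular run, it helps to know how the log stream is named. With the `awslogs` driver and a stream prefix, the stream name takes the form `prefix/container-name/task-id`. The sketch below builds that name from the values in task-config.json. The task ARN is a made-up example and the helper names are my own.

```python
def task_id_from_arn(task_arn: str) -> str:
    """Return the last path segment of a task ARN, which is the task ID."""
    return task_arn.rsplit("/", 1)[-1]


def log_stream_name(prefix: str, container_name: str, task_id: str) -> str:
    """Build the awslogs stream name: prefix/container-name/task-id."""
    return f"{prefix}/{container_name}/{task_id}"


# Made-up task ARN for illustration only
example_arn = "arn:aws:ecs:ap-northeast-1:123456789012:task/example-cluster/0123456789abcdef0123456789abcdef"
print(log_stream_name("ecs", "zozo-check-coupons-task", task_id_from_arn(example_arn)))
```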

In conclusion

We built a Fargate + Selenium environment! Fargate is quite flexible because a container that ran locally can be registered and used as it is.

However, when crawling, page loading can become slow depending on the CPU and memory allocated, browser operations driven by the program may not work well, and timeouts may occur, so it seems better to take thorough countermeasures such as adding sleeps.
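One way to apply the "add sleeps" idea is to retry a flaky step a few times with a pause in between. The following is a minimal sketch using only the standard library. The `retry` helper and the stand-in `flaky_page_load` function are my own assumptions; in the real script the action would be a Selenium call and the exception would be `TimeoutException`.

```python
import time


def retry(action, attempts=3, delay_seconds=2.0, exceptions=(Exception,), sleep=time.sleep):
    """Call action() up to `attempts` times, sleeping between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            # Give up and re-raise after the last attempt
            if attempt == attempts:
                raise
            sleep(delay_seconds)


# Stand-in for a page load that times out twice before succeeding
calls = {"count": 0}


def flaky_page_load():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("page did not load in time")
    return "loaded"


result = retry(flaky_page_load, attempts=3, delay_seconds=0, exceptions=(TimeoutError,))
print(result, calls["count"])
```

Passing `sleep` as a parameter keeps the helper easy to test, since the pause can be replaced with a no-op.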

Finally, this article is only meant as an introduction, so please make sure you follow etiquette and rules when putting it to use!

References

https://yomon.hatenablog.com/entry/2019/08/fargateselenium
