I want to monitor UNIQLO + J page updates [Scraping with python]

The + J collection, a collaboration between UNIQLO and Jil Sander, has attracted a lot of attention. I enjoyed it too, and even now, more than a month after the launch, I often check the + J special page. The page seems to be updated quite frequently, probably because the collection is so popular. An update to the page does not guarantee that items are back in stock, but as a fast-fashion nerd I still want to know about it. So I decided to scrape the page with Python and check for updates regularly.

Specification

I can't scrape with Beautiful Soup

Initially, I tried scraping with Beautiful Soup, but it failed. The HTML that gets downloaded differs from the HTML actually displayed in the browser, apparently because the page is built by JavaScript after loading, so the product elements were not there to find.

Reference: https://gammasoft.jp/blog/how-to-download-web-page-created-javascript/

So, let's scrape using requests-html.
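
To illustrate the mismatch, here is a minimal sketch using only the standard library. The HTML string is a hypothetical stand-in for what a plain download of a JavaScript-built page might return (it is not the actual UNIQLO markup), and `TagCounter` and `count_tags` are illustrative names. The container is empty because the product headings would only be inserted once the script runs in a browser.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML, as a plain HTTP download might return it.
# The <h3> product headings do not exist yet; a script would add them later.
RAW_HTML = """
<html>
  <body>
    <div id="root"></div>
    <script src="/static/app.js"></script>
  </body>
</html>
"""


class TagCounter(HTMLParser):
    """Count how many times a given start tag appears in the markup."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.count += 1


def count_tags(html, tag):
    counter = TagCounter(tag)
    counter.feed(html)
    return counter.count


h3_count = count_tags(RAW_HTML, "h3")
script_count = count_tags(RAW_HTML, "script")
print(h3_count, script_count)
```

A static parser sees the script tag but no headings, which is why a rendering step is needed before the selector can match anything.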

How to notify Slack

Slack provides Incoming Webhooks, so I use one. Setup and basic usage are described in the following article.

Reference: https://qiita.com/shtnkgm/items/4f0e4dcbb9eb52fdf316
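
For reference, an Incoming Webhook accepts a JSON body with a `text` field sent by HTTP POST. The following is a rough sketch of that request using only the standard library, as an alternative to the slackweb package used in this article. `build_request` and `notify` are illustrative names, and the URL is the same placeholder as in config.ini.

```python
import json
import urllib.request


def build_request(webhook_url, text):
    """Build the POST request for an Incoming Webhook without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url, text):
    """Send the message and return the HTTP status code."""
    request = build_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as res:
        return res.status


# Placeholder URL; nothing is sent until notify() is called.
req = build_request(
    "https://hooks.slack.com/services/hogehogehogehoge",
    "There was no product update.",
)
print(req.get_method(), json.loads(req.data))
```

Keeping the request construction separate from the sending makes it possible to inspect the payload without posting to a real channel.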

Scraping with requests-html

observation.py


# coding: UTF-8
import datetime

from requests_html import HTMLSession
import slackweb

from config import config, update_product

# Start a session and fetch the page
session = HTMLSession()
url = config["web_info"]["url"]
r = session.get(url)

# Run the page's JavaScript in a headless browser to build the final HTML
r.html.render()

# Scrape the product names
product_name = r.html.find(".ocI5u4BRvjaH-uauZvJ8R > h3")
product_name_array = [name.text for name in product_name]

# Compare with the product names saved on the previous run
priv_product_name_array = config["web_info"]["product"].strip("[]").replace("'", "").split(", ")
priv_set = set(priv_product_name_array)
curt_set = set(product_name_array)

slack = slackweb.Slack(url=config["slack_info"]["in_webhook_url"])
dt_now = datetime.datetime.now()
timestamp = dt_now.strftime('%Y-%m-%d %H:%M:%S')

# Compare the members of the sets, not their sizes, so that a product
# being swapped for another one is still detected
diff_result = sorted(curt_set - priv_set)
removed = sorted(priv_set - curt_set)

if diff_result:
    # New products are displayed
    slack.notify(text=timestamp)
    for product in diff_result:
        slack.notify(text="『" + product + "』 has been updated.")
    slack.notify(text=url)
elif not removed:
    # No change in the displayed products
    slack.notify(text=timestamp)
    slack.notify(text="There was no product update.")
# Products were only removed: nothing to notify

# Save the current product names for the next run
update_product(product_name_array)
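
One case worth checking by hand is a swap, where one product disappears and another appears so that the number of displayed items stays the same. Below is a minimal sketch of the comparison as a pure function, with hypothetical shortened product lists. `diff_products` is an illustrative name and not part of the script above.

```python
def diff_products(previous, current):
    """Return (added, removed) product names, each sorted for stable output."""
    prev_set, curr_set = set(previous), set(current)
    return sorted(curr_set - prev_set), sorted(prev_set - curr_set)


# Hypothetical snapshots: same number of items, but one product was swapped.
previous = ["Wool blend jacket", "Wool stall"]
current = ["Wool blend jacket", "Hybrid down oversized hoodie"]

added, removed = diff_products(previous, current)
print(added)    # ['Hybrid down oversized hoodie']
print(removed)  # ['Wool stall']
```

Because the function has no Slack or network dependency, it can be tested without touching the live page.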

config.py


import configparser
import re

#Read configuration file
config = configparser.ConfigParser()
config.read('config.ini')

def update_product(product_name):
    with open("config.ini", "r") as f:
        lines = f.readlines()
    with open("config.ini", "w") as f:
        for line in lines:
            if re.match(r'(product =)', line):
                f.write("product = {}\n".format(product_name))
                continue
            f.write(line)
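
As an alternative to rewriting the file line by line, configparser can write the updated value back itself. This is only a sketch under a few assumptions: this variant of `update_product` takes the file path as an extra argument, and the demo runs against a throwaway file with a made-up URL rather than the real config.ini.

```python
import configparser
import os
import tempfile


def update_product(path, product_names):
    """Rewrite only the [web_info] product value, keeping the other settings."""
    # interpolation=None so that a literal "%" in a value is stored as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")
    parser["web_info"]["product"] = str(product_names)
    with open(path, "w", encoding="utf-8") as f:
        parser.write(f)


# Try it on a throwaway file rather than the real config.ini.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.ini")
    with open(path, "w", encoding="utf-8") as f:
        f.write("[web_info]\nurl = https://example.com/\nproduct = []\n")
    update_product(path, ["Wool stall"])
    check = configparser.ConfigParser(interpolation=None)
    check.read(path, encoding="utf-8")
    stored = check["web_info"]["product"]
    print(stored)
```

One trade-off to be aware of: configparser does not preserve comments when it writes a file, so the line-by-line approach is the safer choice if config.ini contains comments.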

config.ini


[slack_info]
in_webhook_url = https://hooks.slack.com/services/hogehogehogehoge

[web_info]
url = https://www.uniqlo.com/jp/ja/spl/collaboration/plusj/men/
product = ['Wool blend jacket (striped) can be set up', 'Wool blend oversized jacket', 'Hybrid down oversized hoodie', 'Cashmere blend crew neck sweater (long sleeves)', 'Merino Blend V-neck Cardigan (Long Sleeve)', 'Merino Blend V-neck Cardigan (Long Sleeve / Cloud)', 'Supima cotton oversized shirt (long sleeves)', 'Supima Cotton Oversized Shirt (Long Sleeve / Striped)', 'Supima cotton oversized shirt (long sleeves)', 'Supima Cotton Oversized Shirt (Long Sleeve / Striped)', 'Supima Cotton Oversized Shirt (Long Sleeve / Striped)', 'Supima cotton mock neck T (long sleeves)', 'Supima Cotton Crew Neck T (Long Sleeve)', 'Wool blend easy pants', 'Wool blend pants set up', 'Wool blend pants (stripes) can be set up', 'Wool stall']
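
The product value is stored as the text form of a Python list, so it has to be turned back into a list before it can be compared. Below is a small sketch of doing that with `ast.literal_eval` from the standard library, next to a naive split on ", ". The stored string is shortened here, and the name containing a comma is hypothetical, chosen to show where splitting goes wrong.

```python
import ast

# The stored value is the repr of a Python list (shortened here).
stored = "['Wool blend jacket (striped) can be set up', 'Wool stall']"
names = ast.literal_eval(stored)
print(names)

# Hypothetical name containing ", ": splitting on ", " would cut it in two.
tricky = str(["Jacket, oversized", "Wool stall"])
naive = tricky.strip("[]").replace("'", "").split(", ")
print(len(naive), len(ast.literal_eval(tricky)))
```

`ast.literal_eval` only evaluates literals, so it recovers the list exactly without executing arbitrary code from the file.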

When I run it ...

[Screenshot 2020-12-26 0.39.44.png]

It seems to have worked.

Reference: https://qiita.com/taka-kawa/items/f0597b2f375da7ddbb73

Future outlook
