[PYTHON] Get Nogizaka46 blog images by scraping

Introduction

I saved the images from a Nogizaka46 blog by scraping with Python. I scraped the first page of Manatsu Akimoto's blog.

Code

scraping.py


import requests
import urllib.request
import os
from bs4 import BeautifulSoup


def scraping():
    #Member URL
    member_name = "manatsu.akimoto"
    url = "http://blog.nogizaka46.com/" + member_name + "/"

    #Create folder
    if not os.path.isdir(member_name):  # If the "member_name" folder does not exist
        print("Create folder")
        os.mkdir(member_name)

    #Counter for the number of saved images
    cnt = 0

    #BeautifulSoup object generation
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(
        url, headers=headers).content, 'html.parser')

    #Find the html where the image is located
    for entry in soup.find_all("div", class_="entrybody"):  #Get all entry bodies
        for img in entry.find_all("img"):  #Get all img
            cnt += 1
            urllib.request.urlretrieve(
                img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
    print("I have saved " + str(cnt) + " images.")


if __name__ == '__main__':
    scraping()


Member URL

Since the member's name is used in the URL, I put the name of the member whose images I want to get in member_name.

member_name = "manatsu.akimoto"
url = "http://blog.nogizaka46.com/" + member_name + "/"
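The same URL construction can be wrapped in a small function so that only the argument changes per member. This is a minimal sketch: the helper name build_blog_url and the constant BASE_URL are hypothetical and not part of the original script; the base URL itself is the one used above.

```python
BASE_URL = "http://blog.nogizaka46.com/"  # base URL taken from the script above


def build_blog_url(member_name):
    """Return the blog URL for a member (hypothetical helper, not in the original script)."""
    # Strip stray slashes so the result always ends with exactly one "/"
    return BASE_URL + member_name.strip("/") + "/"


print(build_blog_url("manatsu.akimoto"))
# prints: http://blog.nogizaka46.com/manatsu.akimoto/
```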

BeautifulSoup object generation

An easy-to-understand explanation of BeautifulSoup selectors is available on the following reference site: https://python.civic-apps.com/beautifulsoup4-selector/
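To see what the BeautifulSoup object offers without touching the network, the sketch below parses a made-up HTML string. The markup, the example.com URLs, and the "sidebar" class are assumptions for illustration only; just the "entrybody" class name comes from this article.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the bytes returned by requests.get(...).content
html = b"""
<html><head><title>Sample blog</title></head>
<body>
  <div class="entrybody"><img src="http://example.com/a.jpeg"></div>
  <div class="sidebar"><img src="http://example.com/banner.png"></div>
</body></html>
"""

# 'html.parser' is the parser bundled with Python, as in the script above
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)                              # text of the <title> tag
print(len(soup.find_all("img")))                      # every img on the page
print(len(soup.find_all("div", class_="entrybody")))  # only the entry bodies
```

Searching the whole page finds both images, including the sidebar one, which is why the script narrows the search to the entry bodies first.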


Find the html where the image is located

Looking at the HTML that makes up the blog, the body of each post is in a div tag with the class name "entrybody". The images are in img tags inside it, so each one is saved to the folder as soon as it is found.

for entry in soup.find_all("div", class_="entrybody"):  #Get all entry bodies
    for img in entry.find_all("img"):  #Get all img
        cnt += 1
        urllib.request.urlretrieve(img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
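The loop above names every file ".jpeg" and assumes each src is an absolute URL. One possible variation, sketched here with only the standard library, keeps the extension found in the URL and resolves relative paths first. The helper name image_save_path and the sample src value are hypothetical; whether the blog actually serves relative or non-JPEG image URLs is not confirmed by this article.

```python
import os
from urllib.parse import urljoin, urlparse


def image_save_path(page_url, src, member_name, index):
    """Return (absolute image URL, local save path). Hypothetical helper."""
    # Resolve a relative src such as "/img/a.png" against the page URL
    absolute_url = urljoin(page_url, src)
    # Take the extension from the URL path, ignoring any query string;
    # fall back to ".jpeg" (the original script's choice) when there is none
    ext = os.path.splitext(urlparse(absolute_url).path)[1] or ".jpeg"
    filename = "{}-{}{}".format(member_name, index, ext)
    return absolute_url, os.path.join(member_name, filename)


page = "http://blog.nogizaka46.com/manatsu.akimoto/"
# The src below is a made-up example, not a real image on the blog
print(image_save_path(page, "/img/photo.png?size=large", "manatsu.akimoto", 1))
```

The returned pair can be passed directly to urllib.request.urlretrieve in place of the hand-built arguments.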

Execution result

Page at the time of execution

[Image: screencapture-blog-nogizaka46-manatsu-akimoto-2020-02-19-12_42_35.jpg]

Created folder

[Image: screenshot of the created folder]

Command line display

Create folder
I have saved 22 images.
