Colaboratory is convenient if you want to scrape horse racing data and then use it for machine learning, so this is a note of the code for scraping horse racing results in Colaboratory.
(Note that scraping may stop working if the site's HTML changes. Confirmed working on 2020-08-30.)
sample.ipynb
# Install Chromium and Selenium
# Paste each line, including the leading "!", into a Colaboratory code cell.
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
# Import BeautifulSoup and the other libraries used below
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
race_date = "2020"        # year
race_course_num = "06"    # racecourse code
race_info = "03"          # meeting number
race_count = "05"         # day of the meeting
race_no = "01"            # race number
url = "https://race.netkeiba.com/race/result.html?race_id="+race_date+race_course_num+race_info+race_count+race_no+"&rf=race_list"
#Get the data of the corresponding URL in HTML format
race_html=requests.get(url)
race_html.encoding = race_html.apparent_encoding
race_soup=BeautifulSoup(race_html.text,'html.parser')
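# Strip unneeded characters and tags, leaving the cell values separated by "'"
def make_data(data):
    data = re.sub(r"\n", "", str(data))        # remove newlines
    data = re.sub(r" ", "", str(data))         # remove spaces
    data = re.sub(r"</td>", "'", str(data))    # mark the end of each cell with "'"
    data = re.sub(r"<[^>]*?>", "", str(data))  # remove the remaining tags
    data = re.sub(r"\[", "", str(data))        # remove any "[" characters
    return data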
#Get and save only the race table
HorseList = race_soup.find_all("tr",class_="HorseList")
# Shape the race table
# Number of columns in the table = 15 (Order of arrival, Bracket, Horse number, Horse name, Sex and age, Weight, Jockey, Time, Margin, Popularity, Win odds, Last 3F, Corner passing order, Stable, Horse weight(Increase / decrease))
# The 16th name, "Number of runners", takes the empty trailing field left by the split and is filled in below
col = ["Order of arrival","Bracket","Horse number","Horse name","Sex and age","Weight","Jockey","Time","Margin","Popularity","Win odds","Last 3F","Corner passing order","Stable","Horse weight(Increase / decrease)","Number of runners"]
#Count the number of runners
uma_num = len(HorseList)
df_temp = pd.DataFrame(map(make_data,HorseList),columns=["temp"])
df = df_temp["temp"].str.split("'", expand=True)
df.columns= col
df["Number of runners"] = uma_num
df
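To see what `make_data` does to a single row, the sketch below runs the same substitutions on a small hand-written HTML fragment. The fragment is made up for illustration and is much shorter than a real netkeiba row; only the standard `re` module is needed.

```python
import re

def make_data(data):
    data = re.sub(r"\n", "", str(data))        # remove newlines
    data = re.sub(r" ", "", str(data))         # remove spaces
    data = re.sub(r"</td>", "'", str(data))    # mark the end of each cell with "'"
    data = re.sub(r"<[^>]*?>", "", str(data))  # remove the remaining tags
    data = re.sub(r"\[", "", str(data))        # remove any "[" characters
    return data

# Hypothetical three-cell row, not real site markup
row_html = '<tr class="HorseList">\n<td>1</td>\n<td> 3</td>\n<td>Sample Horse</td>\n</tr>'

cleaned = make_data(row_html)
fields = cleaned.split("'")

print(cleaned)  # 1'3'SampleHorse'
print(fields)   # ['1', '3', 'SampleHorse', '']
```

Two things are worth noticing: spaces inside a value are removed too ("Sample Horse" becomes "SampleHorse"), and because every cell ends with "'", the split leaves one empty field at the end. That empty field is why the column list has one more name than the table has columns.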
After that, you can scrape many races by changing the date and the other parameters. Colaboratory is convenient because it requires no environment setup.
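As a rough sketch of changing the parameters in bulk, the code below builds the result-page URLs for every race on one day. The helper `build_result_url` and its parameter names are hypothetical, reading the third and fourth fields of the race ID as meeting number and day is an assumption, and so is the figure of 12 races per day. It only builds the URLs and sends no requests.

```python
BASE_URL = "https://race.netkeiba.com/race/result.html?race_id={race_id}&rf=race_list"

def build_result_url(year, course, meeting, day, race_no):
    """Assemble a result-page URL from the five parts of the race ID (hypothetical helper)."""
    # Every part after the four-digit year is zero-padded to two digits
    race_id = f"{year:04d}{course:02d}{meeting:02d}{day:02d}{race_no:02d}"
    return BASE_URL.format(race_id=race_id)

# Assumption: at most 12 races are held on the day
urls = [build_result_url(2020, 6, 3, 5, race_no) for race_no in range(1, 13)]

print(urls[0])    # same URL as the single-race example
print(len(urls))  # 12
```

When fetching these URLs in a loop, it is polite to pause between requests (for example with `time.sleep`) and to skip any page where the list of `HorseList` rows comes back empty.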
https://qiita.com/Mokutan/items/89c871eac16b8142b5b2
https://qiita.com/ftoyoda/items/fe3e2fe9e962e01ac421