[PYTHON] How to make an interactive LINE BOT 004 (answer the closing date of a listed company)

Overview

Last time, I implemented a function to acquire stock price information, but as a matter of fact, "That's right, I used the stock app without asking BOT. It's only a matter of time before I get the tsukkomi saying, "It's more convenient," and I'm probably using a certain portfolio management app on a daily basis, so I'd like BOT to pinpoint the stock price now. It does not come to have the need. So I went back to the starting point, asked myself what kind of function I wanted, and searched for a more practical function.

"Yes, it would be convenient if we could grasp the settlement (planned) dates all at once."

If you own many stocks, it is quite troublesome to keep track of the schedule of individual stocks, semi-annual financial results, quarterly financial results, etc. on a daily basis. It would be convenient if there was a function that could centrally collect and visualize this, and above all, I think it would be good if it was the original way of BOT.

So, this time, I would like to do a scraping technique aimed at strengthening the skill of extracting and analyzing information from the website with pinpoint.

System configuration

When you enter the activation code including the brand code from the LINE app (①), BOT will access the website, investigate the latest financial results announcement (planned) date, and answer (②). It is also assumed that individual stocks will be specified or a portfolio will be set in advance and multiple stocks will be acquired at once.

システム構成004.png

1. Selection of target website

We will use a site called Stock Forecast. When you open the page for individual stocks, the latest financial results announcement (planned) date is displayed at the position shown in the figure below. If the financial results have been announced, that fact will be displayed along with the date. If it is scheduled to be announced in the future, it will be displayed with the date as the scheduled announcement date. The HTML source corresponding to this position is the " header_main " class starting at line 222. You can see that all the information you want is included under this.

HTML BODY.png

Therefore, it seems that this mission can be realized by understanding the skills of the following three processes.

  1. Get the HTML source
  2. Perspective, <div class =" header_main "> tag extraction
  3. String formatting

So, if you write the conclusion first For 1. requests 2. For Beautiful Soup 3. For re Very smart coding is realized by utilizing each of them.

2. Get HTML source by requests

Requests have already been packaged when the chatbot is implemented, so details are omitted. The coding example is as follows

(Script execution example)How to use requests


(botenv2) [botenv2]$ python
Python 3.6.7 (default, Dec  5 2018, 15:02:16) 
>>> import requests

#Get HTML source
>>> r = requests.get('https://kabuyoho.ifis.co.jp/index.php?action=tp1&sa=report_top&bcode=4689')

#Content confirmation
>>> print(r.headers)
{'Cache-Control': 'max-age=1', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html; charset=UTF-8', (abridgement)
>>> print(r.encoding)
UTF-8
>>> print(r.content)
(abridgement)

With that feeling, no particular explanation is needed. Since the whole Body part is stored in r.content, it will be boiled or baked afterwards. You can pull the information using the target HTML tag as a key.

3. Introduction of Beautiful Soup (parser)

It seems that it is a famous package and already implements various parsers. When I introduced it, I was able to achieve 90% of the purpose this time. .. Compared to the time when I used to play with C language, it feels like a world away.

Beautiful Soup installation


(botenv2) [botenv2]$ pip install BeautifulSoup4

(Script execution example@Continued)How to use Beautiful Soup


>>> from bs4 import BeautifulSoup

#Parsing the Body part with parser
>>> soup = BeautifulSoup(r.content, "html.parser")

Try to display the header_main class.

python


>>> print(soup.find("div", class_="header_main"))

Execution result



<div class="header_main">
<div class="stock_code left">4689</div>
<div class="stock_name left">Z Holdings</div>
<div class="block_update right">
<div class="title left">
Earnings announced
                                                        </div>
<div class="settle left">
                                                                        2Q
                                                        </div>
<div class="date left">
                                                                                                   2019/11/01
                                                                                                </div>
<div class="float_end"></div>
</div>
<div class="float_end"></div>
</div>

It's amazing. It's too convenient to stop trembling.

3. String formatting

All that remains is to delete unnecessary strings. HTML tags are not required, so use the text method.

(Script execution example@Continued)Text extraction


>>> s = soup.find("div", class_="header_main").text
>>> print(s)
4689
Z Holdings


Earnings announced
                                                 

                                                                        2Q
                                                 

                                                                                                   2019/11/01
                                                                                         




>>> 

The tags have been wiped out, but there are still a lot of mysterious gaps left. I didn't know if this was a space or a metacharacter, so I fell in love with it for a moment. In such a case, the substance can be seen by displaying it in byte type.

(reference)Character code confirmation


>>> s.encode()
b'\n4689\n\xef\xbc\xba\xe3\x83\x9b\xe3\x83\xbc\xe3\x83\xab\xe3\x83\x87\xe3\x82\xa3\xe3\x83\xb3\xe3\x82\xb0\xe3\x82\xb9\n\n\n\t\t\t\t\t\t\t\t\t\xe6\xb1\xba\xe7\xae\x97\xe7\x99\xba\xe8\xa1\xa8\xe6\xb8\x88\n\t\t\t\t\t\t\t\n\n\t\t\t\t\t\t\t\t\t2Q\n\t\t\t\t\t\t\t\n\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t2019/11/01\n\t\t\t\t\t\t\t\t\t\t\t\t\n\n\n\n'

The point is to remove / n and / t. Let's mercilessly replace it with a comma and bury it.

(Script execution example@Continued)Metacharacter removal


>>> import re
>>> s = re.sub(r'[\n\t]+', ',', s)
>>> print(s)
,4689,Z Holdings,Earnings announced,2Q,2019/11/01,

If you remove the disturbing commas before and after the finish

(Script execution example@Continued)


>>> s = re.sub(r'(^,)|(,$)','', s)
>>> print(s)
4689,Z Holdings,Earnings announced,2Q,2019/11/01

It feels good. It seems that it can be converted to CSV or dataframe as it is.

By the way, when a brand code that does not exist is acquired, the following character code remains after the above processing.

For stock codes that do not exist


>>> print(s)
b'\xc2\xa0'

This \ xc2 \ xa0 means NO-BREAK SPACE in Unicode and corresponds to & nbsp in HTML. If this character code is included, it will interfere with the subsequent processing, so it is desirable to remove it if possible. (It seems to be a common problem when scraping web pages.)

(Reference) [Python3] What to do if you encounter [\ xa0] during scraping

&removal of nbsp


s = re.sub(r'[\xc2\xa0]','', s)

4. Implemented in BOT app

Here is a function of the above processing.

getSettledata.py


import requests
from bs4 import BeautifulSoup
import re
import logging
logger = logging.getLogger('getSettledata')

source = 'https://kabuyoho.ifis.co.jp/index.php?action=tp1&sa=report_top&bcode='

#Settlement date acquisition function(4689 if the argument is empty(ZHD)Refer to the data of)
def get_settleInfo(code="4689"):

  #Crawling
  try:
    logger.debug('read web data cord = ' + code) #logging
    r = requests.get(source + code)
  except:
    logger.debug('read web data ---> Exception Error') #logging
    return None, 'Exception error: access failed'

  #Scraping
  soup = BeautifulSoup(r.content, "html.parser")
  settleInfo = soup.find("div", class_="header_main").text
  settleInfo = re.sub(r'[\n\t]+', ',', settleInfo) #Removal of metacharacters
  settleInfo = re.sub(r'(^,)|(,$)','', settleInfo) #Comma removal at the beginning and end of the line
  settleInfo = re.sub(r'[\xc2\xa0]','', settleInfo) #&nbsp(\xc2\xa0)Dealing with the problem
  logger.debug('settleInfo result = ' + settleInfo) #logging

  if not settleInfo:
    settleInfo = 'There is no such brand ~'

  return settleInfo

if __name__ == '__main__':
  print(get_settleInfo())

For the main program, add conditional branching processing by identifying the activation code as usual. If you create your own portfolio in SETTLEVIEW_LIST_CORD in advance, you will be eligible for batch acquisition.

chatbot.py(★ Addition)##The existing function part is omitted because it is not changed.


# -*- Coding: utf-8 -*-

from django.views.decorators.csrf import csrf_exempt
from django.http import HttpResponse
from django.shortcuts import render
from datetime import datetime
from time import sleep
import requests
import json
import base64
import logging
import os
import random
import log.logconfig
from utils import tools
import re
from .getStockdata import get_chart
from .getSettledata import get_settleInfo

logger = logging.getLogger('commonLogging')

LINE_ENDPOINT = 'https://api.line.me/v2/bot/message/reply'
LINE_ACCESS_TOKEN = ''

###
###Omitted
###

SETTLEVIEW_KEY = ['Settlement','settle'] #★ Addition
SETTLEVIEW_LIST_KEY = ['Financial results list'] #★ Addition
SETTLEVIEW_LIST_CORD = ['4689','3938','4755','1435','3244','3048'] #★ Addition

@csrf_exempt
def line_handler(request):

    #exception
    if not request.method == 'POST':
      return HttpResponse(status=200)

    logger.debug('line_handler message incoming') #logging
    out_log = tools.outputLog_line_request(request) #logging
    request_json = json.loads(request.body.decode('utf-8'))

    for event in request_json['events']:
      reply_token = event['replyToken']
      message_type = event['message']['type']
      user_id = event['source']['userId']

      #whitelist
      if not user_id == LINE_ALLOW_USER:
        logger.warning('invalid userID:' + user_id) #logging
        return HttpResponse(status=200)

      #action
      if message_type == 'text':
        if:
        ###
        ###Omitted
        ###

        elif any(s in event['message']['text'] for s in SETTLEVIEW_KEY): #★ Addition
          action_data(reply_token,'settleview',event['message']['text']) #★ Addition

        else:
        ###
        ###Omitted
        ###

    return HttpResponse(status=200)

def action_res(reply_token,command,):
    ###
    ###Omitted
    ###

def action_data(reply_token,command,value):

    #Stock chart
    ###
    ###Omitted
    ###

    #######################################################★ Addition from here
    #Financial information
    elif command == 'settleview':
      logger.debug('get_settleInfo on') #logging

      #Bulk acquisition of portfolio stocks
      if any(s in value for s in SETTLEVIEW_LIST_KEY): 
        logger.debug('get_settleInfo LIST') #logging

        results = []
        for cord in SETTLEVIEW_LIST_CORD:
          results.append(get_settleInfo(cord))

        logger.debug('get_settleInfo LIST ---> ' + '\n'.join(results)) #logging
        response_text(reply_token,'\n'.join(results))

      #Acquisition of individual stocks
      else:
        cord = re.search('[0-9]+$', value)
        logger.debug('get_settleInfo cord = ' + cord.group()) #logging

        result = get_settleInfo(cord.group())

        if result[0] is not None:
          response_text(reply_token,result)
        else:
          response_text(reply_token,result[1])
    #######################################################★ Addition up to here

def response_image(reply_token,orgUrl,preUrl,text):
    ###
    ###Omitted
    ###

def response_text(reply_token,text):
    payload = {
      "replyToken": reply_token,
      "messages":[
        {
          "type": 'text',
          "text": text
        }
      ]
    }
    line_post(payload)

def line_post(payload):
    url = LINE_ENDPOINT
    header = {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + LINE_ACCESS_TOKEN
    }
    requests.post(url, headers=header, data=json.dumps(payload))
    out_log = tools.outputLog_line_response(payload) #logging
    logger.debug('line_handler message -->reply') #logging

def ulocal_chatting(event):
    ###
    ###Omitted
    ###

This completes.

line_Launch the bot


(botenv2) [line_bot]$ gunicorn --bind 127.0.0.1:8000 line_bot.wsgi:application

If you drop a message according to the format from the LINE app, the result will be returned.

sc003.png

If you want to get all at once, enter Financial Statement List.

sc004.png

It takes about 1 second to measure 6 brands serially. I was impressed that the processing was faster than I had imagined, but in case it bothers me too much, I will use it moderately so as not to access it frequently. Up to here for this time.

Recommended Posts

How to make an interactive LINE BOT 004 (answer the closing date of a listed company)
The world's most easy-to-understand explanation of how to make a LINE BOT (1) [Account preparation]
How to make a slack bot
The world's most easy-to-understand explanation of how to make LINE BOT (3) [Linkage with server with Git]
How to make an artificial intelligence LINE bot with Flask + LINE Messaging API
How to put a line number at the beginning of a CSV file
How to calculate the volatility of a brand
How to make a Raspberry Pi that speaks the tweets of the specified user
How to create an article from the command line
Make a BOT that shortens the URL of Discord
How to make an interactive CLI tool in Golang
Basics of PyTorch (2) -How to make a neural network-
How to hit the document of Magic Function (Line Magic)
How to display the modification date of a file in C language up to nanoseconds
Explaining how to make LINE BOT in the world's easiest way (2) [Preparing a bot application in a local environment with Django in Python]
How to make a Cisco Webex Teams BOT with Flask
[Python] How to make a list of character strings character by character
Make a LINE BOT (chat)
[Python] I asked LINE BOT to answer the weather forecast.
How to find the scaling factor of a biorthogonal wavelet
How to connect the contents of a list into a string
I made a Line bot that guesses the gender and age of a person from an image
How to determine the existence of a selenium element in Python
How to know the internal structure of an object in Python
How to make a string into an array or an array into a string in Python
How to check the memory size of a variable in Python
Make the theme of Pythonista 3 like Monokai (how to make your own theme)
How to make a command to read the configuration file with pyramid
How to check the memory size of a dictionary in Python
How to find the memory address of a Pandas dataframe value
How to output the output result of the Linux man command to a file
Slack --APIGateway --Lambda (Python) --How to make a RedShift interactive app
How to get the vertex coordinates of a feature in ArcPy
How to extract the desired character string from a line 4 commands
[Python] How to make a matrix of repeating patterns (repmat / tile)
[NNabla] How to remove the middle tier of a pre-built network
How to make a Japanese-English translation
How to make a crawler --Advanced
How to make a recursive function
How to make a deadman's switch
[Blender] How to make a Blender plugin
How to make a crawler --Basic
The story of creating a store search BOT (AI LINE BOT) for Go To EAT in Chiba Prefecture (1)
[Fabric] I was addicted to using boolean as an argument, so make a note of the countermeasures.
Tweet in Chama Slack Bot ~ How to make a Slack Bot using AWS Lambda ~
The story of making a university 100 yen breakfast LINE bot with Python
Python beginners decided to make a LINE bot with Flask (Flask rough commentary)
How to make a rock-paper-scissors bot that can be easily moved (commentary)
[Introduction to Python] How to sort the contents of a list efficiently with list sort
[NNabla] How to add a quantization layer to the middle layer of a trained model
How to create a wrapper that preserves the signature of the function to wrap
How to play a video while watching the number of frames (Mac)
An example of the answer to the reference question of the study session. In python.
[Python] How to make a class iterable
How to check the version of Django
The story of creating a store search BOT (AI LINE BOT) for Go To EAT in Chiba Prefecture (2) [Overview]
Part 1 I wrote an example of the answer to the reference problem of how to write offline in real time in Python
How to save the feature point information of an image in a file and use it for matching
How to make a Backtrader custom indicator
How to make a Pelican site map
I created a Slack bot that confirms and notifies AWS Lambda of the expiration date of an SSL certificate