[PYTHON] I want to map the EDINET code and securities number

1 What is this article?

Suppose you want to download a company's securities report. If you know the EDINET code of the report the company submitted, you can obtain the report in XBRL format from the EDINET API. A securities report in XBRL format can be viewed in a browser, and once you have the XBRL files you can, for example, automatically extract the capital adequacy ratios of hundreds of companies by scraping. However, a table that maps securities numbers to EDINET codes is not published on the web, so I wrote code to build one.

For an easy-to-understand explanation of what XBRL is, see here.

2 How to map EDINET CODE and securities number

Processing flow (figure: 99.JPG)

2-1 Acquire a mapping table of "Securities number" and "Company name".

This site (Japan Exchange Group) provides a list of the securities numbers of companies listed on the Tokyo Stock Exchange. Download the list as a csv file. This step is done manually.
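
As a minimal sketch of how the downloaded file is used later, the snippet below loads a securities-number list with pandas and keeps only the two columns needed for the mapping. The column names "code" and "company name" follow the names used by the script in section 3; the headers in the real file may differ, and the sample rows here are made up for illustration. Reading every column as a string is one way to avoid the trailing ".0" that appears when pandas interprets the securities number as a float.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the manually downloaded csv file.
SAMPLE_CSV = """code,company name,market
1001,Example A,Prime
1002,Example B,Standard
"""


def load_securities_table(source):
    """Load the "securities number / company name" table from a path or file-like object."""
    # dtype=str keeps the securities number as text such as "1001".
    df = pd.read_csv(source, dtype=str)
    return df.loc[:, ["code", "company name"]]


table = load_securities_table(io.StringIO(SAMPLE_CSV))
print(table)
```

With a real file, pass its path instead of the StringIO object, for example load_securities_table(FPATH).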

2-2 Get the mapping table of "EDINET CODE" and "company name".

The EDINET code used in this article is tied to an individual securities report (the script below reads it from the docID field of the API response). In other words, securities reports from the same company for different years have different codes. In addition, the EDINET API returns the securities reports submitted on a specified day. Each listed company files a securities report once a year, so if you collect the "EDINET CODE" and "company name" for every day from today back through the past year, you can build a mapping table of "EDINET CODE" and "company name" for all listed companies.
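
The day-by-day collection described above can be sketched without any network access. The example below builds the list of dates and filters one day's "results" array down to securities reports. The field names (docID, filerName, ordinanceCode, formCode) and the code values "010" and "030000" are taken from the script in section 3; the sample entries and the function names are made up for illustration.

```python
import datetime


def make_day_list(start_date, end_date):
    """Return every date from start_date to end_date, both inclusive."""
    days = (end_date - start_date).days
    return [start_date + datetime.timedelta(days=d) for d in range(days + 1)]


def pick_securities_reports(results):
    """Keep only the entries whose codes identify a securities report."""
    return [
        {"company name": r["filerName"], "EDINET": r["docID"]}
        for r in results
        if r.get("ordinanceCode") == "010" and r.get("formCode") == "030000"
    ]


# Made-up sample shaped like the "results" array returned for one day.
sample_results = [
    {"docID": "S0000001", "filerName": "Example A",
     "ordinanceCode": "010", "formCode": "030000"},
    {"docID": "S0000002", "filerName": "Example B",
     "ordinanceCode": "010", "formCode": "043000"},
]

days = make_day_list(datetime.date(2019, 8, 7), datetime.date(2019, 8, 9))
reports = pick_securities_reports(sample_results)
print(len(days), reports)
```

In the real script, each date in the list is sent to the API in turn and the filtered rows from all days are accumulated into a single table.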

2-3 Obtain the mapping table of "Securities number", "EDINET CODE" and "Company name".

Using "Company name" as the key, merge the mapping table of "Securities number" and "Company name" with the mapping table of "EDINET CODE" and "Company name" to generate a mapping table of "Securities number", "Company name" and "EDINET CODE".
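
The script in section 3 performs this step with a row-by-row lookup. As an alternative sketch, not the method used in that script, the same join can be written with pandas DataFrame.merge. The two small tables and the function name below are made up for illustration; the column names follow the script.

```python
import pandas as pd

# Made-up sample tables; the real data comes from the two csv files.
df_syoken = pd.DataFrame(
    {"code": ["1001", "1002"], "company name": ["Example A", "Example B"]}
)
df_edinet = pd.DataFrame(
    {"company name": ["Example A", "Example C"], "EDINET": ["S0000001", "S0000003"]}
)


def merge_on_company_name(df_edinet, df_syoken):
    """Attach the securities number to each EDINET row, keyed on the company name."""
    # how="left" keeps every EDINET row even when no securities number matches.
    merged = df_edinet.merge(df_syoken, on="company name", how="left")
    # Mark unmatched rows with "NONE", as the script in section 3 does.
    merged["code"] = merged["code"].fillna("NONE")
    return merged[["code", "company name", "EDINET"]]


mapping = merge_on_company_name(df_edinet, df_syoken)
print(mapping)
```

Because the join key is the company name, the result is only as good as the name matching: a name that is written differently in the two tables ends up as "NONE", and a name that appears twice in the securities-number table produces duplicate rows.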

3 Code

Below is the Python code that builds the mapping table of "securities number", "EDINET CODE" and "company name" explained above.

Set the following constants before running the code.
・START_DATE: start date of the period for collecting data on companies that submitted securities reports.
・END_DATE: end date of that period.
・FPATH: path of the csv file containing the mapping table of "securities number" and "company name".
・SPATH: path where the csv file containing the mapping table of "EDINET CODE" and "company name" is stored.
・OPATH: path where the deliverable csv file (mapping table of "securities number", "company name" and "EDINET CODE") is stored.

test.py




# -*- coding: utf-8 -*-
import requests
import datetime
import time
import pandas as pd
import os.path
import math
import numpy as np

#Class definition
class YUHO_GET():

    #Constructor ... Here, variables are read.
    def __init__(self,start_date, end_date,spath,fpath,sel):
        
        
        self.start_date=start_date
        self.end_date=end_date
        self.spath=spath
        self.fpath=fpath
        self.sel=sel
        
    
        
    #Obtain the company name, period, and EDINET code of the securities reports for each date.
    def mainproc(self):

        day_list = self.make_day_list() #Generate every date from the start date to the end date.

        securities_report_doc_list = self.make_doc_id_list(day_list) #Obtain the company name, period, and EDINET code of the securities reports for each date.
        number_of_lists = len(securities_report_doc_list)
        print("number_of_lists:", number_of_lists)
        print("get_list:", securities_report_doc_list)
       
        
    #Generate every date from the start date to the end date (both inclusive).
    def make_day_list(self):
        print("start_date:", self.start_date)
        print("end_day:", self.end_date)

        period = (self.end_date - self.start_date).days
        day_list = []
        for d in range(period + 1):
            day = self.start_date + datetime.timedelta(days=d)
            day_list.append(day)

        return day_list

    #Obtain the company name and EDINET code of the securities reports for each date.
    def make_doc_id_list(self,day_list):
        securities_yuho_list = []
        securities_4hanki_list = [] #For quarterly reports (sel != 0); not filled in by this code.
        com_edi=[]
        for index, day in enumerate(day_list):
            url = "https://disclosure.edinet-fsa.go.jp/api/v1/documents.json"
            params = {"date": day, "type": 2}

            #Placeholder proxy settings. requests selects a proxy by URL scheme, so the keys are "http" and "https".
            #Replace the values with your own proxy, or set proxies = None if you do not use one.
            proxies = {
                "http": "http://username:[email protected]:8080",
                "https": "https://username:[email protected]:8080"
            }
            #Access the EDINET API and get the list of documents submitted on the specified date.
            res = requests.get(url, params=params, proxies=proxies)
            json_data = res.json()
            print(day)
            for num in range(len(json_data["results"])):
                #Read the document type information returned from the EDINET API.
                ordinance_code = json_data["results"][num]["ordinanceCode"]
                form_code = json_data["results"][num]["formCode"]

                #ordinance_code == "010" and form_code == "030000" identify a securities report.
                if ordinance_code == "010" and form_code == "030000":
                    print('★★★★★★★★★★★★ Securities Report ★★★★★★★★★★★★★★★')                    
                    #Store company name in comname
                    comname=json_data["results"][num]["filerName"]
                    #Remove "株式会社" (Co., Ltd.) from comname. This handles both the prefix form and the suffix form.
                    comname=comname.split('株式会社')[0] if comname.split('株式会社')[0] != '' else comname.split('株式会社')[-1]
                    #If comname starts with a space, keep only the part after the last space.
                    comname=comname.split(' ')[-1] if comname.split(' ')[0]=='' else comname
                    
                    com_edi={ 'company name':comname,
                              'season':json_data["results"][num]["docDescription"],           
                              'EDINET':json_data["results"][num]["docID"],
                             'Filing date':day
                         }
                    

                    securities_yuho_list.append(com_edi)
                    
                                        
        #Build the table once, after every date has been processed, and save it as a csv file.
        securities_report=securities_yuho_list if self.sel==0 else securities_4hanki_list
        securities_report=pd.DataFrame(securities_report,columns=['company name','season','EDINET','Filing date'])
        securities_report.to_csv(self.spath)

        return securities_report


        
        
    #Function that maps EDINET to securities number
    def edinet_syoken_mapping(self):
        df_all_syokennum = pd.read_csv(self.fpath) #Securities numbers of all stocks listed on the TSE.
        df_all_editnum = pd.read_csv(self.spath) #Company names and EDINET codes saved by mainproc().
        df_edi_syo=df_all_editnum.loc[:,['company name','EDINET']].copy()
        df_syoken=df_all_syokennum.loc[:,['code','company name']]

        #(3)From the "company name"/"securities number" table and the "company name"/"EDINET code" table,
        #   obtain the mapping table of "company name", "securities number" and "EDINET code".
        df_edi_syo['code']=[self.get_syouken_num(df_syoken,name) for name in df_edi_syo['company name']]

        #A securities number read as a float looks like "1301.0", so drop the ".0" part.
        df_edi_syo['code']=df_edi_syo['code'].astype(str).str.split('.').str[0]

        #Write df_edi_syo to a csv file. OPATH is the constant defined in the MAIN section.
        df_edi_syo.set_index("code",inplace=True)
        df_edi_syo.to_csv(OPATH)
        
    
    #A function that separates the integer and decimal parts
    def bunri(self,x):
        return x.code.split(".")[0]
        
        
    #(3)Obtain the securities number from the company name.
    def get_syouken_num(self,df,company_name):
        try:
            df1=df[df['company name']==str(company_name)]
            #The first column of df is 'code'; take it from the first matching row.
            meigara_num=str(df1.iloc[0,0])
            return meigara_num
        #If no row matches the company name, 'NONE' is returned.
        except IndexError:
            return 'NONE'
        
        
##################   MAIN ##################            
            
START_DATE= datetime.date(2019,8,7) #Start date of the period for collecting companies that submitted securities reports.
END_DATE= datetime.date(2020,8,6)   #End date of that period.

SEL=0 # SEL=0:Acquire securities reports.


FPATH='Enter the path of the mapping table csv file of "Securities number" and "Company name" here.'
SPATH='Enter the path to store the mapping table csv file of "EDINET CODE" and "company name" here.'
OPATH='Enter the path to store the deliverable csv file (mapping table of "Securities number", "Company name" and "EDINET CODE") here.'

def main():

    yuho=YUHO_GET(START_DATE,END_DATE,SPATH,FPATH,SEL) #(1)Create an instance and set the parameters.
    yuho.mainproc() #(2)Extract the company names and EDINET codes of the submitted securities reports.
    yuho.edinet_syoken_mapping() #(3)Map the securities numbers to the EDINET codes.
    
 
if __name__ == "__main__":
    main()
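
Since the merge is keyed on the company name, the name cleanup in make_doc_id_list largely decides how many companies can be mapped. As a rough sketch, that step can be isolated into a small function that is easy to test on its own. The function name is hypothetical, and the default marker "株式会社" is an assumption about how filer names are written; pass a different marker if your data uses another form. Unlike the split-based one-liner in the script, this version only removes the marker at the very start or very end of the name.

```python
def normalize_company_name(name, marker="株式会社"):
    """Strip the company-type marker from the front or the back of a filer name."""
    # strip() removes ASCII and full-width spaces at both ends.
    name = name.strip()
    if name.startswith(marker):
        name = name[len(marker):]
    elif name.endswith(marker):
        name = name[:-len(marker)]
    # Remove any space that was left between the marker and the name.
    return name.strip()


print(normalize_company_name("株式会社サンプル"))
print(normalize_company_name("サンプル株式会社"))
```

Applying the same function to the company names in both csv files before matching keeps the two sides consistent.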
