[PYTHON] Output log file with Job (Notebook) of Cloud Pak for Data

This article describes how to produce a log when a Job created from a Notebook is run in an analysis project of Cloud Pak for Data (hereinafter CP4D).

By way of background, as of CP4D v3.0 you cannot write arbitrary log messages to the Job execution log.

Details: Job execution log

When you open a Job in the analysis project and click the timestamp of an execution result, the execution log is displayed. However, this log only records the output of the Python environment when the Job (Notebook) runs; arbitrary log messages cannot be written to it. (As of June 11, 2020, CP4D v3.0 LA)

As a workaround, I devised a way to write the log to a file from within the notebook and register that file as a Data Asset of the analysis project.

Notebook example to output log file

Use the standard Python logging module to send logs to both the console (the Notebook output) and a log file. The log file is registered as a data asset of the analysis project using project_lib. For log settings, please refer to this article.

Write this at the beginning of your notebook


import logging
from pytz import timezone
from datetime import datetime

#logger settings
logger = logging.getLogger("mylogger")
logger.setLevel(logging.DEBUG)

#Log format settings
def customTime(*args):
    return datetime.now(timezone('Asia/Tokyo')).timetuple()
formatter = logging.Formatter(
    fmt='%(asctime)s.%(msecs)03d %(levelname)s : %(message)s',
    datefmt="%Y-%m-%d %H:%M:%S"
)
formatter.converter = customTime

#Handler settings for log output to console(For display in Notebook. Level specified as DEBUG)
sh = logging.StreamHandler()
sh.setLevel(logging.DEBUG)
sh.setFormatter(formatter)
logger.addHandler(sh)

#Handler settings for log output to file (for Job execution; level is INFO). The log file is written to the current directory and registered as a Data Asset later.
logfilename = "mylog_" + datetime.now(timezone('Asia/Tokyo')).strftime('%Y%m%d%H%M%S') + ".log"
fh = logging.FileHandler(logfilename)
fh.setLevel(logging.INFO)
fh.setFormatter(formatter)
logger.addHandler(fh)

#Data Asset Registration Library
import io
from project_lib import Project
project = Project.access()

This is an example of how to use it.

try:
    logger.info('%s', 'Processing started started')
    #Write the process you want to do here
    
    #Output log message at any time
    logger.debug('%s', 'dummy debug message')
    logger.info('%s', 'dummy info message')
    
    #Intentionally generate an error(Division by zero)
    test = 1/0
    
except Exception as e:
    #Log the exception together with its traceback
    logger.exception('%s', repr(e))
finally:
    #Export the log file to Data Asset (runs both after an error and after normal completion)
    with open(logfilename, 'rb') as z:
        data = io.BytesIO(z.read())
        project.save_data(logfilename, data, set_project_asset=True, overwrite=True)
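
The read-and-upload step could also be wrapped in a small helper so that it is written only once. The following is a minimal sketch, assuming the project object exposes save_data with the arguments used above. The names upload_log and FakeProject are hypothetical; FakeProject is only a stand-in so the sketch runs outside CP4D, where project_lib is not available.

```python
import io
import os
import tempfile


def upload_log(project, path):
    """Read the log file at `path` and save it to the project as a data asset."""
    with open(path, "rb") as f:
        data = io.BytesIO(f.read())
    project.save_data(os.path.basename(path), data,
                      set_project_asset=True, overwrite=True)


class FakeProject:
    """Hypothetical stand-in for project_lib's Project (an assumption for local testing)."""

    def __init__(self):
        self.saved = {}

    def save_data(self, file_name, data, set_project_asset=True, overwrite=False):
        # Keep the uploaded bytes in memory instead of sending them to CP4D.
        self.saved[file_name] = data.getvalue()


# Create a small log file, upload it to the stand-in project, then clean up.
fd, tmp_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(b"INFO : dummy info message\n")

fake = FakeProject()
upload_log(fake, tmp_path)
uploaded = fake.saved[os.path.basename(tmp_path)]
os.remove(tmp_path)
```

In a real notebook you would pass the object returned by Project.access() instead of FakeProject.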

Execution result

When run in a Notebook, the log appears in the cell output as shown below, and the log file is created in the data assets.

Run-time output in Notebook


2020-06-11 07:43:12.383 INFO : Processing started started
2020-06-11 07:43:12.388 INFO : dummy info message
2020-06-11 07:43:12.389 ERROR : ZeroDivisionError('division by zero',)
Traceback (most recent call last):
  File "<ipython-input-7-0b7d7ffe66e9>", line 10, in <module>
    test = 1/0
ZeroDivisionError: division by zero

Also, if you save a version of the Notebook, create a Job, and run the Job, a log file is likewise created in the data assets.

The generated log file is listed among the data assets. Download it to check the contents. Because the file handler's level is INFO, the file does not contain DEBUG messages.
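
The effect of the handler level can be checked locally without CP4D. The sketch below is an illustration under assumptions, not part of the original notebook: the logger name level_demo and the function demo_level_filtering are hypothetical, and a temporary file stands in for the log file.

```python
import logging
import os
import tempfile


def demo_level_filtering():
    """Send DEBUG and INFO records through an INFO-level file handler
    and return the lines that actually reached the file."""
    logger = logging.getLogger("level_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)            # the logger itself accepts DEBUG
    logger.propagate = False                  # keep the demo off the root logger

    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    fh = logging.FileHandler(path)
    fh.setLevel(logging.INFO)                 # the file handler drops DEBUG records
    fh.setFormatter(logging.Formatter("%(levelname)s : %(message)s"))
    logger.addHandler(fh)
    try:
        logger.debug("dummy debug message")
        logger.info("dummy info message")
    finally:
        logger.removeHandler(fh)
        fh.close()

    with open(path) as f:
        lines = f.read().splitlines()
    os.remove(path)
    return lines


file_lines = demo_level_filtering()
print(file_lines)  # ['INFO : dummy info message']
```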

Considerations about the log file name

Filling the data assets with log files is undesirable given what data assets are meant for, so one option is to always overwrite a single log file. However, since CP4D runs on OpenShift (Kubernetes), the Job's Python environment is created as a pod at run time and disappears when the Job finishes. With a single file name, therefore, only the latest Job execution is recorded, and earlier history is lost to overwriting. That is why the example above includes a timestamp in the log file name to keep the history. Please adjust this to suit your use case.

As noted above, filling the data assets with logs is not ideal, but until arbitrary messages can be written to the Job's own log, there is little choice but to make do with this for now. Another approach is to record the logs in a DB table.
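
The two naming choices discussed here can be put behind one small function so that switching between them is a single argument. This is a sketch under assumptions: make_log_filename and its parameters are hypothetical names, and a fixed UTC+9 offset from the standard library is used in place of pytz's 'Asia/Tokyo' so the example has no third-party dependency.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+9 offset, used here as a stand-in for pytz's timezone('Asia/Tokyo').
JST = timezone(timedelta(hours=9), "JST")


def make_log_filename(prefix="mylog", keep_history=True, now=None):
    """Return a log file name.

    keep_history=True  -> one file per run, with a timestamp in the name
    keep_history=False -> a single fixed name that each run overwrites
    """
    if not keep_history:
        return f"{prefix}.log"
    now = now or datetime.now(JST)
    return f"{prefix}_{now.strftime('%Y%m%d%H%M%S')}.log"


fixed_time = datetime(2020, 6, 11, 7, 43, 12, tzinfo=JST)
print(make_log_filename(now=fixed_time))      # mylog_20200611074312.log
print(make_log_filename(keep_history=False))  # mylog.log
```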
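
As a rough illustration of the DB table idea, a custom logging handler can insert each record as a row. The sketch below makes several assumptions: it uses an in-memory SQLite database from the standard library as a stand-in for whatever database is reachable from the Job, and the class name SQLiteHandler, the table name job_log, and its columns are all hypothetical.

```python
import logging
import sqlite3


class SQLiteHandler(logging.Handler):
    """Hypothetical handler that stores each log record as a row in a table."""

    def __init__(self, connection, table="job_log"):
        super().__init__()
        self.connection = connection
        self.table = table
        self.connection.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(created REAL, level TEXT, message TEXT)"
        )

    def emit(self, record):
        try:
            self.connection.execute(
                f"INSERT INTO {self.table} (created, level, message) "
                "VALUES (?, ?, ?)",
                (record.created, record.levelname, record.getMessage()),
            )
            self.connection.commit()
        except Exception:
            # Follow the logging module's convention for handler failures.
            self.handleError(record)


conn = sqlite3.connect(":memory:")       # stand-in for a real database
db_logger = logging.getLogger("db_demo")  # hypothetical logger name
db_logger.setLevel(logging.DEBUG)
db_logger.propagate = False

dbh = SQLiteHandler(conn)
dbh.setLevel(logging.INFO)                # same level as the file handler above
db_logger.addHandler(dbh)

db_logger.debug("dummy debug message")
db_logger.info("dummy info message")

rows = conn.execute("SELECT level, message FROM job_log").fetchall()
print(rows)  # [('INFO', 'dummy info message')]
```

Because the rows persist outside the Job's pod, the history survives across runs without adding files to the data assets.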
