How to make a Python package using VS Code

Motivation

When you write a program, you may want to reuse it yourself or have other team members use it.

In such cases, if you split the code into modules by function, package it, and maintain the documentation properly, it will be easier for others to use.

VS Code is a powerful tool for creating Python packages as well, so I will explain how to create a Python package using VS Code.

This article also includes information that is useful when packaging a data analysis program.

Environment

Item      Version     Remarks
OS        Windows 10
conda     4.8.3       Checked with conda -V in Anaconda Prompt
Anaconda  2020.02     Checked with conda list anaconda in Anaconda Prompt
Python    3.8.2
VSCode    1.43.2

Setting up the environment

To prepare a Python execution environment in VS Code, please refer to this article: How to build Python and Jupyter execution environment with VS Code

Directory structure

Once you have an execution environment with VS Code, create folders and files for developing Python packages there.

.
├── <package name>
│   ├── __init__.py
│   └── <file name>.py
├── setup.py
└── script.py

The package itself lives in the <package name> folder. You can choose any package name and file name you like.

01VSCodeDirectory.png

As an example, I created this structure in VS Code as shown above. Let's name the package mypackage.
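The layout above can also be created and checked from Python. The following is only a minimal sketch using the standard library: the helper names create_skeleton and list_importables are hypothetical, and the files are created empty in a temporary folder rather than in a real project.

```python
import pkgutil
import tempfile
from pathlib import Path


def create_skeleton(root: Path, package_name: str = "mypackage") -> None:
    """Create the folders and (empty) files shown in the directory tree."""
    package_dir = root / package_name
    package_dir.mkdir(parents=True)
    # __init__.py may be empty; its presence marks the folder as a regular package.
    (package_dir / "__init__.py").touch()
    (package_dir / "preprocessing.py").touch()
    (root / "setup.py").touch()
    (root / "script.py").touch()


def list_importables(root: Path) -> dict:
    """Map each importable name directly under root to True if it is a package."""
    return {info.name: info.ispkg for info in pkgutil.iter_modules([str(root)])}


with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "project"
    create_skeleton(project_root)
    found = list_importables(project_root)
    print(found)
```

Here mypackage is reported as a package because its folder contains __init__.py, while script.py and setup.py are reported as plain modules.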

Write setup.py

setup.py is the file that defines the package's name, version information, and dependencies.

setup.py

from setuptools import setup, find_packages

setup(
    name='mypackage',
    install_requires=['pandas','scikit-learn'],
    packages=find_packages()
)

Write the settings as arguments of the setup function. For example, in install_requires, list the packages that your package depends on.

There are various other items, so please check the official documentation (write setup script) as appropriate.

Write a program

Let's write a program right away. As an example, this time I will make a package to analyze the data of Kaggle's Titanic.

Suppose you write the following program in the file preprocessing.py in the package. This is a program that preprocesses data.

preprocessing.py

class Preprocesser:
    """
    Class that preprocesses the data
    """
    def process(self, data):
        """
        Preprocess the data and return a processed copy
        """
        processed_data = data.copy()

        # Fill missing Age values with the median age
        age_m = processed_data['Age'].median()
        processed_data['Age'] = processed_data['Age'].fillna(age_m)

        # ----- omitted -----
        # Write the rest of the preprocessing here

        return processed_data

Run the program

Let's run the program. Create a script called script.py for running the program in the same folder as setup.py, next to the mypackage folder. As an example, write a program that preprocesses the training data and displays it.

script.py

def main():
    from mypackage import preprocessing
    import pandas as pd

    train=pd.read_csv('train.csv')

    #Initialize the preprocessing instance and perform preprocessing
    preprocesser=preprocessing.Preprocesser()
    train_processed=preprocesser.process(train)

    print(train_processed.head())

if __name__=='__main__':
    main()

To import a module from your own package, if the module file is directly under the package folder, you can write

from mypackage import preprocessing

that is, in general,

from <package name> import <module file name without .py>
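The reason this import works without installing anything is that, when a script is run directly, Python puts the script's own folder at the front of sys.path, so a package folder sitting next to the script can be found. The sketch below imitates that situation by hand: it builds a throwaway package in a temporary folder (demo_import_pkg and hello are hypothetical names) and adds the parent folder to sys.path itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    package_dir = root / "demo_import_pkg"  # hypothetical package name
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "preprocessing.py").write_text(
        "def hello():\n    return 'hello from preprocessing'\n"
    )

    # Running a script does this automatically for the script's own folder.
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        from demo_import_pkg import preprocessing

        message = preprocessing.hello()
        module_file = Path(preprocessing.__file__).name
    finally:
        # Restore sys.path so the temporary folder does not linger on it.
        sys.path.remove(str(root))

print(message, module_file)
```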

02Run.png

With script.py open, press F5 in VS Code to run the program as shown above; the result is displayed in the Terminal.

Debug the package

As described in the "Utilization of debugging" section of How to build Python and Jupyter execution environment with VS Code (https://qiita.com/SolKul/items/f078877acd23bb1ea5b5#%E3%83%87%E3%83%90%E3%83%83%E3%82%B0%E3%81%AE%E6%B4%BB%E7%94%A8), you can use VS Code's debugging features when developing a package as well.

07DebugPackage.png

For example, open preprocessing.py in the package and press F9 on line 7, as shown above. A red dot appears at the left edge of that line. This is called a **breakpoint**. In this state, go back to script.py and press F5 to run it.

08BreakRun.png

Execution pauses at line 7 in the package, as shown above, and the variables defined at that point (here, the variables in preprocessing.py) are displayed in the left sidebar. Using breakpoints in this way should make fixing bugs (= **debugging**) more efficient.

Install the package

Next, install this self-made package in another environment and check that it works there.

Open Anaconda Prompt and create a new environment.

conda create -n <environment name> python=<Python version>

03CreateEnv.png

This time, I created an environment called setup_test as an example.

Then activate this environment.

conda activate setup_test

Then move to the folder that contains the setup.py edited above.

cd <directory containing setup.py>

Then install your self-made package.

python setup.py install

After installation, try running the above script.py in this state. Copy script.py and train.csv to another folder of your choice and try running them there.

05OnlyScript.png

python script.py

06RunIndependent.png

The script runs as shown above, and the preprocessed training data is displayed. This folder contains only the script and the data, not the package folder. In other words, if script.py runs in this folder, the package has been installed into this environment.
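Another way to confirm where a package is being imported from is to ask Python directly. This is a minimal sketch with a hypothetical helper, import_origin; after installation, I would expect the path printed for mypackage to be somewhere under the environment's site-packages folder rather than the development folder, and None if the package is not available at all.

```python
import importlib.util
from typing import Optional


def import_origin(module_name: str) -> Optional[str]:
    """Return the file a top-level module or package would be loaded from.

    Returns None when the name cannot be found in the current environment.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# json ships with Python; mypackage is found only where it is importable.
for name in ("json", "mypackage"):
    print(name, "->", import_origin(name))
```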

Prepare demo data in your own package

When creating your own package, you may want to include data files other than the source code.

For example, suppose you create and distribute a package for data analysis. Other members who want to use the package may want to see how the analysis behaves before they can prepare data of their own. In such a case, if you include demo data in the package, you can explain it to them smoothly.

As an example, we will explain the case where the training data for the Titanic is prepared in the package. Add some folders and files to the directory.

.
├── mypackage
│   ├── __init__.py
│   ├── preprocessing.py
│   ├── load_data.py *
│   └── resources *
│       └── train.csv *
├── setup.py
└── script.py

*: newly added files and folders

First, create a folder for data inside your package; here it is called resources. Then put the training data (train.csv) in it.

Read the data in the package

Write the following code to load the demo data and add it to the package.

load_data.py

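import io
import pkgutil

import pandas as pd


class DataLoader:
    def load_demo(self):
        # Read the bundled CSV as bytes, relative to the mypackage package
        train_b = pkgutil.get_data('mypackage', 'resources/train.csv')
        # Wrap the bytes in a file-like object so that pandas can read them
        train_f = io.BytesIO(train_b)
        train = pd.read_csv(train_f)
        return train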

Here we use pkgutil, a module in Python's standard library. Given a package name and a file path relative to that package, the function pkgutil.get_data() returns the file's contents as bytes.

Also, io is used to handle the bytes that were read like a file (a file-like object).
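To see the bytes-to-file-like step in isolation, here is a small self-contained sketch. It is not the article's loader: it builds a throwaway package in a temporary folder (demo_data_pkg is a hypothetical name, and the CSV contents are made up for illustration), and it parses the CSV with the standard csv module instead of pandas so that it runs without extra dependencies.

```python
import csv
import importlib
import io
import pkgutil
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    resources = root / "demo_data_pkg" / "resources"  # hypothetical package
    resources.mkdir(parents=True)
    (root / "demo_data_pkg" / "__init__.py").write_text("")
    # Made-up sample rows, only for illustration.
    (resources / "train.csv").write_text("PassengerId,Age\n1,22\n2,\n3,38\n")

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        # Package name plus a path relative to the package; returns bytes.
        raw = pkgutil.get_data("demo_data_pkg", "resources/train.csv")
    finally:
        sys.path.remove(str(root))

# BytesIO makes the bytes file-like; TextIOWrapper decodes them for csv.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.DictReader(text_stream))
print(rows)
```

With pandas, the io.BytesIO object can be passed to pd.read_csv directly, as in load_demo.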

Test whether the demo data can be read. Rewrite main() of script.py as follows and run it with F5 in VS Code.

script.py

def main():
    from mypackage import load_data

    data_loader=load_data.DataLoader()
    train=data_loader.load_demo()
    print(train.head())

09LoadTest.png

The demo data can be read as shown above.

Install the data together with the package

However, this alone does not install the data when the package is installed. Add a line to setup.py so that the data is installed together with the package.

setup.py

from setuptools import setup, find_packages

setup(
    name='mypackage',
    install_requires=['pandas','scikit-learn'],
    packages=find_packages(),
    package_data={'mypackage': ['resources/*']}
)

By specifying the package name and folder name in package_data, you can specify the data to be installed at the same time as the package installation.

For details, refer to Official document (2.6. Install package data).
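The patterns in package_data are glob patterns relative to the package folder. The sketch below is a stand-in that uses pathlib to preview which files a pattern would pick up; it is not setuptools' own code, and match_package_data and the extra subfolder are hypothetical. As far as I know, a single * does not descend into subfolders, so files in nested folders need their own pattern.

```python
import tempfile
from pathlib import Path


def match_package_data(package_dir: Path, patterns: list) -> list:
    """Return the files under package_dir matched by the glob patterns.

    Paths are returned relative to package_dir, with forward slashes.
    """
    matched = set()
    for pattern in patterns:
        for path in package_dir.glob(pattern):
            if path.is_file():  # folders themselves are not data files
                matched.add(path.relative_to(package_dir).as_posix())
    return sorted(matched)


with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "mypackage"
    (package_dir / "resources" / "extra").mkdir(parents=True)
    (package_dir / "resources" / "train.csv").touch()
    (package_dir / "resources" / "extra" / "test.csv").touch()

    top_level_only = match_package_data(package_dir, ["resources/*"])
    with_subfolder = match_package_data(
        package_dir, ["resources/*", "resources/extra/*"]
    )

print(top_level_only)
print(with_subfolder)
```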

Then, as explained above, create a new environment and install your package using setup.py, and you can confirm that the demo data can be used in the installed environment.

In conclusion

Packaging alone is not enough to make your program easy to explain to other members and easy to use. Ideally, there are other things to do as well, such as writing tests with unittest or pytest and documenting the program's inputs and outputs with docstrings.

However, I think packaging is the first step in doing so.

If you have come this far, please make your program easy to understand by writing tests, writing docstrings, and generating the program's specification from the docstrings as shown below.

10Sphinx.png
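As a rough idea of what the tests and docstrings mentioned above could look like, here is a minimal sketch with unittest. It does not test the Preprocesser class itself: fill_missing_age is a hypothetical plain-Python stand-in for the median step, so that the example runs without pandas.

```python
import statistics
import unittest


def fill_missing_age(rows):
    """Return copies of rows with missing ages replaced by the median age.

    Input:  a list of dicts, each with an 'Age' key holding a number or None.
    Output: a new list of dicts; the input list is left unchanged.
    """
    known = [row["Age"] for row in rows if row["Age"] is not None]
    median_age = statistics.median(known)
    return [
        {**row, "Age": median_age if row["Age"] is None else row["Age"]}
        for row in rows
    ]


class FillMissingAgeTest(unittest.TestCase):
    def test_missing_age_becomes_median(self):
        rows = [{"Age": 20}, {"Age": None}, {"Age": 40}]
        self.assertEqual(fill_missing_age(rows)[1]["Age"], 30)

    def test_input_is_not_modified(self):
        rows = [{"Age": None}, {"Age": 10}]
        fill_missing_age(rows)
        self.assertIsNone(rows[0]["Age"])


# Run the two tests explicitly instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FillMissingAgeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the test class would normally live in its own file and import the code under test from the package.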

Additional Information

This article was very helpful for packaging Python code. It also describes how to write tests using unittest, so please refer to it. How to make a Python package (written for internships)

However, when it comes to testing, pytest is easier to use. Once you are used to unittest, please try pytest. pytest (official documentation)

The next article is about documentation that describes your program. You can turn docstrings into a specification document. How to use Sphinx. Read docstring and generate specifications

You may also want to use diagrams and formulas to explain how to use the program and the theory.

In such a case, it is recommended to use a module called mkdocs that can create documents in markdown format. Document creation with MkDocs

If you create documents with Sphinx or MkDocs and host them on AWS S3 or similar, then when a member asks how to use the program, you can just send the URL, which is very convenient when you are busy.

For the Titanic data analysis, I referred to this article: [Introduction to Kaggle Beginners] Who will survive the Titanic?
