How to make a Python package (written for an intern)

What is this article?

--Explain how to create a Python package --Explain what you want to be aware of when creating a Python package ――This article is Poem

Synopsis so far

We are planning to have an intern this summer, but they have never made a package.

On the other hand, as a company, if you do not package it, it will take time to use it in practice, which is difficult.

So I wrote a document for internal use called "How to make a Python package". Let the intern read this and make a nice package! It is a convenient plan.

However, I haven't studied the packaging know-how properly and systematically, so I'm worried about it. So, I decided to post a poem on Qiita and have the poem corrected.

Please forgive that Japanese is strange in some places. I am inconvenient in Japanese. [^ 1]

Assumptions for readers of this article

In other words, it is the prerequisite knowledge that the intern student has.

--I use Python for research on a daily basis. --You can do pre-processing work such as machine learning and natural language processing. (100 knocks of natural language processing seems to have been cleared) --I haven't made a package yet. --Git can be used.


↓ Poem from here

Directory structure

In fact, there is no official directory structure in the Python world. The code works even if you have a sloppy directory structure. However, that would make the code very confusing to other developers. That's why you need an easy-to-understand directory structure.

So I recommend following the this blog method. This blog clones the Github repository. Again, let's clone this repository.

MacBook-Pro% git clone [email protected]:kennethreitz/samplemod.git

Package naming convention

Python's package naming convention is __ all lowercase __ and __ underscore delimited __ is recommended. Google's Great People and [People in Python](https://www.python.org/dev/peps/ pep-0008 / # package-and-module-names) says so, so it's safer to follow.

Get ready to make a package

This time, let's create a package called hoge_hoge. It's a good idea to match the cloned directory with the package name. Also, the cloned directory has the original git file. So, let's delete the git file first. After that, make the directory where the script is placed the same name as the package name. In other words, it's hoge_hoge.

MacBook-Pro% mv samplemod hoge_hoge
MacBook-Pro% cd hoge_hoge
MacBook-Pro% rm -rf .git
MacBook-Pro% mv sample hoge_hoge

After running the above command, your directory structure should look like this. (Changed in response to @ shn's suggestion. Requirement.txt has been deleted.)

MacBook-Pro% tree   
.
├── LICENSE
├── Makefile
├── README.rst
├── docs
│   ├── Makefile
│   ├── conf.py
│   ├── index.rst
│   └── make.bat
├── hoge_hoge
│   ├── __init__.py
│   ├── core.py
│   └── helpers.py
├── setup.py
└── tests
    ├── __init__.py
    ├── context.py
    ├── test_advanced.py
    └── test_basic.py

Let's initialize a new git for your next package.

MacBook-Pro% git init
MacBook-Pro% git add .
MacBook-Pro% git commit -m 'first commit'

Now you are ready.

By the way ...

There are several tools that initialize the directory structure of python packages. Like cookiecutter. But I don't recommend it very much. This is because too many extra files are created.

Write setup.py

setup.py is a required file for the package. In this file, write the dependency information, version information, and package name of the package to be created. This poem describes only the minimum fields. If you're interested in other fields, take a look at The Scriptures of People in Python.

At a minimum, the field you want to write

The contents of the original setup.py should look like this.

setup(
    name='sample',
    version='0.0.1',
    description='Sample package for Python-Guide.org',
    long_description=readme,
    author='Kenneth Reitz',
    author_email='[email protected]',
    url='https://github.com/kennethreitz/samplemod',
    license=license,
    packages=find_packages(exclude=('tests', 'docs'))
)
Field name Description More detailed explanation
name Write the name of the package
version Write the current version information
description Let's briefly write "What does this package do?"
install_requires Write dependent package information this
dependency_links Write when the dependent package does not exist on the pypi. this

__ In particular, what you absolutely must write __ is ʻinstall_requires. Developers using your packages don't know "Which package do you depend on?" Without this information, other developers will be in trouble. (Dependency_links` will be removed by [pip update] around 2019/1 (https://github.com/pypa/pip/pull/6060). Alternative methods will be described later.)

Maybe you've done python setup.py install before. This command works because the dependency information is written properly.

Write dependency information for packages that exist on Pypi

ʻList the dependent packages in the install_requiresfield. For example, if you neednumpy, write numpy`.

setup(
    name='sample',
    version='0.0.1',
    description='Sample package for Python-Guide.org',
    long_description=readme,
    author='Kenneth Reitz',
    author_email='[email protected]',
    install_requires=['numpy'],
    url='https://github.com/kennethreitz/samplemod',
    license=license,
    packages=find_packages(exclude=('tests', 'docs'))
)

Describe the dependency information for the in-house package

If you are a general company, you probably operate a package that you have developed in-house. There is no such package on Pypi, so you have to tell setup.py where the package is located. dependency_links is needed at such times.

Story after 2019/1

You may not want to use dependency_links in the future. The reason is that dependency_links is obsolete in pip.

An alternative is to write it in requirements.txt.

requirements.txt has long been used by ʻinstall_requires` as an alternative package-dependent description.

It's a selfish opinion, but I think it's a fairly common method.

Describe the dependency information in requirements.txt like this.

#You can write the package name that exists in pypi as it is
numpy
#Use equations when you want to specify the version
scipy == 1.2.2
#git for packages that don't exist on pypi://Repository URL.git
git://[email protected]/foo/foo.git
#For private repositories+Add ssh git+ssh://[email protected]/foo/foo.git
git+ssh://[email protected]/foo/foo.git

Then, write the process to read requirements.txt in setup.py.

This code is just right.

import os, sys
from setuptools import setup, find_packages

def read_requirements():
    """Parse requirements from requirements.txt."""
    reqs_path = os.path.join('.', 'requirements.txt')
    with open(reqs_path, 'r') as f:
        requirements = [line.rstrip() for line in f]
    return requirements

setup(
    ..., # Other stuff here
    install_requires=read_requirements(),
    )

Story before 2019/1

(Important) This story is no longer available at least on pip. setuptools may soon remove this argument. Therefore, please consider it deprecated.

(In the case of Hayashi) Since I am using Github, I will write only the example of git. For other version control systems, StackOverflow article Let's see.

The basic formula is

git+ssh://git@URL-TO-THE-PACKAGE#egg=PACKAGE-NAME

is. For example, suppose you want to write dependency information for a package called ʻunko`. At that time

git+ssh://[email protected]/our-company-internal/unko.git#egg=unko

is. If you want to fix it in a specific version, you can specify the branch name or tag name after @.

git+ssh://[email protected]/our-company-internal/[email protected]#egg=unko-0.1

@ v0.1 is a github branch name or tag name. -0.1 is the package version.

Rewrite name, author, etc.

Finally, rewrite the information such as name, ʻauthor`. Writing your name will probably make you more attached;)

The final setup.py looks like this.

import os, sys
from setuptools import setup, find_packages

def read_requirements():
    """Parse requirements from requirements.txt."""
    reqs_path = os.path.join('.', 'requirements.txt')
    with open(reqs_path, 'r') as f:
        requirements = [line.rstrip() for line in f]
    return requirements

setup(
    name='hoge_hoge',
    version='0.0.1',
    description='Sample package for Python-Guide.org',
    long_description=readme,
    author='Kensuke Mitsuzawa',
    author_email='[email protected]',
    install_requires=read_requirements(),
    url='https://github.com/kennethreitz/samplemod',
    license=license,
    packages=find_packages(exclude=('tests', 'docs'))
)

Write code

There are many ways to write code, but I won't go into details this time. Please write as before. But [__ "What I want you to remember" __](http://qiita.com/Kensuke-Mitsuzawa/items/7717f823df5a30c27077#%E8%A6%9A%E3%81%88%E3%81%A6% E3% 81% 8A% E3% 81% 84% E3% 81% A6% E3% 81% BB% E3% 81% 97% E3% 81% 84% E3% 81% 93% E3% 81% A8) Therefore I want it.

Write a test

Be sure to write the test. Maybe you've written the code to check the operation before. However, in addition to checking the operation, please consider the test case yourself as much as possible and verify the output.

Here, we will only explain how to write test cases. In Python there is a standard test framework called ʻunit test`. Let's use this framework. I will omit the details, but see This article-JP and This article-EN. Let's master how to use it.

The place to put the test code is under the tests / directory.

├── setup.py
└── tests
    ├── __init__.py
    ├── context.py
    ├── test_advanced.py
    └── test_basic.py

One script, one test script

Once you've created one module script for your package, create one test script as well. That way, the correspondence is clearly understood and maintainability is improved. Now let's say you created hoge_hoge / core.py as a module script.

├── hoge_hoge
│   ├── __init__.py
│   ├── core.py

Place the test script for core.py undertests /. It will be easier to understand if you add the prefix test_ to test_core.py.

└── tests
    ├── __init__.py
    ├── test_core.py

example of test_core.py

There are two things to keep in mind when writing a test script.

--The test class name is camel case --The prefix test_ is required before the method name. Without this prefix, test would be ignored.

Once tested, let's run the script. python test_core.py. Or if it is pycharm, press the normal run button and the test will run.

import unittest


class TestCore(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # procedures before tests are started. This code block is executed only once
        pass

    @classmethod
    def tearDownClass(cls):
        # procedures after tests are finished. This code block is executed only once
        pass

    def setUp(self):
        # procedures before every tests are started. This code block is executed every time
        pass

    def tearDown(self):
        # procedures after every tests are finished. This code block is executed every time
        pass

    def test_core(self):
        # one test case. here.
        # You must “test_” prefix always. Unless, unittest ignores
        pass

if __name__ == '__main__':
    unittest.main()

Write all test cases in test_all

(Updated in response to @ podhmo's comment. Thanks!)

Once you've written some test scripts, let's make python setup.py test executable.

Be sure to add a habit of executing this command if you make a major change. I often find problems that are out of my mind. When I am working on improving the package, I often forget to run the test by changing it to "Oh, change this too".

It's also very helpful as other developers can easily test it in their own environment.

1: Edit setup.py

The current test directory has this structure.

└── tests
    ├── __init__.py
    ├── test_all.py
    ├── test_advanced.py
    └── test_basic.py

Specify the location of the test directory in setup.py. setup.py will automatically load the test class and run the test for you.

setup(
    name='hoge_hoge',
    version='0.0.1',
    description='Sample package for Python-Guide.org',
    long_description=readme,
    author='Kensuke Mitsuzawa',
    author_email='[email protected]',
    install_requires=['numpy', 'unko'],
    dependency_links=['git+ssh://[email protected]/our-company-internal/unko.git#egg=unko'],
    url='https://github.com/kennethreitz/samplemod',
    license=license,
    packages=find_packages(exclude=('tests', 'docs')),
    test_suite='tests'
)

2: Run all tests

Run python setup.py test. You can see how all the tests are running. If there is an error, it will be displayed at the end.

Write interface and sample code

Mechanko awesome algorithm may be implemented in your package. However, other developers do not have the time to check the contents one by one. (Unfortunately)

The interface is needed to make your mechanko awesome algorithm ready for monkey developers.

Even with a simple interface, monkey developers complain that they don't know how to use it. Let's write a sample code and teach you how to use it.

interface

I think the place to write the interface depends on your preference. If you're writing object-oriented code, you can write interface methods inside your class. If you are writing a function object in a script file like Python, you can prepare a script file for the interface. (By the way, I am the latter.)

In any case, make sure that __ input / output is clarified __. Take the request package interface (https://github.com/kennethreitz/requests/blob/fb014560611f6ebb97e7deb03ad8336c3c8f2db1/requests/api.py) as an example. [^ 2]

Sample code

The important thing in the sample code is to clarify __ input / output __. Monkey developers often copy and paste sample code. The sample code that allows you to understand the input and output immediately even when using copy is the best. As an example, jaconv usage is mentioned.

Write a README

If your team has a README writing style, be sure to follow that style __. (Unfortunately, there is no such method in Heisha)

The principle when writing a README is that "other developers can understand the installation method and the execution method just by reading the README".

In most cases, you should have the following information.

--How to install (usually python setup.py install should be fine) --How to run the test (usually python setup.py test should be fine) ――How do you use it? (In general, "Look at the sample code" should be fine)

If there are any items that need to be prepared in advance, they must be written in the README. For example

--Mecab must be installed in advance --You have to set up RedisDB

In this case, you must write it in the README. If you can afford it, write a Make file. Other developers cry and rejoice.

Things to remember

Clarify input / output

One thing to keep in mind is that "other developers can't understand I / O just by looking at the code." That's why you need sample code or you have to write doc comments firmly. However, it is difficult to write polite comments on all methods and functions (the time available for internships is finite). At the very least, write the __ interface function input / output __ with a polite explanation.

Stable code and developing code

Use github to manage your code. So (at least at our company), your package may soon be used by other developers.

That's why you should cherish the meaning of branch. In most cases, the master branch is the" stable version ". The development version must be pushed to another branch.

As an example, this is my development style.

--Push to master until the first alpha version (always state at the beginning of the README "This is under development") --After the alpha version, create another development branch. How to handle branches follows git-flow [^ 3].

Follow Git-flow

git-flow is a rule that determines how to handle branches. This rule gives teams flexibility in development. Follow git-flow, even if you're developing it by yourself. This is because other developers can participate in the development smoothly. For the flow of git-flow, see This good article.

Be aware of reproducibility

Do the same for other developers to get the same results. Don't hard code information that can only be executed in your environment. Common hard-coded information

--Unix command execution path --Absolute path to external file

If this information is hard coded, it cannot be executed by other developers. I'm in trouble. For example, here is a bad example

def do_something():
    path_to_external_unix_command = '/usr/local/bin/command'
    path_to_input_file = '/Users/kensuke-mi/Desktop/codes/hoge_hoge/input.txt'
    return call_some_function(path_to_input_file)

The contents of "input.txt" remain unknown to anyone. In this case, this is good.

--If you need an external file, push it to the same repository. Use relative path from script --Allow information to be passed as an argument to the function. --If unix command is required, describe it in the configuration file.

Here's a good example above.

def do_something(path_to_external_unix_command, path_to_input_file):
    return call_some_function(path_to_input_file)

path_to_input_file = './input.txt'
path_to_external_unix_command = load_from_setting_file()
do_something(path_to_external_unix_command, path_to_input_file)

Other

Let's use type hint!

I wrote "I / O clearly" several times, but to be honest, it's troublesome. Moreover, it is easy to forget when you change it. In this case, it doesn't make sense to write a clear comment. So here is the type hint. It is a mechanism that can describe the type of an object like Java. However, the difference from Java is that Python's "type hint has no effect on execution". [^ 4]

Main advantages

--Easy for other developers to understand I / O types --Pycharm (as well as other IDEs) warns you when there is an I / O mismatch, so it's easy to notice mistakes. --Pycharm (and other IDEs) will suggest methods from type information, speeding up development

Bad points

――The amount of typing characters increases a little

Example

I'll give you an example. Who are the names and persons in this implementation? It is not well understood. And what is greeting a function that returns? Is also unknown.

def greeting(names, persons):
    target = decide_target_person(persons)
    greet = generate(target, names)
    return greet

Now, let's add a type hint here.

def greeting(names: List[str], persons: Dict[str, str])->str:
    target = decide_target_person(persons)
    greet = generate(target, names)
    return greet

Now the I / O is clear!

About pycharm support

Pycharm fully supports type hints.

--Suggest of the method of the object --Warning when there is a type mismatch

The image below is the development screen of Pycharm. I am getting a warning because I / O of target is wrong.

スクリーンショット 2016-07-29 2.49.55.png

For python-3 only packages

This area is a good article.

-PEP 0484 type hint (JP) -About Python 3.5 Type hint (JP)

For packages that support both python-2 and python-3

In Python2, the type hint of Python3 expression cannot be interpreted and an error occurs. However, PEP offers a great way to "write a type hint as a comment". Pycharm is only partially supported yet, but it should be fully supported in the near future.

-Python2 and type hints (JP)


↑ So far Poem

Write a poem and look back

It feels messy, but is it okay? Is it possible to make a package with only such information? I am anxious. Tsukkomi is welcome.

[^ 1]: Actually, the original document itself is written in English. Some international students come to the internship. [^ 2]: The request package has a reputation for being "very easy to read code". [^ 3]: Actually git-flow is the name of the tool, but it often indicates the development style (only around me ??) [^ 4]: Isn't that against Python's philosophy? I hear a few voices saying, but it is an item that is ʻAccepted` in the PEP where people in Python are active. Therefore, there are some things that you should keep. (In the first place, the member who made this proposal is Guido van Rossum)

Recommended Posts

How to make a Python package (written for an intern)
How to make a Python package using VS Code
[Python] How to make a class iterable
How to make a string into an array or an array into a string in Python
How to make a QGIS plugin (package generation)
[Blender x Python] How to make an animation
How to make Python faster for beginners [numpy]
Tips for Python beginners to use the Scikit-image example for themselves 7 How to make a module
How to make a Japanese-English translation
How to write a Python class
How to use pip, a package management system that is indispensable for using Python
[Python] How to make a list of character strings character by character
[Python] How to make an adjacency matrix / adjacency list [Graph theory]
How to make a slack bot
How to create a Conda package
Experiment to make a self-catering PDF for Kindle with Python
How to make a crawler --Advanced
How to make a recursive function
How to define multiple variables in a python for statement
How to make a deadman's switch
[Blender] How to make a Blender plugin
How to make a crawler --Basic
[Python] How to output a pandas table to an excel file
A python beginner tried to intern at an IT company
An introduction to Python for non-engineers
How to make a surveillance camera (Security Camera) with Opencv and Python
How to create a heatmap with an arbitrary domain in Python
[Introduction to Python] How to use the in operator in a for statement?
[For beginners] How to register a library created in Python in PyPI
Spigot (Paper) Introduction to how to make a plug-in for 2020 # 01 (Environment construction)
Slack --APIGateway --Lambda (Python) --How to make a RedShift interactive app
How to use an external editor for Python development with Grasshopper
How to make a unit test Part.1 Design pattern for introduction
[Python] How to make a matrix of repeating patterns (repmat / tile)
How to package and distribute Python scripts
How to add a package with PyCharm
python3 How to install an external module
[Python] How to convert a 2D list to a 1D list
How to convert Python to an exe file
[Python] How to invert a character string
How to install a package using a repository
How to get a stacktrace in python
[Python] Organizing how to use for statements
Make Qt for Python app a desktop app
How to use "deque" for Python data
An introduction to Python for machine learning
How to make a Backtrader custom indicator
How to make a Pelican site map
How to run a Maya Python script
An introduction to Python for C programmers
How to make a model for object detection using YOLO in 3 hours
How to install python package in local environment as a general user
An introduction to self-made Python web applications for a sluggish third-year web engineer
How to make a dialogue system dedicated to beginners
How to read a CSV file with Python 2/3
How to create a Python virtual environment (venv)
How to make an embedded Linux device driver (8)
How to open a web browser from python
How to make an embedded Linux device driver (1)
How to clear tuples in a list (Python)
How to make an embedded Linux device driver (4)