[Python] Summary of how to use pandas

A memo on data processing with pandas. Plenty of information about pandas is already available online, so this page is mainly a collection of links.

Execution environment

I recommend using Jupyter (IPython) Notebook as the execution environment.

Install python3 and Jupyter Notebook (formerly ipython notebook) on Windows --Qiita

Install and import pandas

$ pip install pandas
import pandas as pd

Creating a DataFrame

New data creation

You can create a DataFrame with pd.DataFrame. Note that every column must contain the same number of elements.


df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7, 8, 9, 10],
    'B': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 8, 8, 8],
})
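
The length requirement can be checked with a small sketch. The column values below are made up for illustration; the point is only that pandas raises a ValueError when the lists differ in length.

```python
import pandas as pd

# Columns of equal length: the DataFrame is created normally.
ok = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Columns of different length: pandas refuses to build the frame.
try:
    pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```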

Read existing data


Read data and create DataFrame

# Read a comma-separated file
csv_data = pd.read_csv('./path/to/hoge.csv')

# Read a tab-separated file
tsv_data = pd.read_csv('./path/to/hoge.tsv', delimiter='\t')
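
To try this without files on disk, here is a minimal sketch that reads from in-memory text instead. The sample data and column names are hypothetical. `sep` is used here; `delimiter` is an alias for it in read_csv.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
csv_text = "name,score\nalice,90\nbob,85\n"
tsv_text = "name\tscore\nalice\t90\nbob\t85\n"

# read_csv accepts any file-like object as well as a path.
csv_data = pd.read_csv(io.StringIO(csv_text))
tsv_data = pd.read_csv(io.StringIO(tsv_text), sep='\t')
```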

Reading and writing csv / tsv files with pandas | mwSoft
Read csv / tsv with non-constant column size with pandas: mwSoft blog
Python Coding Memorandum-Part 3- (Mastering pandas read_csv) --Self-consideration Journey

Extract data

Python pandas data selection process in a little more detail <Part 1> --StatsFragments
Python pandas data selection process in a little more detail <Part 2> --StatsFragments
Python pandas data selection process in a little more detail <Part 2> --StatsFragments
Refer to data frame by condition in Pandas --Qiita

Column extraction

Select columns by label name

data = data[['column1', 'column2']]
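
As a small sketch with made-up data: passing a list of labels returns a DataFrame, while passing a single label returns a Series. The third column is hypothetical and only there to show that it is dropped.

```python
import pandas as pd

data = pd.DataFrame({
    'column1': ['a', 'b'],
    'column2': [1, 2],
    'column3': [3.0, 4.0],  # hypothetical extra column
})

subset = data[['column1', 'column2']]  # list of labels -> DataFrame
single = data['column1']               # single label -> Series
```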

Row extraction

Extract by specifying conditions

data = data[data.column1 == 'hoge']
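
Conditions can also be combined. The sketch below uses made-up data; note that each condition needs its own parentheses and that `&` / `|` are used instead of `and` / `or`.

```python
import pandas as pd

data = pd.DataFrame({
    'column1': ['hoge', 'fuga', 'hoge'],
    'column2': [1, 2, 3],
})

# Single condition: rows where column1 equals 'hoge'.
equal = data[data['column1'] == 'hoge']

# Two conditions joined with & (logical AND).
both = data[(data['column1'] == 'hoge') & (data['column2'] > 1)]
```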

Extraction by searching with regular expressions

data = data[data.column1.str.contains(regex)]
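
A minimal sketch of the regular-expression search, assuming a hypothetical pattern and sample values. If the column contains missing values, passing `na=False` makes them count as non-matches so that the result can be used as a filter.

```python
import pandas as pd

data = pd.DataFrame({'column1': ['hoge1', 'fuga', None, 'hoge22']})

# Hypothetical pattern: 'hoge' followed by one or more digits.
regex = r'^hoge\d+$'

# na=False treats missing values as "no match".
matched = data[data['column1'].str.contains(regex, na=False)]
```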

Python pandas: Search for DataFrame using regular expressions --Qiita
<Python, pandas> Data frame string search --Nekoyuki's memo

Removal of missing values (NaN)

Remove rows that contain at least one missing value

df = df.dropna()

Remove rows that have a missing value in the specified columns

df = df.dropna(subset=['Item 1', 'Item 2'])
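
The difference between the two calls can be seen with a small made-up frame. 'Item 3' is a hypothetical extra column added only for this illustration.

```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'Item 1': [1.0, nan, 3.0],
    'Item 2': [4.0, 5.0, 6.0],
    'Item 3': [7.0, 8.0, nan],  # hypothetical extra column
})

# Drops every row that has a NaN in any column.
no_missing = df.dropna()

# Only looks at 'Item 1' and 'Item 2'; a NaN in 'Item 3' is ignored.
subset_checked = df.dropna(subset=['Item 1', 'Item 2'])
```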

DataFrame Join

Python pandas data concatenation / join processing as seen in the figure --StatsFragments
Merge, join, and concatenate — pandas 0.18.1 documentation
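
The linked pages cover the details. As a rough sketch with made-up frames, `pd.concat` stacks DataFrames and `pd.merge` joins them on a key column. The column name 'key' and the values are assumptions for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'y': [10, 20, 40]})

# Stack rows vertically and renumber the index.
stacked = pd.concat([left, left], ignore_index=True)

# Inner join: keep only keys present in both frames.
inner = pd.merge(left, right, on='key', how='inner')

# Left join: keep every row of `left`; unmatched rows get NaN in 'y'.
left_joined = pd.merge(left, right, on='key', how='left')
```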

DataFrame processing

Sorting data

Sort by value

# Sort by a single column
df = df.sort_values(['type of data'])

# Sort in ascending order by column 1, then by column 2
df = df.sort_values(['Data type 1', 'Data type 2'])
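
The sort direction can be set per column with the `ascending` parameter. A minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    'Data type 1': [2, 1, 2, 1],
    'Data type 2': [1, 2, 3, 4],
})

# Ascending by 'Data type 1', then descending by 'Data type 2'.
mixed = df.sort_values(['Data type 1', 'Data type 2'],
                       ascending=[True, False])
```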

pandas.DataFrame.sort_values — pandas 0.18.1 documentation
Sort by pandas-Qiita

Rename row / column

df.rename(columns={'A': 'a'}, index={'ONE': 'one'}, inplace=True)
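
As a sketch with a made-up frame: without `inplace=True`, rename returns a new DataFrame and leaves the original unchanged. Labels that are not listed in the mapping are kept as they are.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['ONE', 'TWO'])

# Returns a renamed copy; `df` itself is not modified.
renamed = df.rename(columns={'A': 'a'}, index={'ONE': 'one'})
```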

pandas.DataFrame.rename — pandas 0.18.1 documentation
Change row name / column name of pandas DataFrame | nkmk log

Renumber the index in the current row order

df = df.reset_index(drop=True)
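
This is typically useful after sorting or filtering, when the index no longer runs from 0. A small sketch with made-up values, also showing what happens without `drop=True`:

```python
import pandas as pd

df = pd.DataFrame({'value': [30, 10, 20]})
sorted_df = df.sort_values('value')  # index is now out of order

# drop=True discards the old index and numbers rows from 0.
renumbered = sorted_df.reset_index(drop=True)

# Without drop=True the old index is kept as a column named 'index'.
kept = sorted_df.reset_index()
```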

python - How to reset index in a pandas data frame? - Stack Overflow
pandas.DataFrame.reset_index — pandas 0.18.1 documentation

Data type change

Convert to floating-point type

df = df.astype(float)
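
`df.astype(float)` converts every column, so it fails if any column holds non-numeric text. A sketch of converting only selected columns by passing a dict; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'count': ['1', '2'],
    'price': ['1.5', '2.5'],
    'name': ['x', 'y'],
})

# Convert only the listed columns; 'name' is left untouched.
converted = df.astype({'count': int, 'price': float})
```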

Transpose (swap rows and columns)

df = df.T
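
A quick sketch with made-up data showing that the column labels become the index and the index becomes the column labels:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

# Rows and columns are swapped.
transposed = df.T
```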


Conversion from DataFrame to another format

Convert from DataFrame to List


python - Pandas DataFrame to list - Stack Overflow
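
The original memo gives only the link here. As one possible sketch (not necessarily the approach the linked answer recommends), a DataFrame can be converted to a list of rows, a single column to a flat list, or the rows to a list of dicts. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

rows = df.values.tolist()        # list of rows (list of lists)
column = df['A'].tolist()        # one column as a flat list
records = df.to_dict('records')  # list of {column: value} dicts
```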

Export as CSV, TSV


# sep='\t' writes tab-separated values; omit it to write comma-separated CSV
data.to_csv('./path/to/output.tsv', sep='\t')
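
By default to_csv also writes the index as the first column. A sketch with made-up data that suppresses it with `index=False`; when no path is given, to_csv returns the text as a string, which is convenient for checking the output.

```python
import pandas as pd

data = pd.DataFrame({'name': ['alice', 'bob'], 'score': [90, 85]})

# No path given: the result is returned as a string.
csv_text = data.to_csv(index=False)
tsv_text = data.to_csv(sep='\t', index=False)
```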

Reading and writing csv / tsv files with pandas | mwSoft

Integration between pandas and databases

Microsoft Access (mdb)

[Linux] [Python] [Pandas] Read Microsoft Access database (*.mdb) with Pandas --Qiita

Data plot / graph output

Basic specifications of plot in pandas

pandas provides a thin wrapper around matplotlib, so basic graphs can be drawn with the pandas plot method alone. See the following for the basics of graph output in pandas.

Visualization — pandas 0.18.1 documentation
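
A minimal sketch of plotting from a DataFrame, assuming matplotlib is installed. The data is made up, and the 'Agg' backend is chosen here only so that the example runs without a display; in a Jupyter Notebook the figure is normally shown inline and this line is not needed.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no GUI required
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 2, 4], 'B': [4, 2, 3, 1]})

# plot() returns a matplotlib Axes, so matplotlib methods can be used on it.
ax = df.plot(kind='line', title='Example plot')
ax.set_xlabel('index')

# Save the figure as PNG into an in-memory buffer instead of a file.
buf = io.BytesIO()
ax.figure.savefig(buf, format='png')
```

Because the return value is an ordinary matplotlib Axes, anything beyond what plot offers can be adjusted with matplotlib directly.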

Customizing pandas plots further

Mastering the Python pandas plot function-StatsFragments
If you use Pandas' Plot function in Python, it's really seamless from data processing to graph creation --Qiita

Handling missing values, outliers, and discretization

Python pandas Missing / Outlier / Discretization Processing-StatsFragments

About performance

Three TIPS for maintaining Python pandas performance-StatsFragments


A book by the author of pandas: O'Reilly Japan --Introduction to Data Analysis with Python
