[PYTHON] Summary of how to use pandas.DataFrame.loc

loc extracts the rows and columns of a DataFrame that match the labels or conditions you give it. It comes up all the time when using pandas, but there are several different ways to specify the data, so I would like to summarize them here.

About data specification

The following data can be specified for loc.

- Single label
- Label list
- Label slice object
- List of boolean values
- Conditional expression

There are many ways to use it ... (゜ _ ゜) You need to be careful when writing a program, and when reading one, unless you calmly work out which pattern is being used, you are likely to end up thinking "???". Below I write sample code for each pattern and check how it behaves.

Trying it out

I created the data used for this check myself.

import pandas as pd

# Load the sample CSV, using the item_name column as the row index
loc_sample_data = pd.read_csv("loc_sample_data.csv", index_col="item_name")
loc_sample_data.head()

The row index consists of item_name and the columns consist of price, stock and producing_area.
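The CSV itself is not included in this post, so here is a minimal sketch of a stand-in built in memory for anyone who wants to follow along. The item names and column names match the description above, but every value is hypothetical (invented for illustration) and will not match the original file.

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; all values are invented.
loc_sample_data = pd.DataFrame(
    {
        "item_name": ["itemA", "itemB", "itemC", "itemD"],
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    }
).set_index("item_name")  # same effect as index_col="item_name" in read_csv

print(loc_sample_data.head())
```

Prices are chosen so that only itemC and itemD exceed 500, which keeps the later examples consistent with the text.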

Specifying a single index label

Extract a row by specifying its (single) index label. This time, we will extract itemC.

loc_sample_data.loc["itemC"]

I was able to extract it. The extracted data is of type Series.
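To see what that Series looks like, here is a small sketch using a hand-made stand-in for the CSV (the values are hypothetical). The column names of the DataFrame become the index of the Series, and the row label becomes its name.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame(
    {
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

row = df.loc["itemC"]

print(type(row).__name__)   # Series
print(row.index.tolist())   # ['price', 'stock', 'producing_area']
print(row.name)             # itemC
```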

Specifying the index label list

The above example extracts only a single row, but it is also possible to specify and extract multiple rows. To specify more than one, pass the labels as a list. Next, we will extract itemA and itemD.

loc_sample_data.loc[["itemA", "itemD"]]

I was able to extract it. The extracted data was of type DataFrame.
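A detail that is easy to miss: it is the list, not the number of labels, that decides the return type. The sketch below uses hypothetical sample data (invented values) to show that a one-element list still gives a DataFrame, and that rows come back in the order of the list.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame(
    {
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

one_row = df.loc[["itemA"]]             # list with a single label
reordered = df.loc[["itemD", "itemA"]]  # rows follow the order of the list

print(type(one_row).__name__)    # DataFrame
print(one_row.shape)             # (1, 3)
print(reordered.index.tolist())  # ['itemD', 'itemA']
```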

Specifying a single row label and column label

It is also possible to extract data by specifying both a row label and a column label. This time, specify row → itemB and column → producing_area to extract a single value.

loc_sample_data.loc["itemB", "producing_area"]

I was able to extract it. The extracted value is of type str. It is a str in this example, but the type depends on the contents of the data stored in the DataFrame.
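As a quick check that the type follows the column contents, here is a sketch with hypothetical sample data (invented values): the same kind of lookup on a text column and on a numeric column.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame(
    {
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

area = df.loc["itemB", "producing_area"]  # value from a text column
price = df.loc["itemB", "price"]          # value from an integer column

print(area, isinstance(area, str))    # Osaka True
print(price, isinstance(price, str))  # 300 False
```

The numeric value comes back as a NumPy integer rather than a str, so it can be used directly in arithmetic.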

Specifying row labels and column labels using slices

You can specify multiple rows and columns using slices. Use this to extract the prices of itemA and itemB.

loc_sample_data.loc["itemA":"itemB","price"]

I was able to extract it. I do wonder how often this one actually gets used, though...
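One thing worth knowing before using label slices: unlike ordinary Python slices, a loc slice includes both endpoints. The sketch below uses hypothetical sample data (invented values) and compares a label slice with a plain list slice.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame(
    {
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

# Label slice: "itemB" (the end label) IS included.
prices = df.loc["itemA":"itemB", "price"]
print(prices.index.tolist())  # ['itemA', 'itemB']

# Plain Python list slice for comparison: the end position is NOT included.
names = df.index.tolist()
print(names[0:1])             # ['itemA']

# Open-ended slices and column slices work the same way.
tail = df.loc["itemC":, "price":"stock"]
print(tail.shape)             # (2, 2)
```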

Specifying data using a boolean list

If you pass a list of boolean values with the same length as the number of rows in the source DataFrame, only the rows whose value is True are extracted. This time, I will extract itemB and itemD.

loc_sample_data.loc[[False, True, False, True]]

I was able to extract it. Written out by hand like this, it seems unlikely to be useful, but it does have a use if you decide in advance, row by row, whether each row satisfies the extraction condition and build the list from that.
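Here is a rough sketch of that "build the list in advance" idea, using hypothetical sample data (invented values) and a made-up set of wanted names. The list is computed row by row first, then handed to loc.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame(
    {
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

# Hypothetical condition: keep only the items named in this set.
wanted = {"itemB", "itemD"}

# One boolean per row, in row order; the length must match len(df).
mask = [name in wanted for name in df.index]
print(mask)  # [False, True, False, True]

selected = df.loc[mask]
print(selected.index.tolist())  # ['itemB', 'itemD']
```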

Data specification using conditional expressions

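This is the one you are most likely to use. The conditional expression is evaluated for every row and produces one boolean value per row, so it works like the boolean list above without having to write the list by hand. This time, I will try to extract the rows (itemC, itemD) whose price is greater than 500.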

loc_sample_data.loc[loc_sample_data["price"] > 500]

I was able to extract it. Used on its own, this is probably the pattern that comes up most often.
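Conditions can also be combined. The following is a sketch on hypothetical sample data (invented values) using & for "and" and | for "or". Each condition needs its own parentheses, because & and | bind more tightly than comparison operators such as >.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame(
    {
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

# AND: price above 500 and still in stock.
in_stock_expensive = df.loc[(df["price"] > 500) & (df["stock"] > 0)]
print(in_stock_expensive.index.tolist())  # ['itemD']

# OR: price above 500, or at least 10 in stock.
either = df.loc[(df["price"] > 500) | (df["stock"] >= 10)]
print(either.index.tolist())  # ['itemA', 'itemC', 'itemD']
```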

Extract only specific columns using conditional expressions

In addition to the conditional expression above, you can specify which columns to extract. The condition is the same as before, but this time we will extract only the producing_area column.

loc_sample_data.loc[loc_sample_data["price"] > 500, ["producing_area"]]
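Note that the column here is given as a list. As a sketch on hypothetical sample data (invented values), the same condition with a list of columns gives a DataFrame, while a bare column label gives a Series, mirroring the single label versus label list behavior for rows.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame(
    {
        "price": [100, 300, 600, 800],
        "stock": [10, 5, 0, 3],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

expensive = df["price"] > 500  # boolean Series, one value per row

as_frame = df.loc[expensive, ["producing_area"]]  # list of columns
as_series = df.loc[expensive, "producing_area"]   # single column label

print(type(as_frame).__name__, as_frame.shape)  # DataFrame (2, 1)
print(type(as_series).__name__)                 # Series
print(as_series.tolist())                       # ['Nagoya', 'Fukuoka']
```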

Finally

There are many ways to use loc, but the one you should definitely learn is data extraction using conditional expressions. This post has gotten a little long and I'm tired, so I'll stop here. See you in the next post!
