[Python] How to read data from CIFAR-10 and CIFAR-100

CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. Each consists of 60,000 color images of size 32x32: CIFAR-10 has 10 classes and CIFAR-100 has 100 classes.

Download the data from the provider's page: https://www.cs.toronto.edu/~kriz/cifar.html

Download "CIFAR-10 python version" and "CIFAR-100 python version" from that page and extract the archives to a suitable location.
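The download and extraction can also be scripted. The following is a minimal sketch using only the Python 3 standard library. It assumes the two "python version" archive URLs linked from the page above (cifar-10-python.tar.gz and cifar-100-python.tar.gz) and a ./data target folder; the helper names download and extract are hypothetical.

```python
import os
import tarfile
import urllib.request

# "python version" archives linked from https://www.cs.toronto.edu/~kriz/cifar.html
CIFAR_URLS = [
    "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz",
    "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz",
]


def download(url, dest_dir):
    """Hypothetical helper: fetch url into dest_dir unless it is already there."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


def extract(archive_path, dest_dir):
    """Hypothetical helper: unpack a .tar.gz archive into dest_dir."""
    # Use the safer "data" extraction filter on Python versions that provide it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir, **kwargs)
```

For example, calling extract(download(url, "./data"), "./data") for each url in CIFAR_URLS should leave the folders cifar-10-batches-py and cifar-100-python under ./data. Because download skips files that already exist, rerunning it does not fetch the archives again.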


input_cifar.py


import os
import pickle

import numpy as np


def unpickle(file):
    # Python 3: the batches were pickled with Python 2, so encoding='latin1'
    # is needed to load them with str keys such as 'data' and 'labels'.
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='latin1')


def conv_data2image(data):
    # One flat 3072-value row -> (3, 32, 32) planes -> (32, 32, 3) image
    return np.rollaxis(data.reshape((3, 32, 32)), 0, 3)


def get_cifar10(folder):
    # Each row of 'data' is one 32x32x3 image stored as 3072 uint8 values.
    tr_data, tr_labels = [], []
    for i in range(1, 6):
        data_dict = unpickle(os.path.join(folder, "data_batch_%d" % i))
        tr_data.append(data_dict['data'])
        tr_labels.extend(data_dict['labels'])
    tr_data = np.vstack(tr_data)
    tr_labels = np.array(tr_labels)

    data_dict = unpickle(os.path.join(folder, 'test_batch'))
    te_data = data_dict['data']
    te_labels = np.array(data_dict['labels'])

    bm = unpickle(os.path.join(folder, 'batches.meta'))
    label_names = bm['label_names']
    return tr_data, tr_labels, te_data, te_labels, label_names


def get_cifar100(folder):
    data_dict = unpickle(os.path.join(folder, 'train'))
    train_data = data_dict['data']
    train_fine_labels = np.array(data_dict['fine_labels'])
    train_coarse_labels = np.array(data_dict['coarse_labels'])

    data_dict = unpickle(os.path.join(folder, 'test'))
    test_data = data_dict['data']
    test_fine_labels = np.array(data_dict['fine_labels'])
    test_coarse_labels = np.array(data_dict['coarse_labels'])

    bm = unpickle(os.path.join(folder, 'meta'))
    clabel_names = bm['coarse_label_names']
    flabel_names = bm['fine_label_names']

    return (train_data, train_coarse_labels, train_fine_labels,
            test_data, test_coarse_labels, test_fine_labels,
            clabel_names, flabel_names)


if __name__ == '__main__':
    datapath = "./data/cifar-10-batches-py"
    datapath2 = "./data/cifar-100-python"

    tr_data10, tr_labels10, te_data10, te_labels10, label_names10 = get_cifar10(datapath)
    (tr_data100, tr_clabels100, tr_flabels100, te_data100, te_clabels100,
     te_flabels100, clabel_names100, flabel_names100) = get_cifar100(datapath2)

Save the code above as input_cifar.py, create a data folder in the same directory, and extract both datasets into it (the script expects ./data/cifar-10-batches-py and ./data/cifar-100-python). Running input_cifar.py then gives the following results.

CIFAR-10

ipython


In [1]: %run input_cifar.py
In [2]: tr_data10.shape
Out[2]: (50000, 3072)
In [3]: tr_labels10.shape
Out[3]: (50000,)
In [4]: te_data10.shape
Out[4]: (10000, 3072)
In [5]: te_labels10.shape
Out[5]: (10000,)
In [6]: label_names10
Out[6]: 
['airplane',
 'automobile',
 'bird',
 'cat',
 'deer',
 'dog',
 'frog',
 'horse',
 'ship',
 'truck']
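As a quick sanity check after loading, you can count the images per class. CIFAR-10 has 6,000 images per class (5,000 training and 1,000 test), so np.bincount(tr_labels10) should give 5000 for each of the 10 classes. The helper class_counts is hypothetical, and the demo uses synthetic labels instead of the real arrays.

```python
import numpy as np


def class_counts(labels, names):
    """Return a {class name: number of images} dict for integer labels."""
    counts = np.bincount(np.asarray(labels), minlength=len(names))
    return {name: int(n) for name, n in zip(names, counts)}


# Synthetic stand-in for tr_labels10 / label_names10: 3 classes, 4 images each.
demo_names = ['airplane', 'automobile', 'bird']
demo_labels = np.repeat(np.arange(3), 4)
print(class_counts(demo_labels, demo_names))
# {'airplane': 4, 'automobile': 4, 'bird': 4}
```

With the real data, class_counts(tr_labels10, label_names10) should report 5000 for every class.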

CIFAR-10 and CIFAR-100 are each split into 50,000 training images and 10,000 test images. To extract the first (index 0) training image, do the following.

ipython


In [7]: img0 = tr_data10[0]
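Row i of tr_data10 corresponds to position i of tr_labels10, so the class name of image i is label_names10[tr_labels10[i]]. A small hypothetical helper that bundles the lookup, shown with synthetic arrays of the same row width (the names list is the CIFAR-10 label_names10 output shown above):

```python
import numpy as np


def get_example(data, labels, names, i):
    """Return the flat 3072-value row of image i and its class name."""
    return data[i], names[labels[i]]


# Synthetic stand-ins with the same row width as tr_data10.
data = np.zeros((2, 32 * 32 * 3), dtype=np.uint8)
labels = np.array([6, 9])
names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
         'dog', 'frog', 'horse', 'ship', 'truck']
row, name = get_example(data, labels, names, 0)
print(row.shape, name)  # (3072,) frog
```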

Each image is a 32x32 color image stored as a single row of 3072 values in planar R, G, B order: the first 1024 values are the R plane, the next 1024 are the G plane, and the last 1024 are the B plane. Within each plane, the pixels are stored in row-major order.
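In other words, the value for channel c (0 = R, 1 = G, 2 = B) at row y and column x sits at index c*1024 + y*32 + x of the flat row. The sketch below checks that formula against NumPy's reshape on a synthetic row; pixel_value is a hypothetical helper.

```python
import numpy as np

SIZE = 32             # images are 32x32
PLANE = SIZE * SIZE   # 1024 values per color plane


def pixel_value(row, c, y, x):
    """Read channel c at (row y, column x) directly from a flat CIFAR row."""
    return row[c * PLANE + y * SIZE + x]


# Synthetic flat row whose values are simply their own indices.
row = np.arange(3 * PLANE)
planes = row.reshape((3, SIZE, SIZE))  # planar view: (channel, row, column)
assert pixel_value(row, 1, 2, 5) == planes[1, 2, 5] == 1024 + 2 * 32 + 5
print(pixel_value(row, 2, 31, 31))  # last B value: 3071
```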

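Because each image is a single flat row, it has to be rearranged into a 32x32x3 array before it can be displayed. scikit-image's imshow expects the color values interleaved per pixel (R, G, B, R, G, B, ...), so reshape the row to (3, 32, 32) and then move the channel axis to the end, as follows.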

ipython


In [8]: img0 = img0.reshape((3,32,32))
In [9]: img0.shape
Out[9]: (3, 32, 32)
In [10]: import numpy as np
In [11]: img1 = np.rollaxis(img0, 0, 3)
In [12]: img1.shape
Out[12]: (32, 32, 3)
In [13]: from skimage import io
In [14]: io.imshow(img1)
In [15]: io.show()


According to its label, image 0 is a frog, but at only 32x32 pixels it is hard to tell.
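Converting one image at a time is fine for viewing, but the same reshape can be applied to every row at once, with transpose moving the channel axis last. A minimal sketch (the function name to_images is hypothetical); each result should match the per-image rollaxis used above:

```python
import numpy as np


def to_images(data):
    """Convert an (N, 3072) CIFAR array into (N, 32, 32, 3) images."""
    # (N, 3072) -> (N, 3, 32, 32) planes -> (N, 32, 32, 3) interleaved pixels
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


# Synthetic batch of 4 rows; compare against the single-image conversion.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 3072), dtype=np.uint8)
images = to_images(batch)
print(images.shape)  # (4, 32, 32, 3)
for i in range(len(batch)):
    assert np.array_equal(images[i], np.rollaxis(batch[i].reshape((3, 32, 32)), 0, 3))
```

With the real data, to_images(tr_data10) returns an array of shape (50000, 32, 32, 3). Since reshape and transpose of a contiguous array return views, no pixel data is copied.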

CIFAR-100

In CIFAR-100, the images are divided into 100 classes, which are further grouped into 20 superclasses of 5 classes each, as listed below. The data is stored in the same format as CIFAR-10.

| Superclass | Classes |
|---|---|
| aquatic mammals | beaver, dolphin, otter, seal, whale |
| fish | aquarium fish, flatfish, ray, shark, trout |
| flowers | orchids, poppies, roses, sunflowers, tulips |
| food containers | bottles, bowls, cans, cups, plates |
| fruit and vegetables | apples, mushrooms, oranges, pears, sweet peppers |
| household electrical devices | clock, computer keyboard, lamp, telephone, television |
| household furniture | bed, chair, couch, table, wardrobe |
| insects | bee, beetle, butterfly, caterpillar, cockroach |
| large carnivores | bear, leopard, lion, tiger, wolf |
| large man-made outdoor things | bridge, castle, house, road, skyscraper |
| large natural outdoor scenes | cloud, forest, mountain, plain, sea |
| large omnivores and herbivores | camel, cattle, chimpanzee, elephant, kangaroo |
| medium-sized mammals | fox, porcupine, possum, raccoon, skunk |
| non-insect invertebrates | crab, lobster, snail, spider, worm |
| people | baby, boy, girl, man, woman |
| reptiles | crocodile, dinosaur, lizard, snake, turtle |
| small mammals | hamster, mouse, rabbit, shrew, squirrel |
| trees | maple, oak, palm, pine, willow |
| vehicles 1 | bicycle, bus, motorcycle, pickup truck, train |
| vehicles 2 | lawn-mower, rocket, streetcar, tank, tractor |
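Because every class belongs to exactly one superclass, the training labels alone are enough to recover the class-to-superclass mapping. A minimal sketch, assuming fine and coarse label arrays such as tr_flabels100 and tr_clabels100 from input_cifar.py (the function name fine_to_coarse is hypothetical); the demo uses a tiny synthetic hierarchy:

```python
import numpy as np


def fine_to_coarse(fine_labels, coarse_labels):
    """Return an array m where m[f] is the superclass index of class f."""
    fine = np.asarray(fine_labels)
    coarse = np.asarray(coarse_labels)
    mapping = np.full(fine.max() + 1, -1, dtype=int)
    mapping[fine] = coarse
    # Every occurrence of a class must carry the same superclass label.
    if not np.array_equal(mapping[fine], coarse):
        raise ValueError("a class appears under more than one superclass")
    return mapping


# Synthetic example: 4 classes grouped into 2 superclasses.
fine = np.array([0, 1, 2, 3, 2, 0])
coarse = np.array([1, 0, 1, 0, 1, 1])
print(fine_to_coarse(fine, coarse))  # [1 0 1 0]
```

With the real CIFAR-100 arrays the result has 100 entries, and np.bincount of it should show 5 classes in each of the 20 superclasses, matching the table.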

The superclass names are stored in clabel_names100, and the class names in flabel_names100.

ipython


In [6]: len(clabel_names100)
Out[6]: 20
In [7]: len(flabel_names100)
Out[7]: 100
In [8]: clabel_names100
Out[8]: 
['aquatic_mammals',
 'fish',
 'flowers',
 'food_containers',
 'fruit_and_vegetables',
 'household_electrical_devices',
 ...
 'reptiles',
 'small_mammals',
 'trees',
 'vehicles_1',
 'vehicles_2']
In [9]: flabel_names100
Out[9]: 
['apple',
 'aquarium_fish',
 'baby',
 'bear',
 'beaver',
 'bed',
 'bee',
 'beetle',
 'bicycle',
 'bottle',
 ...
 'willow_tree',
 'wolf',
 'woman',
 'worm']
In [10]: 
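To see both names for a given image, index clabel_names100 with its coarse label and flabel_names100 with its fine label. A hypothetical helper, demonstrated with short stand-in lists copied from the beginning of the outputs above:

```python
def cifar100_names(i, coarse_labels, fine_labels, clabel_names, flabel_names):
    """Return (superclass name, class name) for image i."""
    return clabel_names[coarse_labels[i]], flabel_names[fine_labels[i]]


# Stand-ins: the first entries of clabel_names100 and flabel_names100.
clabels = ['aquatic_mammals', 'fish']
flabels = ['apple', 'aquarium_fish', 'baby', 'bear', 'beaver']
print(cifar100_names(0, [0], [4], clabels, flabels))  # ('aquatic_mammals', 'beaver')
```

With the loaded arrays, cifar100_names(0, tr_clabels100, tr_flabels100, clabel_names100, flabel_names100) gives both names for the first training image.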
