[PYTHON] Numerical summary of data

This post covers numerical summaries, the most basic way to summarize data in data analysis.

Summary of one-dimensional data

import numpy as np

x = np.array([1, 2, 3, 4.5, 5, 6.5, 7, 8, 9, 10])

average = np.mean(x)  # mean: np.mean
(Out  5.6)

med = np.median(x)  # median: np.median
(Out  5.75)

var_p = np.var(x)  # variance: np.var (divides by n by default, ddof=0)
(Out  8.19)

std = np.std(x)  # standard deviation: np.std (also divides by n by default)
(Out  2.86)
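
The variance and standard deviation above follow directly from their definitions. As a minimal sketch (the names n, var_n, var_unbiased, and std_n are illustrative and not from the original post), the following recomputes them by hand and contrasts the divisor n, which NumPy uses by default (ddof=0), with the unbiased estimate that divides by n - 1 (ddof=1).

```python
import numpy as np

x = np.array([1, 2, 3, 4.5, 5, 6.5, 7, 8, 9, 10])

n = x.size
mean = x.sum() / n

# Sum of squared deviations from the mean.
ss = ((x - mean) ** 2).sum()

var_n = ss / n               # same divisor as np.var(x), i.e. ddof=0
var_unbiased = ss / (n - 1)  # same divisor as np.var(x, ddof=1)
std_n = np.sqrt(var_n)       # same as np.std(x)

print(var_n, var_unbiased, std_n)
```

Which divisor is appropriate depends on whether x is treated as the whole population or as a sample drawn from a larger one; for small n the two values can differ noticeably.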

For the meaning of each term, see: https://note.com/karaage_love/n/n6f617d38c528

Summary of 2D data

import numpy as np
import matplotlib.pyplot as plt

array = np.loadtxt(fname='example.csv', delimiter=',', encoding='utf-8-sig')
# example.csv contains two columns of data.

array_x = array[:, 0]
array_y = array[:, 1]  # slice out each column

plt.scatter(array_x, array_y, s=10, c='blue', alpha=0.5)

# Draw a scatter plot: s is the marker size, c is the marker color, and alpha is the transparency.

np.cov(array_x, array_y, bias=True)
(Out   [[6.72727273 3.54545455]
        [3.54545455 6.        ]])
# The covariance result is a 2 × 2 matrix. The diagonal elements are the variances of x and y, and the off-diagonal elements are their covariance. bias=True divides by n.
np.corrcoef(array_x, array_y)
(Out   [[1.         0.55805471]
        [0.55805471 1.        ]])
# Correlation matrix: the off-diagonal elements are the correlation coefficient of x and y.
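
To make the relationship between the two matrices concrete, here is a minimal sketch. Because example.csv is not included in this post, it reads a small made-up two-column dataset from an in-memory string (the values and the names csv_text, data, cov_xy, and r are assumptions for illustration only). It computes the covariance from its definition, then divides by the two standard deviations to get the correlation coefficient.

```python
import io
import numpy as np

# Hypothetical stand-in for example.csv: two comma-separated columns.
csv_text = "1,2\n2,1\n3,4\n4,3\n5,6\n"
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")

x = data[:, 0]
y = data[:, 1]

# Covariance with divisor n: mean of the products of deviations.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation coefficient: covariance scaled by both standard deviations.
r = cov_xy / (np.std(x) * np.std(y))

# The same quantities as the off-diagonal elements of NumPy's matrices.
cov_matrix = np.cov(x, y, bias=True)
corr_matrix = np.corrcoef(x, y)

print(cov_xy, cov_matrix[0, 1])
print(r, corr_matrix[0, 1])
```

The divisor cancels in the ratio, so the correlation coefficient is the same whether the covariance is computed with n or n - 1.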


For a more detailed explanation of summarizing 2D data, see: https://note.com/karaage_love/n/n992a7fdf9b1f
