How to perform a portmanteau test in Python

A portmanteau test checks whether a set of autocorrelations of a time series is jointly zero, that is, whether the series is serially correlated at any of the lags considered.

For more information, see [Wikipedia](https://en.wikipedia.org/wiki/%E3%81%8B%E3%81%B0%E3%82%93%E6%A4%9C%E5%AE%9A).

For example, to perform the Ljung-Box test, use `statsmodels.stats.diagnostic.acorr_ljungbox`. See the [documentation](https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_ljungbox.html) for details.

As a first example, we run the test on randomly generated white Gaussian noise. Such data has no autocorrelation, so the null hypothesis (no autocorrelation) should not be rejected.

```
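import matplotlib as mpl
import matplotlib.pyplot as plt
# On Matplotlib 3.6+ the style is named 'seaborn-v0_8'; older versions use 'seaborn'
plt.style.use('seaborn-v0_8')
mpl.rcParams['font.family'] = 'serif'
# Jupyter magic: render plots inline (remove when running as a plain script)
%matplotlib inline
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
p = print  # short alias for print

# Generate 1000 points of white Gaussian noise
np.random.seed(42)
data = np.random.standard_normal(1000)

# First plot the data (only the first 100 points are shown)
plt.figure(figsize=(10, 6))
plt.plot(data, lw=1.5)
plt.xlabel('time')
plt.ylabel('value')
plt.xlim([0, 100])
plt.title('time vs. value plot');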
```

As expected, the plot shows a white Gaussian noise time series (only the first 100 points are displayed).

Now, let's do a portmanteau test.

```
result = acorr_ljungbox(data,lags = 5)
p(result)
```

The result is as follows.

```
(array([0.05608493, 0.05613943, 0.31898424, 3.27785331, 3.94903872]), array([0.81279444, 0.97232058, 0.9564194 , 0.51244884, 0.55677627]))
```

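The output is a tuple of two arrays: the first holds the test statistics and the second the p-values, one entry per lag from 1 to 5. All p-values are well above 0.05, so the null hypothesis of no autocorrelation is not rejected, as expected. (Recent statsmodels releases return a DataFrame with `lb_stat` and `lb_pvalue` columns instead of a tuple.) Let's put the tuple into a table to make it easier to read.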

```
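import pandas as pd  # needed for the table; not imported earlier

# Assumes `result` is the (statistics, p-values) tuple shown above
result_table = pd.DataFrame(data=result, index=['test statistic', 'p-value'],
                            columns=[str(i) for i in range(1, 6)])
result_table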
```

The result is displayed as a table whose columns correspond to the lag.
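For reference, the Ljung-Box statistic can also be computed by hand. The following is a minimal sketch using only NumPy, assuming the standard definition $Q(h) = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$, where $n$ is the sample size and $\hat{\rho}_k$ is the sample autocorrelation at lag $k$. The helper name `ljung_box_q` is hypothetical and not part of statsmodels.

```python
import numpy as np

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic for lags 1..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    lags = np.arange(1, max_lag + 1)
    # Sample autocorrelation at each lag k = 1..max_lag
    rho = np.array([np.sum(centered[k:] * centered[:-k]) / denom for k in lags])
    # Q(h) = n (n + 2) * sum_{k=1}^{h} rho_k^2 / (n - k), accumulated over h
    return n * (n + 2) * np.cumsum(rho ** 2 / (n - lags))

# Same white Gaussian noise as in the example above
np.random.seed(42)
noise = np.random.standard_normal(1000)
q_stats = ljung_box_q(noise, 5)
print(q_stats)
```

Under the null hypothesis, $Q(h)$ approximately follows a chi-squared distribution with $h$ degrees of freedom, which is where the p-values come from. The values printed here should be comparable to the first array returned by `acorr_ljungbox`.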

Next, let's test an MA(3) process. Assume the following model.

```
y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-3}
```

Here, $\epsilon_t$ is white Gaussian noise. The form of the equation suggests that observations three time steps apart (for example, $y_5$ and $y_8$) are correlated, because they share a noise term. This can be shown mathematically, but here we confirm it with the portmanteau test.
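The mathematical confirmation can be sketched as follows, assuming $\epsilon_t$ has variance $\sigma^2$ (a symbol introduced here for illustration) and writing $\gamma_k$ for the autocovariance and $\rho_k$ for the autocorrelation at lag $k$:

```latex
\begin{aligned}
\gamma_0 &= \operatorname{Var}(y_t) = (1 + 0.5^2)\,\sigma^2 = 1.25\,\sigma^2 \\
\gamma_3 &= \operatorname{Cov}(y_t,\, y_{t+3})
          = \operatorname{Cov}(\epsilon_t + 0.5\,\epsilon_{t-3},\ \epsilon_{t+3} + 0.5\,\epsilon_t)
          = 0.5\,\sigma^2 \\
\gamma_k &= 0 \quad (k \neq 0,\ \pm 3) \\
\rho_3 &= \frac{\gamma_3}{\gamma_0} = \frac{0.5}{1.25} = 0.4
\end{aligned}
```

Only the lag-3 autocorrelation is nonzero, because $y_t$ and $y_{t+3}$ are the only pair that shares a noise term ($\epsilon_t$).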

```
# Create the model data
data = np.zeros(1000)
np.random.seed(42)
err = np.random.standard_normal(1000)
for i in range(1000):
    if i - 3 < 0:
        # No lagged noise term is available for the first three points
        data[i] = 1 + err[i]
    else:
        data[i] = 1 + err[i] + 0.5 * err[i - 3]

# First plot the data (only the first 100 points are shown)
plt.figure(figsize=(10, 6))
plt.plot(data, lw=1.5)
plt.xlabel('time')
plt.ylabel('value')
plt.title('time vs. value plot (MA(3) model)')
plt.xlim([0, 100])
```

```
result = acorr_ljungbox(data, lags=5)
result_table = pd.DataFrame(data=result, index=['test statistic', 'p-value'],
                            columns=[str(i) for i in range(1, 6)])
result_table
```

For example, at a significance level of 0.05, the null hypothesis is not rejected when the maximum lag is 2 or less, but it is rejected when the maximum lag is 3 or more (that is, once $\rho_3$ is included in the test).
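As a complementary check, the sample autocorrelations of the simulated series can be inspected directly. The following is a minimal NumPy-only sketch that regenerates the same data in vectorized form; the variable names `y` and `sample_acf` are introduced here for illustration. For this model the theoretical lag-3 autocorrelation is $0.5 / (1 + 0.5^2) = 0.4$ and all other nonzero lags are 0, so the estimate at lag 3 should stand out.

```python
import numpy as np

np.random.seed(42)
err = np.random.standard_normal(1000)

# y_t = 1 + e_t + 0.5 * e_{t-3}; the first three points have no lagged term
y = 1 + err
y[3:] += 0.5 * err[:-3]

# Sample autocorrelation at lags 1..5
centered = y - y.mean()
denom = np.sum(centered ** 2)
sample_acf = np.array([np.sum(centered[k:] * centered[:-k]) / denom
                       for k in range(1, 6)])

for lag, r in enumerate(sample_acf, start=1):
    print(f"lag {lag}: {r:+.3f}")
```

The estimates at lags 1, 2, 4 and 5 should be close to zero, while the lag-3 estimate should be close to 0.4, which is why the test starts rejecting once lag 3 is included.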
