stats.autocorrelation
Functions
| Name | Description |
|---|---|
| calculate_lag_autocorrelation | Calculate autocorrelation and partial autocorrelation for a time series. |
calculate_lag_autocorrelation
stats.autocorrelation.calculate_lag_autocorrelation(
data,
n_lags=50,
last_n_samples=None,
sort_by='partial_autocorrelation_abs',
acf_kwargs=None,
pacf_kwargs=None,
)
Calculate autocorrelation and partial autocorrelation for a time series.
This is a wrapper around statsmodels.tsa.stattools.acf and statsmodels.tsa.stattools.pacf.
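As a rough illustration of the quantity being wrapped, the sketch below computes a sample autocorrelation at a given lag in plain Python. The helper name `sample_autocorrelation` is hypothetical and not part of this module. It uses the common estimator that demeans the series and divides the lag-k sum of products by the lag-0 sum of squares; that this matches the default behaviour of statsmodels.tsa.stattools.acf is an assumption, not something stated on this page.

```python
def sample_autocorrelation(values, lag):
    """Sample autocorrelation of `values` at `lag` (hypothetical helper)."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Lag-0 sum of squares; the 1/n factors cancel in the ratio below.
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Products of each deviation with the deviation `lag` steps later.
    numerator = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
    return numerator / denominator


series = [1.0, -1.0, 1.0, -1.0]
acf_values = [sample_autocorrelation(series, k) for k in range(3)]
print(acf_values)  # [1.0, -0.75, 0.5]
```

The partial autocorrelation additionally removes the influence of the intermediate lags, which is why it is left to statsmodels here rather than sketched by hand.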
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| data | pd.Series | pd.DataFrame | Time series for which to calculate autocorrelation. If a DataFrame is provided, it must have exactly one column. | required |
| n_lags | int | Number of lags for which to calculate autocorrelation. Default is 50. | 50 |
| last_n_samples | int | None | Number of most recent samples to use. If None, use the entire series. Note that partial correlations can only be computed for lags up to 50% of the sample size. For example, if the series has 10 samples, n_lags must be less than or equal to 5. This parameter is useful to speed up calculations when the series is very long. Default is None. | None |
| sort_by | str | Sort results by lag, partial_autocorrelation_abs, partial_autocorrelation, autocorrelation_abs or autocorrelation. Default is partial_autocorrelation_abs. | 'partial_autocorrelation_abs' |
| acf_kwargs | dict[str, object] | None | Optional keyword arguments passed to statsmodels.tsa.stattools.acf. If None, no extra arguments are passed. | None |
| pacf_kwargs | dict[str, object] | None | Optional keyword arguments passed to statsmodels.tsa.stattools.pacf. If None, no extra arguments are passed. | None |
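The interaction between n_lags and last_n_samples described in the table can be sketched as a small calculation. The helper `max_pacf_lags` is hypothetical and only restates the 50% rule quoted above; it assumes that last_n_samples keeps the tail of the series, so the effective sample size is the smaller of the two numbers.

```python
def max_pacf_lags(n_samples, last_n_samples=None):
    """Largest n_lags allowed by the 50% rule (hypothetical helper)."""
    # Assumption: last_n_samples keeps only the most recent samples,
    # so the effective sample size cannot exceed either number.
    if last_n_samples is None:
        effective = n_samples
    else:
        effective = min(n_samples, last_n_samples)
    # Lags are limited to half of the effective sample size, rounded down.
    return effective // 2


print(max_pacf_lags(10))                    # 5
print(max_pacf_lags(10, last_n_samples=8))  # 4
```

Truncating a series with last_n_samples therefore also lowers the largest n_lags that can be requested.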
Returns
| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | DataFrame with columns: lag, partial_autocorrelation_abs, partial_autocorrelation, autocorrelation_abs, autocorrelation. |
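To make the column layout and the effect of sort_by concrete, here is a minimal sketch that uses plain dictionaries instead of a DataFrame. The values are illustrative only and are not output of the real function. The helpers `add_abs_columns` and `sort_rows` are hypothetical, and the ordering (ascending for lag, descending for every other key) is an assumption.

```python
# Illustrative values only; they are not output of the real function.
raw_rows = [
    {"lag": 1, "partial_autocorrelation": 0.5, "autocorrelation": 0.5},
    {"lag": 2, "partial_autocorrelation": -0.6, "autocorrelation": -0.8},
    {"lag": 3, "partial_autocorrelation": 0.3, "autocorrelation": 0.1},
]


def add_abs_columns(row):
    """Return a copy of `row` with the two *_abs columns added."""
    return {
        **row,
        "partial_autocorrelation_abs": abs(row["partial_autocorrelation"]),
        "autocorrelation_abs": abs(row["autocorrelation"]),
    }


def sort_rows(rows, sort_by):
    """Sort rows by `sort_by` (hypothetical helper)."""
    # Assumption: 'lag' sorts ascending, every other key descending.
    return sorted(rows, key=lambda r: r[sort_by], reverse=(sort_by != "lag"))


rows = [add_abs_columns(r) for r in raw_rows]
by_pacf_abs = [r["lag"] for r in sort_rows(rows, "partial_autocorrelation_abs")]
by_acf = [r["lag"] for r in sort_rows(rows, "autocorrelation")]
print(by_pacf_abs)  # [2, 1, 3]
print(by_acf)       # [1, 3, 2]
```

Sorting by an `_abs` column ranks lags by strength regardless of sign, while sorting by the signed column pushes strong negative correlations to the end.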
Raises
| Name | Type | Description |
|---|---|---|
| | TypeError | If data is not a pandas Series or DataFrame with a single column. |
| | ValueError | If data is a DataFrame with more than one column. |
| | TypeError | If n_lags is not a positive integer. |
| | TypeError | If last_n_samples is not None and not a positive integer. |
| | ValueError | If sort_by is not one of the valid options. |
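The argument checks listed above could look roughly like the following sketch. The names `validate_arguments`, `VALID_SORT_BY` and `error_name` are hypothetical, and the checks on data are left out because they depend on pandas; the actual implementation may differ.

```python
VALID_SORT_BY = (
    "lag",
    "partial_autocorrelation_abs",
    "partial_autocorrelation",
    "autocorrelation_abs",
    "autocorrelation",
)


def validate_arguments(n_lags, last_n_samples, sort_by):
    """Hypothetical checks following the documented exception types."""
    if not isinstance(n_lags, int) or n_lags <= 0:
        raise TypeError("n_lags must be a positive integer")
    if last_n_samples is not None and (
        not isinstance(last_n_samples, int) or last_n_samples <= 0
    ):
        raise TypeError("last_n_samples must be None or a positive integer")
    if sort_by not in VALID_SORT_BY:
        raise ValueError(f"sort_by must be one of {VALID_SORT_BY}")


def error_name(**kwargs):
    """Return the name of the exception raised, or None if the input is valid."""
    try:
        validate_arguments(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


print(error_name(n_lags=0, last_n_samples=None, sort_by="lag"))    # TypeError
print(error_name(n_lags=4, last_n_samples=-1, sort_by="lag"))      # TypeError
print(error_name(n_lags=4, last_n_samples=None, sort_by="value"))  # ValueError
print(error_name(n_lags=4, last_n_samples=8, sort_by="lag"))       # None
```

Note that a non-positive n_lags is documented as a TypeError rather than a ValueError, and the sketch follows that.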
Examples
Calculate autocorrelation for a simple Series:
>>> import pandas as pd
>>> from spotforecast.stats.autocorrelation import calculate_lag_autocorrelation
>>>
>>> data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> result = calculate_lag_autocorrelation(data=data, n_lags=4)
>>> result.head()
   lag  partial_autocorrelation_abs  partial_autocorrelation  autocorrelation_abs  autocorrelation
0    1                     0.999998                 0.999998             1.000000         1.000000
1    2                     0.000002                -0.000002             0.645497         0.645497
2    3                     0.000002                 0.000002             0.298549         0.298549
3    4                     0.000001                -0.000001             0.068719         0.068719
Calculate autocorrelation using only the last 8 samples:
>>> import pandas as pd
>>> from spotforecast2.stats.autocorrelation import calculate_lag_autocorrelation
>>>
>>> data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> result = calculate_lag_autocorrelation(
... data=data,
... n_lags=3,
... last_n_samples=8
... )
>>> result.shape
(3, 5)
Calculate autocorrelation from a DataFrame with a single column:
>>> import pandas as pd
>>> from spotforecast.stats.autocorrelation import calculate_lag_autocorrelation
>>>
>>> data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
>>> result = calculate_lag_autocorrelation(data=data, n_lags=4)
>>> result.shape
(4, 5)
Sort results by autocorrelation in descending order:
>>> import pandas as pd
>>> from spotforecast.stats.autocorrelation import calculate_lag_autocorrelation
>>>
>>> data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> result = calculate_lag_autocorrelation(
... data=data,
... n_lags=4,
... sort_by='autocorrelation'
... )
>>> result[['lag', 'autocorrelation']].head()
   lag  autocorrelation
0    1         1.000000
1    2         0.645497
2    3         0.298549
3    4         0.068719