stats.autocorrelation

Functions

| Name | Description |
| --- | --- |
| calculate_lag_autocorrelation | Calculate autocorrelation and partial autocorrelation for a time series. |
| select_pacf_lags | Select the most informative lags using the partial autocorrelation function. |

calculate_lag_autocorrelation

stats.autocorrelation.calculate_lag_autocorrelation(
    data,
    n_lags=50,
    last_n_samples=None,
    sort_by='partial_autocorrelation_abs',
    acf_kwargs=None,
    pacf_kwargs=None,
)

Calculate autocorrelation and partial autocorrelation for a time series.

This is a wrapper around statsmodels.tsa.stattools.acf and statsmodels.tsa.stattools.pacf.
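To make the two quantities concrete, here is a minimal pure-Python sketch of a sample autocorrelation function and of the partial autocorrelation obtained from it with the Durbin-Levinson recursion. The helper names `sample_acf` and `pacf_from_acf` are hypothetical and not part of this package. statsmodels may use a different PACF estimator by default, so the numbers it returns can differ slightly from this sketch.

```python
from typing import List


def sample_acf(x: List[float], n_lags: int) -> List[float]:
    """Sample autocorrelation for lags 0..n_lags (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
        for k in range(n_lags + 1)
    ]


def pacf_from_acf(rho: List[float]) -> List[float]:
    """Partial autocorrelation for lags 0..len(rho)-1 via Durbin-Levinson."""
    pacf = [1.0]
    phi_prev: List[float] = []  # AR coefficients of the order k-1 fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: the PACF cuts off after lag 1.
rho = [1.0, 0.5, 0.25, 0.125]
print(pacf_from_acf(rho))  # [1.0, 0.5, 0.0, 0.0]
```

The autocorrelation decays gradually while the partial autocorrelation drops to zero after the true order, which is why the PACF is the more useful of the two for picking lags.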

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.Series \| pd.DataFrame | Time series for which to calculate autocorrelation. If a DataFrame is provided, it must have exactly one column. | required |
| n_lags | int | Number of lags for which to calculate autocorrelation. | 50 |
| last_n_samples | int \| None | Number of most recent samples to use. If None, the entire series is used. This parameter is useful to speed up calculations when the series is very long. Note that partial correlations can only be computed for lags up to 50% of the sample size. For example, if the series has 10 samples, n_lags must be less than or equal to 5. | None |
| sort_by | str | Column to sort the results by. One of lag, partial_autocorrelation_abs, partial_autocorrelation, autocorrelation_abs or autocorrelation. | 'partial_autocorrelation_abs' |
| acf_kwargs | dict[str, object] \| None | Optional arguments to pass to statsmodels.tsa.stattools.acf. If None, no extra arguments are passed. | None |
| pacf_kwargs | dict[str, object] \| None | Optional arguments to pass to statsmodels.tsa.stattools.pacf. If None, no extra arguments are passed. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
|  | pd.DataFrame | DataFrame with columns: lag, partial_autocorrelation_abs, partial_autocorrelation, autocorrelation_abs, autocorrelation. |

Raises

| Name | Type | Description |
| --- | --- | --- |
|  | TypeError | If data is neither a pandas Series nor a pandas DataFrame. |
|  | ValueError | If data is a DataFrame with more than one column. |
|  | TypeError | If n_lags is not a positive integer. |
|  | TypeError | If last_n_samples is not None and not a positive integer. |
|  | ValueError | If sort_by is not one of the valid options. |

Examples

Calculate autocorrelation for a simple Series:

import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import calculate_lag_autocorrelation

rng = np.random.default_rng(0)
data = pd.Series(rng.standard_normal(40).cumsum())
result = calculate_lag_autocorrelation(data=data, n_lags=4)
print(result)
assert result.shape == (4, 5)
assert list(result.columns) == [
    "lag",
    "partial_autocorrelation_abs",
    "partial_autocorrelation",
    "autocorrelation_abs",
    "autocorrelation",
]
   lag  partial_autocorrelation_abs  partial_autocorrelation  \
0    1                     0.950412                 0.950412   
1    2                     0.347551                -0.347551   
2    3                     0.264375                -0.264375   
3    4                     0.060061                -0.060061   

   autocorrelation_abs  autocorrelation  
0             0.926652         0.926652  
1             0.826185         0.826185  
2             0.703936         0.703936  
3             0.576826         0.576826  

Calculate autocorrelation using only the last 20 samples:

import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import calculate_lag_autocorrelation

rng = np.random.default_rng(0)
data = pd.Series(rng.standard_normal(40).cumsum())
result = calculate_lag_autocorrelation(
    data=data,
    n_lags=3,
    last_n_samples=20,
)
print(result)
assert result.shape == (3, 5)
   lag  partial_autocorrelation_abs  partial_autocorrelation  \
0    1                     0.758598                 0.758598   
1    2                     0.098336                -0.098336   
2    3                     0.038285                 0.038285   

   autocorrelation_abs  autocorrelation  
0             0.720669         0.720669  
1             0.480353         0.480353  
2             0.328266         0.328266  

Calculate autocorrelation from a DataFrame with a single column:

import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import calculate_lag_autocorrelation

rng = np.random.default_rng(0)
data = pd.DataFrame({"value": rng.standard_normal(40).cumsum()})
result = calculate_lag_autocorrelation(data=data, n_lags=4)
print(result)
assert result.shape == (4, 5)
   lag  partial_autocorrelation_abs  partial_autocorrelation  \
0    1                     0.950412                 0.950412   
1    2                     0.347551                -0.347551   
2    3                     0.264375                -0.264375   
3    4                     0.060061                -0.060061   

   autocorrelation_abs  autocorrelation  
0             0.926652         0.926652  
1             0.826185         0.826185  
2             0.703936         0.703936  
3             0.576826         0.576826  

Sort results by autocorrelation in descending order:

import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import calculate_lag_autocorrelation

rng = np.random.default_rng(0)
data = pd.Series(rng.standard_normal(40).cumsum())
result = calculate_lag_autocorrelation(
    data=data,
    n_lags=4,
    sort_by="autocorrelation",
)
print(result[["lag", "autocorrelation"]])
assert result["autocorrelation"].iloc[0] >= result["autocorrelation"].iloc[-1]
   lag  autocorrelation
0    1         0.926652
1    2         0.826185
2    3         0.703936
3    4         0.576826
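The difference between the signed and absolute sort keys is easiest to see on the values printed in the first example. The sketch below re-creates those rows as plain dictionaries and sorts them with a hypothetical `sort_rows` helper. It assumes that lag is sorted ascending and every other key descending, which is an assumption about the library's behaviour.

```python
from typing import Dict, List

# PACF and ACF values copied from the first example's printed output.
rows: List[Dict[str, float]] = [
    {"lag": 1, "partial_autocorrelation": 0.950412, "autocorrelation": 0.926652},
    {"lag": 2, "partial_autocorrelation": -0.347551, "autocorrelation": 0.826185},
    {"lag": 3, "partial_autocorrelation": -0.264375, "autocorrelation": 0.703936},
    {"lag": 4, "partial_autocorrelation": -0.060061, "autocorrelation": 0.576826},
]
for row in rows:
    row["partial_autocorrelation_abs"] = abs(row["partial_autocorrelation"])
    row["autocorrelation_abs"] = abs(row["autocorrelation"])


def sort_rows(rows: List[Dict[str, float]], sort_by: str) -> List[Dict[str, float]]:
    """Hypothetical helper: ascending for 'lag', descending otherwise (assumed)."""
    return sorted(rows, key=lambda r: r[sort_by], reverse=(sort_by != "lag"))


by_abs = [r["lag"] for r in sort_rows(rows, "partial_autocorrelation_abs")]
by_signed = [r["lag"] for r in sort_rows(rows, "partial_autocorrelation")]
print(by_abs)     # [1, 2, 3, 4]
print(by_signed)  # [1, 4, 3, 2]
```

Sorting by the signed value pushes strongly negative partial autocorrelations to the bottom, so the absolute variant is usually the better choice when ranking lags by strength.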

select_pacf_lags

stats.autocorrelation.select_pacf_lags(
    series,
    *,
    n_lags=200,
    top_k=8,
    fallback=None,
)

Select the most informative lags using the partial autocorrelation function.

Computes the PACF up to n_lags via calculate_lag_autocorrelation and keeps the lags whose |PACF| exceeds the 95% significance band (1.96 / sqrt(N), where N is the number of observations). Of these, the top_k lags with the largest |PACF| are returned, sorted by lag in ascending order.
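The selection rule described above can be sketched in a few lines of plain Python. The function `select_lags_sketch` is a hypothetical stand-in that takes precomputed PACF values keyed by lag. It is an illustration under that assumption, not the package's implementation, and the error message text is invented for the example.

```python
import math
from typing import Dict, List, Optional


def select_lags_sketch(
    pacf_by_lag: Dict[int, float],
    n_obs: int,
    top_k: int = 8,
    fallback: Optional[List[int]] = None,
) -> List[int]:
    """Hypothetical sketch: threshold by 1.96 / sqrt(N), rank by |PACF|, sort by lag."""
    band = 1.96 / math.sqrt(n_obs)  # 95% band under a white-noise null
    significant = [lag for lag, v in pacf_by_lag.items() if abs(v) > band]
    if not significant:
        if fallback is None:
            raise ValueError("no significant lags found and no fallback given")
        return [int(lag) for lag in fallback]
    # Keep the top_k strongest lags, then report them in ascending lag order.
    strongest = sorted(
        significant, key=lambda lag: abs(pacf_by_lag[lag]), reverse=True
    )[:top_k]
    return sorted(strongest)


pacf = {1: 0.7, 2: 0.01, 3: -0.3, 4: 0.15, 5: -0.02}
print(select_lags_sketch(pacf, n_obs=400, top_k=2))  # [1, 3]
```

With 400 observations the band is 0.098, so lags 1, 3 and 4 pass the threshold and the two strongest of them are kept.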

This is a pure-compute helper ported from the operational team-4 pipeline (select_key_lags in bart26k-lecture/scripts/team4_4zones_submit.py). No plotting, no side effects beyond an optional DEBUG log line.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| series | pd.Series | The time series from which to estimate lags. Must contain at least 2 * n_lags + 1 observations for statsmodels PACF to run without truncating n_lags. | required |
| n_lags | int | Number of lags passed to calculate_lag_autocorrelation. | 200 |
| top_k | int | Maximum number of lags to return (the top_k significant lags ranked by descending \|PACF\|). | 8 |
| fallback | list[int] \| None | Lag list returned when no lag exceeds the significance band (degenerate series: too short, nearly constant, or n_lags too large). Elements are coerced to int before returning. If None, a ValueError is raised instead. | None |
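The length requirement on series means that the default n_lags=200 needs at least 401 observations. A small guard like the one below, built from the hypothetical helpers `max_safe_n_lags` and `clamp_n_lags`, can pick a usable n_lags for shorter series. It only encodes the 2 * n_lags + 1 rule stated in the table and is not part of the package.

```python
def max_safe_n_lags(n_obs: int) -> int:
    """Largest n_lags with n_obs >= 2 * n_lags + 1 (hypothetical helper)."""
    return max((n_obs - 1) // 2, 0)


def clamp_n_lags(requested: int, n_obs: int) -> int:
    """Reduce the requested n_lags to what the series length allows."""
    return min(requested, max_safe_n_lags(n_obs))


print(max_safe_n_lags(401))   # 200
print(clamp_n_lags(200, 50))  # 24
```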

Returns

| Name | Type | Description |
| --- | --- | --- |
|  | list[int] | Sorted list of lag integers (ascending). Length is at most top_k; may be shorter if fewer than top_k significant lags exist. |

Raises

| Name | Type | Description |
| --- | --- | --- |
|  | ValueError | If no lag exceeds the significance band and fallback is None. Pass fallback=[1, 2, 24, 168] (or any operator-chosen constant) to suppress the error and use a safe default instead. |

Examples

Select significant lags from a synthetic AR(1) series:

import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import select_pacf_lags

rng = np.random.default_rng(42)
n = 24 * 120  # 120 days of hourly data
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + rng.standard_normal()
series = pd.Series(ar)
lags = select_pacf_lags(series, n_lags=50, top_k=8)
print("selected lags:", lags)
assert isinstance(lags, list)
assert all(isinstance(x, int) for x in lags)
assert lags == sorted(lags)
assert len(lags) <= 8
assert 1 in lags, "lag-1 AR component should be selected"
selected lags: [1, 7, 10, 21, 44]
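A true AR(1) process has zero partial autocorrelation beyond lag 1, so the extra lags in the output above are plausibly chance exceedances of the 95% band. The sketch below computes the band for this series length and the number of exceedances one would expect by chance. It assumes each PACF estimate beyond lag 1 is approximately normal with variance 1/N, which is the usual large-sample approximation.

```python
import math

n_obs = 24 * 120  # series length used in the example above
n_lags = 50

band = 1.96 / math.sqrt(n_obs)          # 95% significance band
expected_by_chance = 0.05 * (n_lags - 1)  # lags 2..50 each exceed with prob. ~5%

print(f"band = {band:.4f}")                              # band = 0.0365
print(f"expected by chance = {expected_by_chance:.2f}")  # expected by chance = 2.45
```

A handful of weak extra lags is therefore expected whenever many lags are tested; lowering top_k or n_lags reduces how many of them end up in the result.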

Select significant lags from a seasonal AR process with a single lag-24 term, where lag 24 dominates:

import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import select_pacf_lags

rng = np.random.default_rng(42)
n = 24 * 200
ar = np.zeros(n)
for t in range(24, n):
    ar[t] = 0.8 * ar[t - 24] + rng.standard_normal()
series = pd.Series(ar)
lags = select_pacf_lags(series, n_lags=50, top_k=8)
print("selected lags:", lags)
assert 24 in lags, f"lag 24 expected in {lags}"
selected lags: [1, 7, 10, 12, 17, 18, 23, 24]

Degenerate (constant) series — fallback is returned when provided:

import pandas as pd
from spotforecast2.stats.autocorrelation import select_pacf_lags

series = pd.Series([1.0] * 50)
result = select_pacf_lags(series, n_lags=10, fallback=[1, 2, 24])
print("fallback lags:", result)
assert result == [1, 2, 24]
fallback lags: [1, 2, 24]

Degenerate series with no fallback raises ValueError:

import pytest
import pandas as pd
from spotforecast2.stats.autocorrelation import select_pacf_lags

series = pd.Series([1.0] * 50)
with pytest.raises(ValueError, match="no significant"):
    select_pacf_lags(series, n_lags=10, fallback=None)