Number of most recent samples to use. If None, use the entire series. Note that partial correlations can only be computed for lags up to 50% of the sample size. For example, if the series has 10 samples, n_lags must be less than or equal to 5. This parameter is useful to speed up calculations when the series is very long. Default is None.
Sort results by lag, partial_autocorrelation_abs, partial_autocorrelation, autocorrelation_abs or autocorrelation. Default is partial_autocorrelation_abs.
Select the most informative lags using the partial autocorrelation function.
Computes the PACF up to n_lags via calculate_lag_autocorrelation, then keeps at most top_k lags whose |PACF| exceeds the 95% significance band (1.96 / sqrt(N)) and returns them sorted in ascending order.
This is a pure-compute helper ported from the operational team-4 pipeline (select_key_lags in bart26k-lecture/scripts/team4_4zones_submit.py). No plotting, no side effects beyond an optional DEBUG log line.
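The selection rule described above can be sketched with the standard library alone. This is an illustration, not the library's implementation: the helper names (sample_acf, pacf_durbin_levinson, sketch_select_lags) are hypothetical, the PACF is obtained here with the Durbin-Levinson recursion (calculate_lag_autocorrelation may use a different estimator, so values can differ slightly), and ranking candidates by |PACF| before truncating to top_k is an assumption.

```python
import math
import random


def sample_acf(x, n_lags):
    """Biased sample autocorrelation for lags 0..n_lags."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    if c0 == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(n_lags + 1)]


def pacf_durbin_levinson(rho):
    """PACF for lags 0..K from autocorrelations rho[0..K] (rho[0] == 1)."""
    pacf = [1.0]
    prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def sketch_select_lags(x, n_lags=50, top_k=8):
    """Keep up to top_k lags with |PACF| above 1.96 / sqrt(N), sorted ascending."""
    band = 1.96 / math.sqrt(len(x))
    pacf = pacf_durbin_levinson(sample_acf(x, n_lags))
    significant = [k for k in range(1, n_lags + 1) if abs(pacf[k]) > band]
    # Assumption: "top_k" means the strongest lags by absolute PACF.
    strongest = sorted(significant, key=lambda k: abs(pacf[k]), reverse=True)[:top_k]
    return sorted(strongest)


# Usage: a synthetic AR(1) series with coefficient 0.7.
gen = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + gen.gauss(0.0, 1.0))
lags = sketch_select_lags(series, n_lags=20, top_k=5)
```

With N = 2000 the band is roughly 0.044, so the lag-1 partial autocorrelation (close to 0.7) clears it comfortably, while higher lags pass only by chance.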
Lag list returned when no lag exceeds the significance band (degenerate series: too short, nearly constant, or n_lags too large). Elements are coerced to int before returning. If None, a ValueError is raised instead.
If no lag exceeds the significance band and fallback is None. Pass fallback=[1, 2, 24, 168] (or any operator-chosen constant) to suppress the error and use a safe default instead.
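The fallback contract described above can be sketched as a small helper. The name apply_fallback is hypothetical and the error message is illustrative; only the behaviour (return the fallback coerced to int, or raise ValueError when it is None) follows the description.

```python
def apply_fallback(selected, fallback=None):
    """Return the selected lags, or the fallback when nothing was selected."""
    if selected:
        return selected
    if fallback is None:
        # Mirrors the documented behaviour: no significant lag and no fallback.
        raise ValueError("no lag exceeds the significance band and fallback is None")
    # Elements are coerced to int before returning.
    return [int(lag) for lag in fallback]


# Usage: an empty selection falls back to operator-chosen lags.
safe_lags = apply_fallback([], fallback=[1.0, 2, 24, 168])
```

Raising by default keeps a degenerate series from passing silently; supplying a fallback is an explicit opt-in to a constant lag set.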
Examples
Select significant lags from a synthetic AR(1) series:
import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import select_pacf_lags

rng = np.random.default_rng(42)
n = 24 * 120  # 120 days of hourly data
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + rng.standard_normal()
series = pd.Series(ar)
lags = select_pacf_lags(series, n_lags=50, top_k=8)
print("selected lags:", lags)
assert isinstance(lags, list)
assert all(isinstance(x, int) for x in lags)
assert lags == sorted(lags)
assert len(lags) <= 8
assert 1 in lags, "lag-1 AR component should be selected"
selected lags: [1, 7, 10, 21, 44]
Select significant lags from an AR(24) process — lag 24 dominates:
import numpy as np
import pandas as pd
from spotforecast2.stats.autocorrelation import select_pacf_lags

rng = np.random.default_rng(42)
n = 24 * 200
ar = np.zeros(n)
for t in range(24, n):
    ar[t] = 0.8 * ar[t - 24] + rng.standard_normal()
series = pd.Series(ar)
lags = select_pacf_lags(series, n_lags=50, top_k=8)
print("selected lags:", lags)
assert 24 in lags, f"lag 24 expected in {lags}"
selected lags: [1, 7, 10, 12, 17, 18, 23, 24]
Degenerate (constant) series — fallback is returned when provided: