plots.diagnostics

Operational diagnostic plots for energy-load forecasting pipelines.

Ports five helpers from the chapter-14 team4 operational script (bart26k-lecture/scripts/team4_4zones_submit.py) into a reusable, stateless API. Every plot_* function returns a matplotlib.figure.Figure; the caller is responsible for saving and closing it. The module never calls plt.show() and never changes the backend, so in headless environments call matplotlib.use("Agg") before importing pyplot.
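The caller-side contract (select the backend, save, close) can be sketched as follows. This is a minimal illustration, not part of the module: save_and_close is a hypothetical helper name, and a plain pyplot figure stands in for the return value of one of the plot_* functions.

```python
import io

import matplotlib

matplotlib.use("Agg")  # choose the non-interactive backend before pyplot is imported

import matplotlib.pyplot as plt


def save_and_close(fig, target):
    """Write fig to target (a path or binary file object) as PNG, then release it.

    Hypothetical helper, not part of plots.diagnostics.
    """
    fig.savefig(target, format="png", dpi=150, bbox_inches="tight")
    plt.close(fig)  # remove the figure from pyplot's registry so memory is freed


# Stand-in figure; in practice this would come from one of the plot_* functions.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [40.1, 41.3, 39.8])

buffer = io.BytesIO()
save_and_close(fig, buffer)
png_bytes = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch free of filesystem side effects; passing a file path as target works the same way.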

Functions

Name Description
feature_family Map a feature name to its diagnostic family label.
plot_acf_with_lags Bar chart of autocorrelation values with annotated key lags.
plot_feature_importance_by_family Horizontal bar chart of the top-N feature importances, coloured by family.
plot_forecast_vs_reference Line plot comparing a forecast against an optional reference series.
plot_shap_summary SHAP bar-summary plot for a tree-based estimator.

feature_family

plots.diagnostics.feature_family(name)

Map a feature name to its diagnostic family label.

This is the public, importable version of the family_of helper used inside the chapter-14 team4 operational script. The mapping is intentionally coarse: it covers the feature names that ConfigEntsoe and ForecasterRecursive generate, and it is used exclusively for colour grouping in plot_feature_importance_by_family.

Parameters

Name Type Description Default
name str Feature name as returned by LGBMRegressor.feature_name_. required

Returns

Name Type Description
str One of "holiday", "polynomial", "weather_window", "cyclical/RBF", "lag", or "weather/other".

Examples

from spotforecast2.plots.diagnostics import feature_family

print(feature_family("holiday_DE"))
print(feature_family("brueckentag_NW"))
print(feature_family("poly_hour_2"))
print(feature_family("window_mean_72"))
print(feature_family("sin_hour"))
print(feature_family("lag_1"))
print(feature_family("wind_speed_10m"))
holiday
holiday
polynomial
weather_window
cyclical/RBF
lag
weather/other
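To show what a coarse name-to-family mapping makes possible, the sketch below sums importance scores per family. The function family_stub and its prefix rules are hypothetical: they are guesses that reproduce only the seven outputs listed above and are not the library's actual rules. The names and scores are illustrative values.

```python
from collections import defaultdict

# HYPOTHETICAL prefix rules, guessed from the documented example outputs only.
_RULES = (
    (("holiday", "brueckentag"), "holiday"),
    (("poly_",), "polynomial"),
    (("window_",), "weather_window"),
    (("sin_",), "cyclical/RBF"),
    (("lag_",), "lag"),
)


def family_stub(name):
    """Stand-in for feature_family; falls back to 'weather/other'."""
    for prefixes, family in _RULES:
        if name.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return family
    return "weather/other"


names = ["lag_1", "lag_24", "holiday_DE", "wind_speed_10m",
         "poly_hour_2", "window_mean_72", "sin_hour"]
scores = [100, 80, 60, 55, 40, 35, 20]

# Sum the importance scores of all features that share a family.
totals = defaultdict(int)
for name, score in zip(names, scores):
    totals[family_stub(name)] += score

for family, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(family, total)
```

In real use, replacing family_stub with the imported feature_family should give the same grouping for these seven names.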

plot_acf_with_lags

plots.diagnostics.plot_acf_with_lags(acf, key_lags, conf)

Bar chart of autocorrelation values with annotated key lags.

Ports _plot_acf from the chapter-14 team4 script. The acf DataFrame is the output of spotforecast2.stats.autocorrelation.calculate_lag_autocorrelation and must contain at minimum the columns "lag" and "autocorrelation".

Confidence-band lines at +conf / -conf are drawn as dashed grey horizontal lines. Each lag in key_lags that is present in acf["lag"] gets an orange arrow annotation. Lags not found in the frame are silently skipped.
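Because missing lags are skipped without a warning, it can help to check beforehand which key lags will actually be annotated. A minimal sketch, assuming only the membership rule described above; split_key_lags is a hypothetical helper, not part of the module.

```python
def split_key_lags(available_lags, key_lags):
    """Return (annotated, skipped) according to membership in available_lags."""
    available = set(available_lags)
    annotated = [lag for lag in key_lags if lag in available]
    skipped = [lag for lag in key_lags if lag not in available]
    return annotated, skipped


# An ACF frame with lags 0..99 cannot annotate the weekly lag 168.
annotated, skipped = split_key_lags(range(100), [1, 24, 48, 168])
print(annotated)  # [1, 24, 48]
print(skipped)    # [168]
```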

Parameters

Name Type Description Default
acf pd.DataFrame DataFrame with columns "lag" (int) and "autocorrelation" (float), as returned by calculate_lag_autocorrelation. required
key_lags Sequence[int] Sequence of lag values to annotate (e.g. the PACF-selected lags from the pipeline). required
conf float Half-width of the confidence band, typically 1.96 / sqrt(n_obs). required
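The typical value of conf mentioned above is the approximate 95 % band for white noise. A small sketch of the arithmetic, where white_noise_conf is a hypothetical helper name and n_obs is the number of observations the autocorrelation was estimated from:

```python
import math


def white_noise_conf(n_obs, z=1.96):
    """Half-width z / sqrt(n_obs) of the approximate confidence band."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return z / math.sqrt(n_obs)


print(round(white_noise_conf(100), 3))  # 0.196
conf_hourly_year = white_noise_conf(8760)  # one year of hourly observations
```

The band narrows with the square root of the sample size, so long hourly series produce very tight bands.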

Returns

Name Type Description
Figure A matplotlib.figure.Figure.

Examples

import numpy as np
import pandas as pd
from spotforecast2.plots.diagnostics import plot_acf_with_lags

rng = np.random.default_rng(0)
n = 100
acf = pd.DataFrame({"lag": range(n), "autocorrelation": rng.uniform(-0.3, 0.3, n)})
fig = plot_acf_with_lags(acf, key_lags=[1, 24, 48], conf=0.1)
print(type(fig).__name__)
Figure

plot_feature_importance_by_family

plots.diagnostics.plot_feature_importance_by_family(
    names,
    importances,
    *,
    top_n=20,
)

Horizontal bar chart of the top-N feature importances, coloured by family.

Ports _plot_importance from the chapter-14 team4 script. Feature families are determined by feature_family; the colour palette is the same as in the script so diagnostics look identical.

Parameters

Name Type Description Default
names Sequence[str] Feature names (e.g. fc.estimator.feature_name_). required
importances Sequence[float] Corresponding importance scores (e.g. fc.estimator.feature_importances_). required
top_n int Number of top features to display. Defaults to 20. 20
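The selection implied by top_n can be sketched in plain Python. This assumes that "top" means largest importance first, which the summary line suggests but this page does not spell out; top_n_features is a hypothetical helper, not the module's code.

```python
def top_n_features(names, importances, top_n=20):
    """Return the top_n (name, importance) pairs, largest importance first."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


names = ["lag_1", "lag_24", "holiday_DE", "wind_speed_10m",
         "poly_hour_2", "window_mean_72", "sin_hour"]
scores = [100, 80, 60, 55, 40, 35, 20]

selected = top_n_features(names, scores, top_n=5)
print([name for name, _ in selected])
```

With these values the two weakest features, window_mean_72 and sin_hour, are dropped from the chart.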

Returns

Name Type Description
Figure A matplotlib.figure.Figure.

Examples

from spotforecast2.plots.diagnostics import plot_feature_importance_by_family

names = ["lag_1", "lag_24", "holiday_DE", "wind_speed_10m",
         "poly_hour_2", "window_mean_72", "sin_hour"]
scores = [100, 80, 60, 55, 40, 35, 20]
fig = plot_feature_importance_by_family(names, scores, top_n=5)
print(type(fig).__name__)
Figure

plot_forecast_vs_reference

plots.diagnostics.plot_forecast_vs_reference(
    forecast,
    reference,
    *,
    forecast_label='forecast',
    reference_label='reference',
    unit_scale=0.001,
    unit='GW',
)

Line plot comparing a forecast against an optional reference series.

Ports _plot_vs_entsoe from the chapter-14 team4 script into a general, label-parametrised form. The reference is reindexed to forecast.index; only the overlapping (non-NaN) timestamps are plotted. If there is no overlap, the reference line is omitted and an INFO message is logged; the function still returns a valid single-line figure rather than raising.

Both series are scaled by unit_scale before plotting (default 1e-3 converts MW to GW).

The overlap MAD (mean absolute deviation between forecast and reference over shared timestamps) is logged at INFO level when an overlap exists. This mirrors the original script’s behaviour and is useful for post-hoc sanity checks in operator logs.
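The alignment and the overlap MAD described above can be reproduced with a few pandas calls. This is a sketch under stated assumptions, not the function's source: the values are made up, and whether the logged figure is in raw or scaled units is not stated here, so both are shown.

```python
import pandas as pd

idx = pd.date_range("2024-01-15", periods=4, freq="h", tz="UTC")
forecast = pd.Series([40_000.0, 41_000.0, 42_000.0, 43_000.0], index=idx)
# The reference covers only the two middle hours of the forecast horizon.
reference = pd.Series([40_500.0, 41_200.0], index=idx[1:3])

aligned = reference.reindex(forecast.index)  # NaN where the reference has no value
overlap = aligned.notna()                    # boolean mask of shared timestamps
n_overlap = int(overlap.sum())

if n_overlap:
    # Mean absolute difference over the shared timestamps, in raw units (MW).
    mad_mw = float((forecast[overlap] - aligned[overlap]).abs().mean())
    print(f"overlap MAD over {n_overlap} timestamps: {mad_mw * 1e-3:.3f} GW")
else:
    mad_mw = None
    print("no overlap between forecast and reference")
```

Here the absolute differences are 500 MW and 800 MW, so the overlap MAD is 650 MW, or 0.650 GW after applying the default unit_scale.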

Parameters

Name Type Description Default
forecast pd.Series Point forecast series with a DatetimeIndex. required
reference pd.Series Reference series (e.g. ENTSO-E day-ahead forecast). Will be reindexed to forecast.index; NaN rows after reindexing are treated as “no overlap” for that timestamp. required
forecast_label str Legend label for the forecast line. 'forecast'
reference_label str Legend label for the reference line. 'reference'
unit_scale float Multiplicative scale applied to both series before plotting. Defaults to 1e-3 (MW → GW). 0.001
unit str Unit string used in the y-axis label. 'GW'

Returns

Name Type Description
Figure A matplotlib.figure.Figure.

Examples

import numpy as np
import pandas as pd
from spotforecast2.plots.diagnostics import plot_forecast_vs_reference

idx = pd.date_range("2024-01-15", periods=24, freq="h", tz="UTC")
rng = np.random.default_rng(42)
forecast = pd.Series(40_000 + rng.standard_normal(24) * 1000, index=idx)
reference = pd.Series(41_000 + rng.standard_normal(24) * 800, index=idx)

fig = plot_forecast_vs_reference(
    forecast, reference,
    forecast_label="team_4 forecast",
    reference_label="ENTSO-E day-ahead",
)
print(type(fig).__name__)
Figure

plot_shap_summary

plots.diagnostics.plot_shap_summary(estimator, X, *, max_samples=2000)

SHAP bar-summary plot for a tree-based estimator.

Ports _plot_shap from the chapter-14 team4 script. Before SHAP values are computed, X is subsampled to approximately max_samples rows using a uniform stride of len(X) // max_samples, so the call stays fast even for large training matrices. Because the stride is an integer, inputs with between max_samples and 2 * max_samples rows get a stride of 1 and are passed in full.
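The effect of the integer stride is easiest to see with concrete lengths. The sketch below applies the stride rule to a plain list; stride_subsample is a hypothetical helper, and the max(1, ...) guard for inputs shorter than max_samples is an assumption, since the behaviour for such inputs is not described here.

```python
def stride_subsample(rows, max_samples):
    """Keep every k-th row, with k = len(rows) // max_samples (at least 1)."""
    stride = max(1, len(rows) // max_samples)  # guard against a zero stride (assumption)
    return rows[::stride]


for n_rows in (10_000, 5_000, 3_999, 500):
    kept = stride_subsample(list(range(n_rows)), max_samples=2000)
    print(n_rows, "->", len(kept))
```

With max_samples=2000, 10,000 rows shrink to exactly 2,000, 5,000 rows shrink to 2,500, and 3,999 rows are kept in full because the stride is 1. This is why the budget is only approximate.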

The function uses shap.TreeExplainer and shap.summary_plot(plot_type="bar", show=False), then captures the current matplotlib figure via plt.gcf(). Because the figure is harvested from the global pyplot state, this function is not thread-safe. Callers must close the returned figure (e.g. plt.close(fig)) before performing other pyplot work.
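The harvest-from-global-state pattern, and one way a caller might contain it, can be sketched without SHAP. Everything here is illustrative: draw_on_fresh_figure and _PYPLOT_LOCK are hypothetical names, a plt.bar call stands in for shap.summary_plot, and the lock only protects against other threads that use the same lock.

```python
import threading

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for headless use

import matplotlib.pyplot as plt

_PYPLOT_LOCK = threading.Lock()  # hypothetical lock shared by all pyplot users


def draw_on_fresh_figure(draw):
    """Run draw() against a new current figure and return that figure."""
    with _PYPLOT_LOCK:
        plt.figure()     # make a fresh figure current so draw() cannot reuse an old one
        draw()           # stand-in for a call that draws through pyplot
        fig = plt.gcf()  # harvest the figure from the global pyplot state
    return fig


fig = draw_on_fresh_figure(lambda: plt.bar(["f0", "f1"], [0.7, 0.3]))
n_bars = len(fig.axes[0].patches)
plt.close(fig)  # release the figure before any other pyplot work
```

Serialising pyplot access this way is a caller-side workaround; it does not make the underlying function thread-safe.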

Parameters

Name Type Description Default
estimator object A fitted tree-based estimator supported by shap.TreeExplainer (e.g. LGBMRegressor). required
X pd.DataFrame Feature matrix; typically the design matrix returned by ForecasterRecursive.create_train_X_y. required
max_samples int Approximate number of rows of X passed to the SHAP explainer after subsampling. Defaults to 2000. 2000

Returns

Name Type Description
Figure A matplotlib.figure.Figure.

Examples

import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from spotforecast2.plots.diagnostics import plot_shap_summary

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.standard_normal((200, 5)),
                 columns=[f"f{i}" for i in range(5)])
y = X["f0"] + rng.standard_normal(200) * 0.1
est = LGBMRegressor(n_estimators=20, verbose=-1)
est.fit(X, y)
fig = plot_shap_summary(est, X, max_samples=100)
print(type(fig).__name__)