preprocessing.imputation.apply_imputation

preprocessing.imputation.apply_imputation(
    df_pipeline,
    config,
    logger,
    verbose=False,
    targets=None,
)

Apply imputation to a DataFrame based on the method specified in config.

Supports three strategies:

"weighted": forward-fill then backward-fill gaps, then build a WeightFunction that down-weights training rows near any gap. Rows inside a gap receive weight 0; the penalty window controls how far the zero-weight zone extends. By default the penalty window is config.window_size; if config.imputation_window_size is set (not None) it overrides window_size for the penalty zone only, so the gap-penalty width can be tuned independently of any rolling feature window.
"linear": apply LinearlyInterpolateTS column-by-column.
"weighted_interp": like "weighted" but uses time-interpolation (via LinearlyInterpolateTS) for the fill step instead of ffill/bfill. Boundary NaNs that interpolation cannot bracket are resolved by ffill/bfill as a final fallback. The pre-fill NaN mask drives the same rolling penalty-window zero-weight construction as "weighted". Intended for use with target_corruption_policy="heal", where physically-impossible dropouts should be bridged smoothly rather than step-filled.

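The fill-then-penalize idea behind `"weighted"` and `"weighted_interp"` can be sketched with plain pandas. This is not the library's implementation: `sketch_fill_and_weights` is a hypothetical helper, it returns a plain 0/1 weight Series rather than a `WeightFunction`, and it assumes a trailing penalty window (a row is penalized if it or any of the preceding `window - 1` rows was missing). The actual window alignment used by `get_missing_weights()` may differ.

```python
import numpy as np
import pandas as pd


def sketch_fill_and_weights(series, window, interpolate=False):
    """Illustrative only: fill gaps, then zero-weight rows near them."""
    was_missing = series.isna()  # pre-fill NaN mask drives the weights
    if interpolate:
        # "weighted_interp"-style fill: time interpolation between valid points
        filled = series.interpolate(method="time", limit_area="inside")
    else:
        filled = series
    # Whole fill for "weighted"; boundary fallback for "weighted_interp"
    filled = filled.ffill().bfill()
    # Assumption: trailing window over the pre-fill mask
    near_gap = (
        was_missing.astype(float).rolling(window, min_periods=1).max().astype(bool)
    )
    weights = (~near_gap).astype(float)
    return filled, weights


idx = pd.date_range("2024-01-01", periods=6, freq="h")
s = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0, 6.0], index=idx)

filled, weights = sketch_fill_and_weights(s, window=2)
print(filled.tolist())   # [1.0, 1.0, 1.0, 4.0, 5.0, 6.0]
print(weights.tolist())  # [1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Passing `interpolate=True` bridges the same gap linearly in time instead of step-filling it, while the weights stay the same because they depend only on the pre-fill mask.
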
A diagnostic summary (NaN count before and after imputation) is always written to the logger.
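
A before/after NaN summary of this kind might look like the sketch below. The helper name `log_nan_summary`, the message wording, and the rule that a warning is emitted when NaNs remain are assumptions for illustration; the source only states that the counts are logged.

```python
import logging

import pandas as pd


def log_nan_summary(before, after, logger):
    """Hypothetical helper: log total NaN counts before and after imputation."""
    n_before = int(before.isna().sum().sum())
    n_after = int(after.isna().sum().sum())
    logger.info("NaN count before imputation: %d, after: %d", n_before, n_after)
    if n_after > 0:
        # Assumption: leftover gaps are worth a WARNING
        logger.warning("%d NaN values remain after imputation", n_after)
    return n_before, n_after


logger = logging.getLogger("demo")
before = pd.DataFrame({"A": [1.0, None, 3.0]})
after = before.ffill()
print(log_nan_summary(before, after, logger))  # (1, 0)
```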

Parameters

Name	Type	Description	Default
df_pipeline	pd.DataFrame	DataFrame to impute. Modified in-place for the `"linear"` method; a new DataFrame is returned for `"weighted"` (via `get_missing_weights()`).	required
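config	Any	Configuration object that must expose: - `imputation_method` (`str`): `"weighted"`, `"weighted_interp"`, or `"linear"`. - `targets` (`list[str]`): column names to interpolate (`"linear"` method only; the explicit `targets` keyword argument, when given, takes precedence). - `window_size` (`int`): rolling-window size passed to `get_missing_weights()` for the `"weighted"` method; also the default penalty-window width for `"weighted_interp"`. It may additionally expose `imputation_window_size` (`int` or `None`): when not `None`, it overrides `window_size` for the gap-penalty zone only.	required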
logger	logging.Logger	Standard-library logger used to emit `INFO` and `WARNING` messages.	required
verbose	bool	Whether to print additional information.	`False`
targets	list[str] \| None	Column names to interpolate. When given, takes precedence over `config.targets`.	`None`
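
The precedence between `window_size` and `imputation_window_size` described for the weighted strategies can be expressed in a few lines. `resolve_penalty_window` is a hypothetical name used only for this sketch, and the use of `getattr` (so that configs without the optional attribute still work) is an assumption about how a missing attribute should be treated.

```python
from types import SimpleNamespace


def resolve_penalty_window(config):
    """Hypothetical helper: pick the gap-penalty window width from a config."""
    override = getattr(config, "imputation_window_size", None)
    # A non-None override wins; otherwise fall back to the rolling window size
    return config.window_size if override is None else override


base = SimpleNamespace(imputation_method="weighted", window_size=24)
tuned = SimpleNamespace(
    imputation_method="weighted", window_size=24, imputation_window_size=6
)
print(resolve_penalty_window(base))   # 24
print(resolve_penalty_window(tuned))  # 6
```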

Returns

Name	Type	Description
	tuple[pd.DataFrame, WeightFunction \| None]	A two-element tuple: - `df_pipeline`: imputed DataFrame with no NaN values (when the chosen method can fill all gaps). - `weight_func`: a `WeightFunction` instance ready to be passed to a forecaster's `weight_func` parameter, or `None` when `"linear"` imputation is used.

Raises

Name	Type	Description
	ValueError	If `config.imputation_method` is not one of `"weighted"`, `"weighted_interp"`, or `"linear"`.
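
A caller can guard against this error by checking the method name up front. The sketch below is an assumption-laden illustration: `VALID_METHODS` and `check_imputation_method` are hypothetical names, and the error message is not the library's.

```python
# Method names taken from the strategies documented above
VALID_METHODS = ("weighted", "weighted_interp", "linear")


def check_imputation_method(method):
    """Hypothetical pre-check mirroring the documented ValueError condition."""
    if method not in VALID_METHODS:
        raise ValueError(
            f"Unknown imputation_method {method!r}; expected one of {VALID_METHODS}"
        )
    return method


try:
    check_imputation_method("spline")
except ValueError as exc:
    print(f"rejected: {exc}")
```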

Examples

import logging
import pandas as pd
import numpy as np
from types import SimpleNamespace
from spotforecast2_safe.preprocessing.imputation import apply_imputation

# Build a small DataFrame with deliberate gaps
idx = pd.date_range("2024-01-01", periods=10, freq="h")
df = pd.DataFrame(
    {"A": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]},
    index=idx,
)

# Minimal config and stdlib logger
config = SimpleNamespace(
    imputation_method="linear",
    targets=["A"],
    window_size=3,
)
logger = logging.getLogger("demo")

imputed, weight_func = apply_imputation(df, config, logger)
print(imputed["A"].tolist())
print(weight_func)  # None for linear method

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
None