preprocessing.imputation.apply_imputation

preprocessing.imputation.apply_imputation(
    df_pipeline,
    config,
    logger,
    verbose=False,
)

Apply imputation to a DataFrame based on the method specified in config.

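Supports two strategies, selected by config.imputation_method:

- "weighted": delegates to get_missing_weights, using config.window_size as the rolling-window size. A new DataFrame is returned together with a WeightFunction.
- "linear": interpolates the columns listed in config.targets, modifying df_pipeline in place. No weight function is produced, so the second return value is None.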

A diagnostic summary (NaN count before and after imputation) is always written to the logger.
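
As a rough illustration of the "linear" strategy and the before/after NaN summary described above, the sketch below uses plain pandas. The helper name linear_impute_sketch is hypothetical and this is not the library's implementation; it only assumes that linear imputation amounts to pandas interpolation over the target columns.

```python
import logging

import numpy as np
import pandas as pd


def linear_impute_sketch(
    df: pd.DataFrame, targets: list[str], logger: logging.Logger
) -> pd.DataFrame:
    """Hypothetical stand-in for the "linear" strategy (not the library's code)."""
    nan_before = int(df[targets].isna().sum().sum())

    # Assign each interpolated column back so the caller's DataFrame is updated.
    for col in targets:
        df[col] = df[col].interpolate(method="linear")

    nan_after = int(df[targets].isna().sum().sum())

    # Diagnostic summary: NaN count before and after imputation.
    logger.info("NaN count before: %d, after: %d", nan_before, nan_after)
    if nan_after > 0:
        # Leading gaps cannot be filled by forward linear interpolation.
        logger.warning("%d NaN values remain after interpolation", nan_after)
    return df


demo = pd.DataFrame({"A": [1.0, np.nan, 3.0, np.nan, 5.0]})
result = linear_impute_sketch(demo, ["A"], logging.getLogger("sketch"))
```

The sketch returns the same object it received, which mirrors the in-place behaviour documented for the "linear" method.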

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df_pipeline | pd.DataFrame | DataFrame to impute. Modified in place for the "linear" method; a new DataFrame is returned for "weighted" (via get_missing_weights). | required |
| config | Any | Configuration object that must expose: imputation_method (str), either "weighted" or "linear"; targets (list[str]), the column names to interpolate ("linear" method only); window_size (int), the rolling-window size passed to get_missing_weights ("weighted" method only). | required |
| logger | logging.Logger | Standard-library logger used to emit INFO and WARNING messages. | required |
| verbose | bool | Whether to print additional information. | False |
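
Because config is typed as Any, any object exposing the three attributes listed above should be acceptable. One possible shape is a small dataclass; the class name ImputationConfig is hypothetical and not part of the library, and the field values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ImputationConfig:
    """Hypothetical config container exposing the documented attributes."""

    imputation_method: str  # "weighted" or "linear"
    targets: list[str]      # columns to interpolate ("linear" only)
    window_size: int        # rolling-window size ("weighted" only)


# Example instance; the values are assumptions chosen for illustration.
config = ImputationConfig(
    imputation_method="weighted",
    targets=["A"],
    window_size=24,
)
```

A dataclass gives the same attribute access as a SimpleNamespace while making the expected fields and their types explicit.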

Returns

tuple[pd.DataFrame, WeightFunction | None]: a two-element tuple.

| Name | Type | Description |
| --- | --- | --- |
| df_pipeline | pd.DataFrame | Imputed DataFrame with no NaN values (when the chosen method can fill all gaps). |
| weight_func | WeightFunction \| None | A WeightFunction instance ready to be passed to a forecaster’s weight_func parameter, or None when "linear" imputation is used. |

Raises

| Name | Type | Description |
| --- | --- | --- |
| | ValueError | If config.imputation_method is neither "weighted" nor "linear". |
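
A caller may prefer to catch a misconfigured method before running the pipeline. The sketch below mirrors the documented ValueError condition with a hypothetical pre-flight helper, check_imputation_method; it is an assumption for illustration and not a function provided by the library.

```python
from types import SimpleNamespace

# The two method names documented for config.imputation_method.
SUPPORTED_METHODS = ("weighted", "linear")


def check_imputation_method(config) -> str:
    """Hypothetical pre-flight check mirroring the documented ValueError."""
    method = getattr(config, "imputation_method", None)
    if method not in SUPPORTED_METHODS:
        raise ValueError(
            f"imputation_method must be one of {SUPPORTED_METHODS}, got {method!r}"
        )
    return method


# An unsupported method name is rejected with a descriptive message.
try:
    check_imputation_method(SimpleNamespace(imputation_method="spline"))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```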

Examples

import logging
import pandas as pd
import numpy as np
from types import SimpleNamespace
from spotforecast2_safe.preprocessing.imputation import apply_imputation

# Build a small DataFrame with deliberate gaps
idx = pd.date_range("2024-01-01", periods=10, freq="h")
df = pd.DataFrame(
    {"A": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]},
    index=idx,
)

# Minimal config and stdlib logger
config = SimpleNamespace(
    imputation_method="linear",
    targets=["A"],
    window_size=3,
)
logger = logging.getLogger("demo")

imputed, weight_func = apply_imputation(df, config, logger)
print(imputed["A"].tolist())
print(weight_func)  # None for linear method
# Output:
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
# None