preprocessing.imputation.get_missing_weights

preprocessing.imputation.get_missing_weights(
    data,
    window_size=72,
    verbose=False,
)

Return the forward- and backward-filled DataFrame together with a weight series that flags rows affected by missing values.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame | The input dataset. | required |
| window_size | int | The size of the rolling window to consider for missing values. | 72 |
| verbose | bool | Whether to print additional information. | False |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | tuple[pd.DataFrame, pd.Series] | A tuple containing the forward- and backward-filled DataFrame and a numeric series (0.0 or 1.0), where 0.0 marks rows affected by missing values or gaps and 1.0 marks unaffected rows. |

Examples

import numpy as np
import pandas as pd
from spotforecast2_safe.preprocessing.imputation import get_missing_weights

# Synthetic DataFrame with a deliberate two-row NaN gap at positions 3-4
idx = pd.date_range("2024-01-01", periods=10, freq="h")
values = [1.0, 2.0, 3.0, None, None, 6.0, 7.0, 8.0, 9.0, 10.0]
df = pd.DataFrame({"A": values}, index=idx)

filled, weights = get_missing_weights(df, window_size=3, verbose=True)

# No NaNs remain after forward/backward fill
assert filled.isnull().sum().sum() == 0

# Rows inside (and immediately after) the gap receive weight 0
gap_weights = weights.loc[idx[3:5]]
print(gap_weights.tolist())
assert (gap_weights == 0.0).all()

# Rows well before the gap retain weight 1
assert weights.loc[idx[0]] == 1.0

Number of rows with missing values: 2
Percentage of rows with missing values: 20.00%
missing_indices: DatetimeIndex(['2024-01-01 03:00:00', '2024-01-01 04:00:00'], dtype='datetime64[us]', freq='h')
Number of rows with missing weights after processing: 5
Percentage of rows with missing weights after processing: 50.00%
[0.0, 0.0]
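
To make the role of `window_size` more concrete, the following is a minimal sketch of how such weights could be derived. It is not the library's implementation: `sketch_missing_weights` is a hypothetical helper, and the rule it uses (a row gets weight 0.0 if it or any of the preceding `window_size` rows contains a missing value) is an assumption chosen because it matches the counts in the verbose output above (2 missing rows, 5 zero-weight rows with `window_size=3`).

```python
import numpy as np
import pandas as pd


def sketch_missing_weights(data: pd.DataFrame, window_size: int = 72):
    """Hypothetical stand-in for get_missing_weights (assumed logic only)."""
    # Flag every row that has at least one missing value.
    missing = data.isnull().any(axis=1)

    # Impute by filling forward first, then backward for leading gaps.
    filled = data.ffill().bfill()

    # Assumption: a row is affected if it, or any of the previous
    # `window_size` rows, was missing (a trailing window of window_size + 1).
    affected = (
        missing.astype(float)
        .rolling(window_size + 1, min_periods=1)
        .max()
    )

    # 0.0 for affected rows, 1.0 otherwise.
    weights = 1.0 - affected
    return filled, weights


idx = pd.date_range("2024-01-01", periods=10, freq="h")
values = [1.0, 2.0, 3.0, np.nan, np.nan, 6.0, 7.0, 8.0, 9.0, 10.0]
df = pd.DataFrame({"A": values}, index=idx)

filled, weights = sketch_missing_weights(df, window_size=3)
n_zero = int((weights == 0.0).sum())
print(n_zero)
```

Under this assumption the two-row gap at positions 3-4 zeroes the weights of positions 3 through 7, so a larger `window_size` extends the run of zero-weight rows after each gap.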