preprocessing.target_corruption.apply_target_corruption_policy

preprocessing.target_corruption.apply_target_corruption_policy(
    df,
    *,
    targets,
    policy,
    range_mw,
    step_mw,
    window_days,
    max_heal_hours,
    anchor_zone_hours,
    cutoff,
    logger,
    deviation_mw=None,
    deviation_ref=None,
    deviation_slots=2,
)

Apply the configured corruption policy to the native-cadence frame.

This is the single entry point for the target-corruption sub-pipeline. It runs detect_target_corruption and then dispatches to the configured policy branch. Only target columns are ever modified; exogenous columns are left untouched.

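Policy semantics:

- No hour flagged: no-op; the original df object is returned unchanged.
- "abort": raise TargetCorruptionError if any hour is flagged.
- "heal": set the flagged hours of the target columns to NaN, provided the heal guard accepts (at most max_heal_hours flagged hours and none inside the anchor zone before cutoff); otherwise raise TargetCorruptionError.
- "truncate": set every target slot from the first flagged hour onward to NaN; the trailing clamp then retracts data_end.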
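The dispatch can be pictured as the following minimal sketch. It assumes the detector has already produced a sorted list of flagged hour-start timestamps and that the heal guard's verdict is passed in as a boolean; dispatch_policy and the local TargetCorruptionError class are hypothetical stand-ins, not the module's actual code.

```python
import pandas as pd


class TargetCorruptionError(RuntimeError):
    """Hypothetical local stand-in for the library's exception."""


def dispatch_policy(df, targets, flagged_hours, policy, heal_allowed):
    """Hypothetical sketch of the policy dispatch (not the library implementation).

    flagged_hours: sorted list of hour-start Timestamps from the detector (assumed).
    heal_allowed: verdict of the heal guard, assumed computed elsewhere.
    """
    if not flagged_hours:
        return df, "noop"  # nothing flagged: hand back the caller's object as-is
    if policy == "abort":
        raise TargetCorruptionError(f"{len(flagged_hours)} flagged hour(s)")
    out = df.copy()  # heal/truncate work on a copy, never on the caller's frame
    cols = list(targets)  # only target columns are touched
    if policy == "heal":
        if not heal_allowed:
            raise TargetCorruptionError("heal guard refused")
        for hour in flagged_hours:
            # NaN every sub-hourly slot that falls inside the flagged hour
            mask = (out.index >= hour) & (out.index < hour + pd.Timedelta(hours=1))
            out.loc[mask, cols] = float("nan")
        return out, "heal"
    if policy == "truncate":
        # NaN the whole target tail starting at the first flagged hour
        out.loc[out.index >= flagged_hours[0], cols] = float("nan")
        return out, "truncate"
    raise ValueError(f"unknown policy: {policy!r}")
```

Returning the original object on a no-op (rather than a copy) lets callers cheaply detect that nothing changed with an identity check.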

Parameters

Name Type Description Default
df pd.DataFrame Native-cadence DataFrame indexed by a DatetimeIndex. required
targets Sequence[str] Target column names. required
policy str One of "abort", "heal", "truncate". required
range_mw Optional[float] Range-rule threshold (MW); None skips that rule. required
step_mw Optional[float] Step-rule threshold (MW); None skips that rule. required
window_days Optional[int] Look-back window for the detector (days); None makes the detector inert. required
max_heal_hours int Maximum number of flagged hours the "heal" policy will accept. 0 effectively disables healing. required
anchor_zone_hours int Hours before cutoff that are protected from healing. Flagged slots inside this zone force a refusal. A typical value is 168 (one week), as used in the example below. required
cutoff Optional[pd.Timestamp] The effective training cutoff timestamp used for the anchor-zone check. None disables the zone check. required
logger logging.Logger Standard-library logger for WARNING/INFO messages. required
deviation_mw Optional[float] Deviation-rule threshold (MW, positive magnitude): flags sustained dropouts where target - reference < -deviation_mw. None skips that rule. See detect_target_corruption. None
deviation_ref Optional[str] Reference column name for the deviation rule (e.g. "Forecasted Load"). When enabling this rule, scope targets to the actuals only (e.g. ["Actual Load"]) so that heal/truncate NaN only the actual and the reference survives as an exogenous prior. None
deviation_slots int Minimum consecutive sub-hourly slots for the deviation rule (default 2). 2
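The "sustained dropout" condition of the deviation rule can be sketched with run-length grouping. deviation_flags is a hypothetical helper operating on slot-level series; the library's detector may differ in details such as NaN handling or how flagged slots are rolled up to hours.

```python
import pandas as pd


def deviation_flags(target: pd.Series, reference: pd.Series,
                    deviation_mw: float, deviation_slots: int = 2) -> pd.Series:
    """Hypothetical sketch: flag slots that belong to a run of at least
    deviation_slots consecutive slots where target - reference < -deviation_mw."""
    below = (target - reference) < -deviation_mw
    # Start a new run id whenever the condition changes value
    run_id = (below != below.shift(fill_value=False)).cumsum()
    # Length of the run each slot belongs to
    run_len = run_id.map(run_id.value_counts())
    # Keep only dropouts that last long enough
    return below & (run_len >= deviation_slots)
```

A single-slot dip is ignored while a two-slot dip is kept. Rolling the slot flags up to hours could then be done with something like flags.resample("60min").max().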

Returns

Type Description
Tuple[pd.DataFrame, TargetCorruptionReport] Tuple of (df_out, report), where df_out is either the original df object (noop) or a mutated copy (heal/truncate), and report is a TargetCorruptionReport.

Raises

Type Description
TargetCorruptionError On policy="abort" when any hour is flagged, or on policy="heal" when the heal guard refuses (more than max_heal_hours flagged hours, or a flagged slot inside the anchor zone before cutoff).
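The heal guard can be sketched as a small predicate. heal_guard_refusal is a hypothetical helper; it assumes flagged hours are identified by their start timestamps and that an hour counts as inside the anchor zone when its start lies in [cutoff - anchor_zone_hours, cutoff].

```python
from typing import Optional, Sequence

import pandas as pd


def heal_guard_refusal(flagged_hours: Sequence[pd.Timestamp], max_heal_hours: int,
                       anchor_zone_hours: int,
                       cutoff: Optional[pd.Timestamp]) -> Optional[str]:
    """Hypothetical sketch: return a refusal reason, or None if healing may proceed."""
    # Too many flagged hours: max_heal_hours=0 refuses any non-empty set
    if len(flagged_hours) > max_heal_hours:
        return f"{len(flagged_hours)} flagged hour(s) exceed max_heal_hours={max_heal_hours}"
    # Anchor-zone check is skipped entirely when no cutoff is given
    if cutoff is not None:
        zone_start = cutoff - pd.Timedelta(hours=anchor_zone_hours)
        inside = [h for h in flagged_hours if zone_start <= h <= cutoff]
        if inside:
            return f"flagged hour {inside[0]} lies in the anchor zone before {cutoff}"
    return None
```

Returning a reason string instead of a bare boolean makes it easy to put the cause of the refusal into the raised TargetCorruptionError.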

Examples

```python
import logging

import pandas as pd
from spotforecast2_safe.preprocessing.target_corruption import (
    apply_target_corruption_policy,
)

log = logging.getLogger("demo")
idx = pd.date_range("2026-06-03", periods=96, freq="15min", tz="UTC")
vals = [55_000.0] * 96
vals[5] = 44_000.0  # slot 5 is 01:15; the 11 GW step flags the 01:00 hour
df = pd.DataFrame({"load": vals}, index=idx)

df_out, report = apply_target_corruption_policy(
    df,
    targets=["load"],
    policy="truncate",
    range_mw=5_000,
    step_mw=8_000,
    window_days=3,
    max_heal_hours=0,
    anchor_zone_hours=168,
    cutoff=None,
    logger=log,
)
assert report.fired
assert report.action == "truncate"
# All target slots from first_flagged_hour onward are NaN
nanned = df_out.loc[report.first_flagged_hour:, "load"]
assert nanned.isna().all()
print("action:", report.action, "flagged hours:", report.n_flagged_hours)
```

Output (the first line is the log message emitted by the policy):

```
target_corruption[truncate]: corrupt target tail from 2026-06-03 01:00:00+00:00; NaN-ed 1 hour(s), trailing clamp will retract data_end
action: truncate flagged hours: 1
```
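Why the example flags 01:00 rather than 00:00 can be checked with plain pandas. The rule forms used below (range: hourly max minus min; step: largest absolute slot-to-slot change, attributed to the hour of the later slot) are assumptions for illustration only; see detect_target_corruption for the actual definitions.

```python
import pandas as pd

# Rebuild the example data: one day of 15-minute slots with a single dip at slot 5
idx = pd.date_range("2026-06-03", periods=96, freq="15min", tz="UTC")
vals = [55_000.0] * 96
vals[5] = 44_000.0
load = pd.Series(vals, index=idx)

# Assumed range rule: max - min within each hour
hourly_range = load.resample("60min").max() - load.resample("60min").min()
# Assumed step rule: largest |change| between consecutive slots within each hour
hourly_step = load.diff().abs().resample("60min").max()

range_hours = hourly_range[hourly_range > 5_000].index  # range_mw=5_000
step_hours = hourly_step[hourly_step > 8_000].index     # step_mw=8_000
print(idx[5])  # 2026-06-03 01:15:00+00:00
print(list(range_hours), list(step_hours))
```

Both assumed rules point at the same hour, 01:00, which matches the timestamp in the log line above.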