task_safe_n_to_1_with_covariates_and_dataframe: Design and Test Logic Explained
A step-by-step walkthrough of the N-to-1 covariate forecasting pipeline and its test suite.
The task_safe_n_to_1_with_covariates_and_dataframe script extends the baseline forecasting pipeline by adding exogenous covariates — weather observations, public holidays, and automatically engineered calendar features. The outcome is a signed weighted aggregation of 11 per-column recursive forecasts into a single combined prediction. Every public parameter is validated, every sensitive value is masked in log output, and every execution path is wrapped in structured error handling.
The test suite in tests/test_task_safe_n_to_1_with_covariates.py decomposes this pipeline into isolated units so that a failure in any stage surfaces as a specific, attributable assertion error rather than a silent mismatch at evaluation time. The classes below follow the logical execution order of the pipeline.
Covariate Data Preparation
Before any model is trained, the pipeline constructs three categories of exogenous features. The TestCovariateDataPreperation class verifies that each category satisfies its structural contract.
Weather data arrives as a DataFrame with a DatetimeIndex and columns for temperature, humidity, and wind speed. The test confirms the exact shape (100, 3) and the presence of all three column names. This is not cosmetic: downstream feature-engineering code selects columns by name, so a missing column produces a KeyError rather than a silently degraded feature set.
Holiday data must be binary — every entry is either 0 (non-holiday) or 1 (holiday). The test calls set(holidays.unique()).issubset({0, 1}) to confirm that no fractional or multi-valued entries have slipped through. Any departure from binary encoding would distort the model’s ability to isolate the holiday effect.
Calendar features — day of week, day of month, month, quarter, weekend flag — are derived directly from the DatetimeIndex. The test verifies that day_of_week spans 0–6, that month spans 1–12, and that the total row count equals 365. These bounds are the minimum necessary to confirm that no date arithmetic has produced impossible values.
Cyclical encoding replaces raw integer months with sine and cosine components so that December and January are numerically close. The test confirms that both components are bounded strictly within [-1, 1], which is the mathematical guarantee of the unit-circle encoding.
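The unit-circle encoding can be sketched in a few lines; this is an illustrative reconstruction, not the pipeline's actual feature code:

```python
import numpy as np
import pandas as pd

# Build a year of daily dates and extract the integer month (1..12).
idx = pd.date_range("2024-01-01", periods=365, freq="D")
month = idx.month

# Map months onto the unit circle so December (12) and January (1)
# land next to each other numerically.
month_sin = np.sin(2 * np.pi * month / 12)
month_cos = np.cos(2 * np.pi * month / 12)

# Unit-circle guarantee: both components are bounded within [-1, 1].
print(bool((np.abs(month_sin) <= 1).all() and (np.abs(month_cos) <= 1).all()))
```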
The TestExogenousVariableValidation class enforces four structural invariants that must hold before the exogenous DataFrame is passed to the forecaster.
The exogenous matrix must have exactly the same number of rows as the target series and an identical index. The test creates both objects on the same DatetimeIndex and checks exog.index.equals(y.index). A mismatch here would cause ForecasterRecursive.fit to raise an alignment error, but verifying it before training saves the cost of constructing lag matrices for an incompatible input.
Missing features are detected by comparing the actual column list against a required set. If a required column is absent, the gap is made explicit rather than letting a downstream KeyError propagate with an opaque stack trace.
NaN handling is tested by confirming that exog.isna().sum().sum() counts the two seeded missing values and that forward-fill then removes all of them. The ffill() strategy is appropriate for weather and calendar data because the last observed value is the most conservative assumption in the absence of new information.
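The NaN count-then-fill pattern looks like this; the column name temp is illustrative:

```python
import numpy as np
import pandas as pd

# An exogenous column with two deliberate gaps.
idx = pd.date_range("2024-01-01", periods=6, freq="h")
exog = pd.DataFrame({"temp": [10.0, np.nan, 12.0, np.nan, 13.0, 14.0]},
                    index=idx)

# Detect the gaps before cleaning.
n_missing = exog.isna().sum().sum()

# Forward-fill: carry the last observed value forward.
filled = exog.ffill()

print(n_missing, filled.isna().sum().sum())
print(filled["temp"].tolist())
```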
The TestLoggingForCovariates class tests two aspects of the dual-handler logging system used throughout the covariate pipeline.
The first test verifies that attaching a StreamHandler with a standard formatter results in a logger with at least one handler and level INFO. The N-to-1 pipeline sets this level rather than DEBUG because it is the outermost user-facing task: operators need progress updates, not internal variable traces.
The second test verifies the timestamp format YYYYMMDD_HHMMSS. A 15-character string, an underscore at position 8, and an all-digit date component are the three assertions. This format is used in log file names, so any deviation would produce files that sort incorrectly by creation time in a directory listing — a subtle but consequential problem in audit contexts where log files are reviewed chronologically.
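The three timestamp assertions can be reproduced with the standard library alone:

```python
from datetime import datetime

# The YYYYMMDD_HHMMSS stamp used in log file names.
stamp = datetime(2024, 3, 7, 14, 5, 9).strftime("%Y%m%d_%H%M%S")
print(stamp)  # 20240307_140509

assert len(stamp) == 15          # 8 date digits + underscore + 6 time digits
assert stamp[8] == "_"           # separator at index 8
assert stamp[:8].isdigit()       # all-digit date component
```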
The N-to-1 Forecasting Structure
The TestNto1ForecastingPipeline class establishes the data structure contracts for the recursive forecasting stage.
The basic structure test creates a Series of length 100 + horizon to represent the full available history. The extra horizon rows will become the test set; only the first 100 rows feed the training stage. Constructing the Series with this combined length from the start avoids off-by-one errors when slicing at the train/test boundary.
The recursive forecaster is initialised with LGBMRegressor(n_estimators=100, learning_rate=0.1, random_state=42, verbose=-1). The test confirms that the estimator attributes match the provided values, which establishes that the LGBMRegressor constructor accepted the parameters without silently ignoring any of them. This is important because some sklearn-compatible estimators silently clip or ignore out-of-range parameters.
Multi-output forecasting produces a one-dimensional array of length steps. The test confirms forecast_array.ndim == 1 to distinguish the multi-step output from a two-dimensional matrix that would indicate an accidental multi-target configuration.
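A toy stand-in for the skforecast call (using a naive moving-average rule rather than LGBMRegressor) shows the structural point: a recursive multi-step forecast feeds each prediction back in as input and returns a flat array of length steps, not a matrix:

```python
import numpy as np

def recursive_forecast(history, steps):
    """Toy recursive forecaster: each step predicts the mean of the
    last three known values, then appends its own prediction."""
    window = list(history[-3:])
    preds = []
    for _ in range(steps):
        nxt = float(np.mean(window))
        preds.append(nxt)
        window = window[1:] + [nxt]
    return np.asarray(preds)

forecast_array = recursive_forecast(np.arange(10, dtype=float), steps=24)
print(forecast_array.ndim, forecast_array.shape)  # 1 (24,)
```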
Feature Engineering with Covariates
The TestCovariateFeatureEngineering class validates three feature construction patterns used by the pipeline.
Polynomial features are computed by stacking x, x**2, and x**3 into a matrix. The test confirms the shape (5, 3) and verifies that the first column is identical to the original input and the second column equals its square. This establishes that no column reordering or normalisation has been applied, which matters because the aggregation weights are positionally indexed.
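A minimal sketch of the stacking and its positional guarantees:

```python
import numpy as np

x = np.arange(1.0, 6.0)  # five sample points

# Stack x, x^2, x^3 column-wise; no reordering or normalisation is applied.
poly = np.column_stack([x, x**2, x**3])

print(poly.shape)                       # (5, 3)
print(np.array_equal(poly[:, 0], x))    # first column is the raw input
print(np.array_equal(poly[:, 1], x**2)) # second column is its square
```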
Lag features are created with y.shift(i) for i in [1, 2, 3]. The test checks that the first row of lag_1 is NaN, confirming that the shift operation introduces the expected initial missing values rather than wrapping around or filling with zeros.
Rolling window features are computed with y.rolling(window=7).mean(). The test confirms that the first six values are NaN — a direct consequence of requiring a full window before computing the first valid mean. Any model trained without respecting this warm-up period would use NaN-contaminated features for the earliest training rows.
import numpy as np
import pandas as pd

y = pd.Series(np.arange(1, 11, dtype=float))
lags = pd.DataFrame({f"lag_{i}": y.shift(i) for i in range(1, 4)})
rolling_mean = y.rolling(window=7).mean()
print("Lag features (first 5 rows):")
print(lags.head())
print(f"\nRolling mean NaNs in first 6 positions: {rolling_mean.iloc[:6].isna().sum()}")
Lag features (first 5 rows):
lag_1 lag_2 lag_3
0 NaN NaN NaN
1 1.0 NaN NaN
2 2.0 1.0 NaN
3 3.0 2.0 1.0
4 4.0 3.0 2.0
Rolling mean NaNs in first 6 positions: 6
Integrating Exogenous Variables into the Feature Matrix
The TestExogenousIntegration class tests how exogenous features are merged with lag features to form the complete training matrix.
The feature matrix expansion test creates a base matrix of 5 columns and an exogenous matrix of 3 columns, then combines them with np.column_stack. The result must have exactly 8 columns. This verifies that column stacking does not drop any features and does not introduce duplicates.
The lag-and-exog combination test uses pd.concat([y_lags, exog], axis=1) and confirms the combined column count is 4 (two lags plus two exogenous features). The presence of both y_lag_1 and temp in the column set confirms that the concatenation preserved column names, which ForecasterRecursive uses when calling estimator.fit.
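A sketch of the concatenation, assuming the column names from the test (y_lag_1, y_lag_2, temp) plus an illustrative hour column:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=10, freq="h")
y = pd.Series(np.arange(10, dtype=float), index=idx)

# Two lag columns and two exogenous columns on the same index.
y_lags = pd.DataFrame({"y_lag_1": y.shift(1), "y_lag_2": y.shift(2)})
exog = pd.DataFrame({"temp": 15.0 + np.arange(10) * 0.1,
                     "hour": idx.hour}, index=idx)

# Side-by-side concatenation preserves every column name.
combined = pd.concat([y_lags, exog], axis=1)
print(list(combined.columns))
```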
Prediction Aggregation
The TestPredictionAggregation class mirrors the TestAggregatePredict class from the demo task but extends it with temporal index preservation.
The basic aggregation test uses positive fractional weights [0.5, 0.3, 0.2] that sum to 1. Multiplying each column by its weight and summing across columns produces a length-3 Series, confirming that the operation reduces a multi-column DataFrame to a single forecast Series.
The unequal importance test verifies the ordering high_priority > medium_priority > low_priority directly in the weights dictionary. This is a boundary check: the weights must encode a strict priority ranking, and any normalisation that flattened this ranking would silently degrade the combined forecast quality.
The temporal index preservation test reconstructs the aggregated Series with the original DatetimeIndex and calls aggregated_series.index.equals(dates). The matrix multiplication idiom predictions.values @ np.array(weights) returns a plain np.ndarray with no index, so the test confirms that re-attaching the index via pd.Series(aggregated, index=dates) restores the temporal alignment exactly.
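The aggregation-and-reattach idiom, with illustrative column names and the weight vector from the basic test:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=3, freq="h")
predictions = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                            "b": [2.0, 2.0, 2.0],
                            "c": [0.0, 1.0, 0.0]}, index=dates)
weights = [0.5, 0.3, 0.2]

# The matmul drops the index and returns a plain ndarray...
aggregated = predictions.values @ np.array(weights)

# ...so the DatetimeIndex must be re-attached explicitly.
aggregated_series = pd.Series(aggregated, index=dates)
print(aggregated_series.index.equals(dates))  # True
```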
The TestCovariateTimezone class addresses a recurring source of alignment failures in time series pipelines: mixed timezone-aware and timezone-naive indices.
The first test creates a UTC-indexed Series and confirms that y.index.tz is not None and that the timezone string is "UTC". A None timezone indicates a tz-naive index, which cannot be compared with a tz-aware index without an explicit tz_localize call.
The conversion test uses tz_convert("US/Eastern") to move from UTC to Eastern time and confirms that the result has a non-None timezone. This is relevant when the pipeline is deployed in a timezone other than UTC: the model’s training data and the forecast period must share a consistent timezone, or the lag indices will misalign by a constant offset equal to the UTC offset.
The consistency test creates both y and exog on the same UTC index and verifies str(y.index.tz) == str(exog.index.tz). A mismatch between the target and exogenous timezone would cause ForecasterRecursive.fit to raise an error when it attempts to align the two inputs.
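The three timezone checks condense to a few lines of pandas; the temp column is illustrative:

```python
import numpy as np
import pandas as pd

# Target and exogenous data on the same tz-aware UTC index.
idx = pd.date_range("2024-01-01", periods=24, freq="h", tz="UTC")
y = pd.Series(np.zeros(24), index=idx)
exog = pd.DataFrame({"temp": 15.0}, index=idx)

print(y.index.tz is not None, str(y.index.tz))        # tz-aware, UTC
print(str(y.index.tz) == str(exog.index.tz))          # consistent pair

# Conversion keeps the same instants under new wall-clock labels.
y_eastern = y.tz_convert("US/Eastern")
print(y_eastern.index.tz is not None)
```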
Forced Training vs. Cached Model Loading
The TestForcedTraining class tests the persistence decision logic that controls whether the pipeline trains a new model or loads a previously serialised one.
The force_train flag maps directly to the action string "retrain_model" when True and "load_cached_model" when False. This test formalises the boolean semantics in isolation from the actual file I/O, confirming that the branching logic is correct before it interacts with the filesystem.
The directory creation test calls Path.mkdir(parents=True, exist_ok=True) on a path under /tmp and verifies that the directory exists afterwards. The exist_ok=True flag prevents a FileExistsError on repeated runs, which is the correct behaviour for a pipeline that may be invoked multiple times with the same model directory.
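The idempotent directory creation can be demonstrated directly; the n2n_models/run_01 path is an illustrative stand-in for the configured model directory:

```python
import tempfile
from pathlib import Path

model_dir = Path(tempfile.gettempdir()) / "n2n_models" / "run_01"

# exist_ok=True makes the call safe to repeat across pipeline runs.
model_dir.mkdir(parents=True, exist_ok=True)
model_dir.mkdir(parents=True, exist_ok=True)  # second call does not raise

print(model_dir.is_dir())  # True
```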
Error Handling
The TestErrorHandlingCovariates class covers three failure modes that are specific to the covariate pipeline.
The missing exog detection test computes the shortfall when only 20 steps of exogenous data are provided for a 24-step forecast horizon. The result horizon - exog_provided == 4 establishes the arithmetic that the validation layer must implement to produce a meaningful error message rather than an implicit out-of-bounds slice.
The misaligned index test creates y with 100 rows and exog with 95 rows, then computes their index intersection. The intersection contains 95 elements, confirming that the safe strategy is to restrict computation to the common index rather than raising an error. This mirrors the dropna/intersection pattern used in the demo task.
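Both validation patterns, the shortfall arithmetic and the index-intersection fallback, fit in a short sketch:

```python
import pandas as pd

# Shortfall arithmetic behind the missing-exog error message.
horizon, exog_provided = 24, 20
shortfall = horizon - exog_provided  # 4 missing steps

# Misaligned indices: restrict to the common index instead of raising.
idx_y = pd.date_range("2024-01-01", periods=100, freq="h")
idx_exog = idx_y[:95]
common = idx_y.intersection(idx_exog)

print(shortfall, len(common))  # 4 95
```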
The invalid horizon test confirms that -24 is not a member of the accepted horizons [6, 12, 24, 48, 168] and that it is negative. This validates the guard condition that n2n_predict_with_covariates enforces at its entry point.
Kwargs Flexibility
The TestKwargsFlexibility class verifies that the **kwargs mechanism correctly forwards parameters through the pipeline layers.
The estimator kwargs test confirms that a dictionary with n_estimators=500, learning_rate=0.05, and num_leaves=100 retains all three keys and values. The n_to_1_with_covariates function collects these in a forecast_kwargs dictionary before passing them to n2n_predict_with_covariates, so the forwarding chain must preserve each key without mutation.
The forecaster kwargs test checks that lags=[1, 7, 24] and window_size=72 survive the forwarding. Using a list for lags instead of a scalar activates the skforecast multi-lag construction path, so the type must be preserved exactly.
The aggregation kwargs test verifies that method and normalize_weights entries are accessible after construction. These parameters control how agg_predict normalises the weight vector, and their presence in the kwargs dictionary is the precondition for the aggregation stage to respect user-specified aggregation semantics.
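A minimal sketch of the forwarding chain the three tests exercise; the function names here are illustrative stand-ins for n_to_1_with_covariates and its callees:

```python
# Collect user kwargs into a dictionary and pass it down unchanged.
def aggregate(predictions, **agg_kwargs):
    return {"received": agg_kwargs}

def predict(y, **forecast_kwargs):
    # Forward without mutation: every key the caller set must survive,
    # with its type intact (e.g. a list for lags).
    return aggregate(y, **forecast_kwargs)

result = predict([1, 2, 3], n_estimators=500, learning_rate=0.05,
                 lags=[1, 7, 24])
print(sorted(result["received"]))  # ['lags', 'learning_rate', 'n_estimators']
```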
Integration: The Complete Pipeline
The TestIntegrationN2N class tests the properties that only emerge when all components operate together.
The end-to-end structure test constructs a 100-row hourly Series named load alongside a two-column exogenous DataFrame with temperature and hour columns on the same index. The three assertions — correct Series length, correct exog shape, and identical indices — are the minimum conditions for a successful call to ForecasterRecursive.fit(y, exog=exog).
The output consistency test simulates the pipeline’s return value: a (24, 11) predictions DataFrame, a 24-element aggregated Series, and a metrics dictionary with MAE and MSE keys. Verifying the shape of the predictions DataFrame confirms that the multi-output forecasting produced the expected number of timesteps and columns before aggregation.
The reproducibility test seeds np.random with 42, draws 10 values, reseeds, and draws again. Exact equality between the two arrays confirms that the seed resets the PRNG state completely. In the pipeline, random_state=42 is passed to LGBMRegressor for the same reason: two runs with identical input must produce identical output, which is a core requirement of the safety-critical design.
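The seed-reset check is a two-draw comparison:

```python
import numpy as np

# Re-seeding resets the PRNG state completely, so two seeded
# draws of the same size are bit-for-bit identical.
np.random.seed(42)
first = np.random.rand(10)
np.random.seed(42)
second = np.random.rand(10)

print(np.array_equal(first, second))  # True
```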
A single invocation of task_safe_n_to_1_with_covariates_and_dataframe follows this sequence. The optional logging system is activated first if --logging true is passed. The fetch_data call loads the target time series from the configured data file. Feature engineering then constructs the exogenous matrix: calendar features from the DatetimeIndex, optionally extended with weather windows, holiday indicators, and polynomial interaction terms. The n2n_predict_with_covariates function trains one ForecasterRecursive per target column with the assembled exogenous inputs, serialises each model to the model directory, and returns the prediction DataFrame. Geographic coordinates are redacted from all log output at every level per CWE-312 and CWE-532. The 11-column prediction DataFrame is reduced to a single combined forecast by agg_predict using the DEFAULT_WEIGHTS vector. The combined prediction is printed to stdout and logged to the timestamped log file if logging is enabled.
The test suite mirrors this flow by isolating each stage into a dedicated class, so a regression in any component produces a targeted failure that pinpoints the affected stage without requiring a full pipeline run.