preprocessing.exog_providers.CovidInfectionRateProvider

preprocessing.exog_providers.CovidInfectionRateProvider(
    data_home=None,
    csv_path=None,
    column='covid_infection_rate',
    fill_outside=0.0,
    max_gap=0,
    max_tail_gap=0,
    provider_window=None,
)

German national COVID-19 7-day incidence as an exogenous level regressor.

Reads the bundled, static RKI series (datasets/csv/covid_infection_rate_de.csv, CC-BY-4.0) and broadcasts the daily national 7-day incidence (cases per 100,000 inhabitants) onto the hourly target index. Values are forward-filled within the data’s date span and set to fill_outside (0.0 by default) before the first and after the last observed day, since there is no signal outside the pandemic window.
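The broadcast described above can be sketched in plain pandas. This is a minimal illustration, not the provider's implementation: the helper name broadcast_daily_to_hourly and the two-day toy series are hypothetical, and treating every hour of the last observed day as inside the span is an assumption.

```python
import pandas as pd


def broadcast_daily_to_hourly(daily, index, fill_outside=0.0,
                              column="covid_infection_rate"):
    """Hypothetical helper: put a daily series onto an hourly index."""
    daily = daily.sort_index()
    # Forward-fill: each hour takes the latest daily value at or before it.
    hourly = daily.reindex(index, method="ffill")
    # Assumption: the span covers the full last observed day.
    first_day = daily.index[0]
    end_of_span = daily.index[-1] + pd.Timedelta(days=1)
    outside = (index < first_day) | (index >= end_of_span)
    # Outside the observed span there is no signal, so use the constant.
    hourly = hourly.mask(outside, fill_outside)
    return hourly.astype("float32").to_frame(name=column)


# Toy data (not the RKI series): two observed days.
daily = pd.Series(
    [100.0, 120.0],
    index=pd.date_range("2021-12-01", periods=2, freq="D", tz="UTC"),
)
# 52 hours: 2 before the span, 48 inside it, 2 after it.
idx = pd.date_range("2021-11-30 22:00", periods=52, freq="h", tz="UTC")
out = broadcast_daily_to_hourly(daily, idx)
```

In this toy case the first two and last two rows take fill_outside, and the 48 rows in between repeat each day's value for 24 hours.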

This is a slow socio-economic level input (a lockdown-stringency proxy). It carries the CR-3 release-lag caveat: the bundled file is the final published vintage, whereas on a true live path only the latest, lagged vintage is available. Using the final vintage is the standard treatment when training on historical data.

Parameters

Name Type Description Default
data_home DataHome Unused (kept for a uniform provider signature); the dataset is package data, located via get_package_data_home(). None
csv_path Optional[Union[str, Path]] Optional explicit path to the COVID CSV, overriding the bundled location. None
column str Output column name. Defaults to "covid_infection_rate". 'covid_infection_rate'
fill_outside float Value used outside the observed date span. Defaults to 0.0. 0.0
max_gap int Maximum length of a contiguous missing-value run healed by _align_to_index. See _align_to_index for full semantics. Defaults to 0. 0
max_tail_gap int Extended healing budget for the trailing-edge NaN run (the run containing index[-1]). The effective tail budget is max(max_gap, max_tail_gap). See _align_to_index. Defaults to 0. 0
provider_window Optional[pd.DatetimeIndex] Validation index passed to _align_to_index as validate_index. See _align_to_index. Defaults to None. None
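The interaction between max_gap and max_tail_gap can be illustrated with a small run-length sketch. It is not the library's _align_to_index: the helper heal_gaps is hypothetical, healing by forward fill is an assumption, and so is leaving over-budget runs as NaN (this page does not say what _align_to_index does with them). Only the budget rule max(max_gap, max_tail_gap) for the trailing run comes from the parameter descriptions above.

```python
import numpy as np
import pandas as pd


def heal_gaps(series, max_gap=0, max_tail_gap=0):
    """Hypothetical helper: forward-fill NaN runs that fit their budget."""
    isna = series.isna()
    # Give every contiguous run of NaN / non-NaN values its own integer id.
    run_id = (isna != isna.shift()).cumsum()
    # Length of the NaN run each position belongs to (0 for observed values).
    run_len = isna.groupby(run_id).transform("sum")
    # The trailing run (the one containing the last position) gets the
    # larger of the two budgets; every other run gets max_gap.
    tail_budget = max(max_gap, max_tail_gap)
    is_tail = isna & (run_id == run_id.iloc[-1])
    budget = np.where(is_tail, tail_budget, max_gap)
    healable = isna & (run_len <= budget)
    # Assumption: runs over budget are left as NaN.
    return series.where(~healable, series.ffill())


nan = float("nan")
s = pd.Series([1.0, nan, 2.0, nan, nan, 3.0, nan, nan, nan])
healed = heal_gaps(s, max_gap=1, max_tail_gap=3)
untouched = heal_gaps(s)  # defaults of 0 heal nothing
```

With max_gap=1 and max_tail_gap=3, the single-value gap and the three-value trailing gap are filled, while the two-value interior gap exceeds max_gap and stays missing.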

Examples

import pandas as pd
from spotforecast2_safe.preprocessing.exog_providers import (
    CovidInfectionRateProvider,
)

idx = pd.date_range("2021-12-01", periods=24, freq="h", tz="UTC")
out = CovidInfectionRateProvider().build(idx)
print(out.columns.tolist(), out.shape, bool(out.isna().any().any()))
['covid_infection_rate'] (24, 1) False

Methods

Name Description
build Return a single-column float32 frame with the COVID incidence.

build

preprocessing.exog_providers.CovidInfectionRateProvider.build(index)

Return a single-column float32 frame with the COVID incidence.

Parameters

Name Type Description Default
index pd.DatetimeIndex Hourly DatetimeIndex (tz-aware UTC) covering the training-plus-forecast window. required

Returns

Name Type Description
pd.DataFrame One column (covid_infection_rate by default), float32, indexed exactly by index. Values outside the observed date span are filled with fill_outside (0.0 by default).

Raises

Name Type Description
ExogProviderError If the bundled CSV is absent or malformed.

Examples

import pandas as pd
from spotforecast2_safe.preprocessing.exog_providers import (
    CovidInfectionRateProvider,
)

idx = pd.date_range("2021-12-01", periods=24, freq="h", tz="UTC")
provider = CovidInfectionRateProvider()
out = provider.build(idx)
print(out.columns.tolist(), out.shape, out.dtypes.iloc[0].name)
assert out.shape == (24, 1)
assert not out.isna().any().any()
['covid_infection_rate'] (24, 1) float32