utils.validation

Validation utilities for time series forecasting.

This module provides validation functions for time series data and exogenous variables.

Functions

| Name | Description |
| --- | --- |
| check_exog | Validate that exog is a pandas Series or DataFrame. |
| check_exog_dtypes | Check that exogenous variables have valid data types (int, float, category). |
| check_interval | Validate that a confidence interval specification is valid. |
| check_y | Validate that y is a pandas Series without missing values. |
| get_exog_dtypes | Extract and store the data types of exogenous variables. |

check_exog

utils.validation.check_exog(exog, allow_nan=True, series_id='`exog`')

Validate that exog is a pandas Series or DataFrame.

This function ensures that exogenous variables meet basic requirements:

- Must be a pandas Series or DataFrame
- If Series, must have a name
- Optionally warns if NaN values are present

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| exog | Union[pd.Series, pd.DataFrame] | Exogenous variable/s included as predictor/s. | required |
| allow_nan | bool | If True, NaN values are allowed but a warning is issued when they are present. If False, no warning about NaN values is issued. | True |
| series_id | str | Identifier of the series used in error messages. | '`exog`' |

Raises

| Type | Description |
| --- | --- |
| TypeError | If exog is not a pandas Series or DataFrame. |
| ValueError | If exog is a Series without a name. |

Warns

If allow_nan=True and exog contains NaN values.

Examples

>>> import pandas as pd
>>> from spotforecast2.utils.validation import check_exog
>>>
>>> # Valid DataFrame
>>> exog_df = pd.DataFrame({"temp": [20, 21, 22], "humidity": [50, 55, 60]})
>>> check_exog(exog_df)  # No error
>>>
>>> # Valid Series with name
>>> exog_series = pd.Series([1, 2, 3], name="temperature")
>>> check_exog(exog_series)  # No error
>>>
>>> # Invalid: Series without name
>>> exog_no_name = pd.Series([1, 2, 3])
>>> try:
...     check_exog(exog_no_name)
... except ValueError as e:
...     print(f"Error: {e}")
Error: When `exog` is a pandas Series, it must have a name.
>>>
>>> # Invalid: not a Series/DataFrame
>>> try:
...     check_exog([1, 2, 3])
... except TypeError as e:
...     print(f"Error: {e}")
Error: `exog` must be a pandas Series or DataFrame. Got <class 'list'>.
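
The NaN warning described under Warns can be collected for inspection with the standard library's `warnings.catch_warnings(record=True)`. The sketch below uses a hypothetical stand-in, `sketch_check_exog`, that mirrors only the rules documented above (type check, named Series, warning on NaN when `allow_nan=True`). It is an assumption for illustration, not the spotforecast2 implementation, and the real warning category and messages may differ.

```python
import warnings

import numpy as np
import pandas as pd


def sketch_check_exog(exog, allow_nan=True, series_id="`exog`"):
    """Hypothetical stand-in that mirrors the documented rules of check_exog."""
    if not isinstance(exog, (pd.Series, pd.DataFrame)):
        raise TypeError(
            f"{series_id} must be a pandas Series or DataFrame. Got {type(exog)}."
        )
    if isinstance(exog, pd.Series) and exog.name is None:
        raise ValueError(
            f"When {series_id} is a pandas Series, it must have a name."
        )
    # As documented above: warn only when allow_nan=True and NaN is present.
    if allow_nan and exog.isna().to_numpy().any():
        # UserWarning is an assumption; the library may use its own category.
        warnings.warn(f"{series_id} has missing values.", UserWarning)


exog_nan = pd.DataFrame({"temp": [20.0, np.nan, 22.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is suppressed
    sketch_check_exog(exog_nan)

# Keep only the warnings raised by the sketch itself.
nan_warnings = [
    str(w.message) for w in caught if issubclass(w.category, UserWarning)
]
print(nan_warnings)
```

The same `catch_warnings` pattern should work with the real `check_exog`, provided the recorded entries are filtered by whatever warning category the library actually emits.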

check_exog_dtypes

utils.validation.check_exog_dtypes(
    exog,
    call_check_exog=True,
    series_id='`exog`',
)

Check that exogenous variables have valid data types (int, float, category).

This function validates that the exogenous variables (Series or DataFrame) contain only supported data types: integer, float, or category. It issues a warning if other types (like object/string) are found, as these may cause issues with some machine learning estimators.

It also strictly enforces that categorical columns must have integer categories.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| exog | Union[pd.Series, pd.DataFrame] | Exogenous variables to check. | required |
| call_check_exog | bool | If True, calls check_exog() first to ensure basic validity. | True |
| series_id | str | Identifier used in warning/error messages. | '`exog`' |

Raises

| Type | Description |
| --- | --- |
| TypeError | If categorical columns contain non-integer categories. |

Warns

If columns with unsupported data types (not int, float, category) are found.

Examples

>>> import pandas as pd
>>> from spotforecast2.utils.validation import check_exog_dtypes
>>>
>>> # Valid types (float, int)
>>> df_valid = pd.DataFrame({
...     "a": [1.0, 2.0, 3.0],
...     "b": [1, 2, 3]
... })
>>> check_exog_dtypes(df_valid)  # No warning
>>>
>>> # Invalid type (object/string)
>>> df_invalid = pd.DataFrame({
...     "a": [1, 2, 3],
...     "b": ["x", "y", "z"]
... })
>>> check_exog_dtypes(df_invalid)  # Issues DataTypeWarning about column 'b'
>>>
>>> # Valid categorical (with integer categories)
>>> df_cat = pd.DataFrame({"a": [1, 2, 1]})
>>> df_cat["a"] = df_cat["a"].astype("category")
>>> check_exog_dtypes(df_cat)  # No warning
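
Because categorical columns must have integer categories, a string-valued column has to be encoded before it can pass this check. One way to do that with plain pandas, shown here as a sketch, is to convert the column to category, take its integer codes through `.cat.codes`, and cast those codes back to category. The column name `weekday` and its values are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"weekday": ["mon", "tue", "mon", "wed"]})

# String categories: the kind of categorical column the check would reject.
as_strings = df["weekday"].astype("category")

# Keep a lookup from integer code back to the original label.
label_for_code = dict(enumerate(as_strings.cat.categories))

# Replace each label with its integer code, then cast back to category so
# the column is categorical with integer categories.
df["weekday"] = as_strings.cat.codes.astype("category")

categories_are_int = pd.api.types.is_integer_dtype(df["weekday"].cat.categories)
print(label_for_code)
print(df["weekday"].tolist(), categories_are_int)
```

The codes follow the sorted order of the original labels, so the lookup dictionary is what makes the encoding reversible later.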

check_interval

utils.validation.check_interval(
    interval=None,
    ensure_symmetric_intervals=False,
    quantiles=None,
    alpha=None,
    alpha_literal='alpha',
)

Validate that a confidence interval specification is valid.

This function checks that interval values are properly formatted and within valid ranges for confidence interval prediction.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| interval | Union[List[float], Tuple[float], None] | Confidence interval percentiles (0-100 inclusive), given as [lower_bound, upper_bound]. Example: [2.5, 97.5] for a 95% interval. | None |
| ensure_symmetric_intervals | bool | If True, require the interval to be symmetric (lower + upper = 100). | False |
| quantiles | Union[List[float], Tuple[float], None] | Sequence of quantiles (0-1 inclusive). Currently not validated; reserved for future use. | None |
| alpha | Optional[float] | Defines the confidence level as 1 - alpha. Currently not validated; reserved for future use. | None |
| alpha_literal | Optional[str] | Name used in error messages for the alpha parameter. | 'alpha' |

Raises

| Type | Description |
| --- | --- |
| TypeError | If interval is not a list or tuple. |
| ValueError | If interval does not have exactly 2 values, if a value is outside the range 0-100, if lower >= upper, or if the interval is not symmetric when symmetry is required. |

Examples

>>> from spotforecast2.utils.validation import check_interval
>>>
>>> # Valid 95% confidence interval
>>> check_interval(interval=[2.5, 97.5])  # No error
>>>
>>> # Valid symmetric interval
>>> check_interval(interval=[2.5, 97.5], ensure_symmetric_intervals=True)  # No error
>>>
>>> # Invalid: not symmetric
>>> try:
...     check_interval(interval=[5, 90], ensure_symmetric_intervals=True)
... except ValueError:
...     print("Error: Interval not symmetric")
Error: Interval not symmetric
>>>
>>> # Invalid: wrong number of values
>>> try:
...     check_interval(interval=[2.5, 50, 97.5])
... except ValueError:
...     print("Error: Must have exactly 2 values")
Error: Must have exactly 2 values
>>>
>>> # Invalid: out of range
>>> try:
...     check_interval(interval=[-5, 105])
... except ValueError:
...     print("Error: Values out of range")
Error: Values out of range
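
The rules listed under Raises can be restated as a small pure-Python sketch. `sketch_check_interval`, `symmetric_interval`, and `is_valid` are hypothetical helpers written for illustration. The order of the checks, the messages, the handling of `interval=None`, and the tolerance used for the symmetry test are assumptions, not the library's implementation.

```python
import math


def sketch_check_interval(interval=None, ensure_symmetric_intervals=False):
    """Hypothetical restatement of the documented interval rules."""
    if interval is None:
        return  # assumption: nothing to validate when no interval is given
    if not isinstance(interval, (list, tuple)):
        raise TypeError("`interval` must be a list or tuple.")
    if len(interval) != 2:
        raise ValueError("`interval` must contain exactly 2 values.")
    lower, upper = interval
    if not (0 <= lower <= 100 and 0 <= upper <= 100):
        raise ValueError("`interval` values must lie between 0 and 100.")
    if lower >= upper:
        raise ValueError("The lower bound must be smaller than the upper bound.")
    # Symmetric means the two percentiles add up to 100, e.g. 2.5 + 97.5.
    if ensure_symmetric_intervals and not math.isclose(lower + upper, 100.0):
        raise ValueError("`interval` must be symmetric (lower + upper = 100).")


def symmetric_interval(coverage):
    """Percentiles [lower, upper] of a central interval covering `coverage` %."""
    tail = (100 - coverage) / 2
    return [tail, 100 - tail]


def is_valid(interval, **kwargs):
    """Return True if the sketch accepts the interval, False otherwise."""
    try:
        sketch_check_interval(interval, **kwargs)
    except (TypeError, ValueError):
        return False
    return True


interval_95 = symmetric_interval(95)
print(interval_95)
print(is_valid(interval_95, ensure_symmetric_intervals=True))
print(is_valid([5, 90], ensure_symmetric_intervals=True))
```

Building the percentiles from a coverage level, as `symmetric_interval` does, guarantees that the symmetry condition holds by construction.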

check_y

utils.validation.check_y(y, series_id='`y`')

Validate that y is a pandas Series without missing values.

This function ensures that the input time series meets the basic requirements for forecasting: it must be a pandas Series and must not contain any NaN values.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Any | Time series values to validate. | required |
| series_id | str | Identifier of the series used in error messages. | '`y`' |

Raises

| Type | Description |
| --- | --- |
| TypeError | If y is not a pandas Series. |
| ValueError | If y contains missing (NaN) values. |

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.utils.validation import check_y
>>>
>>> # Valid series
>>> y = pd.Series([1, 2, 3, 4, 5])
>>> check_y(y)  # No error
>>>
>>> # Invalid: not a Series
>>> try:
...     check_y([1, 2, 3])
... except TypeError as e:
...     print(f"Error: {e}")
Error: `y` must be a pandas Series with a DatetimeIndex or a RangeIndex. Found <class 'list'>.
>>>
>>> # Invalid: contains NaN
>>> y_with_nan = pd.Series([1, 2, np.nan, 4])
>>> try:
...     check_y(y_with_nan)
... except ValueError as e:
...     print(f"Error: {e}")
Error: `y` has missing values.
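
When the missing-values error is raised, it helps to know where the gaps are before deciding how to handle them. The following sketch uses only pandas to locate the missing timestamps and then fills them by linear interpolation. The dates, the values, and the choice of interpolation are assumptions for illustration; how to treat gaps is a modelling decision that depends on the data.

```python
import numpy as np
import pandas as pd

y = pd.Series(
    [1.0, 2.0, np.nan, 4.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
    name="y",
)

# Timestamps at which the series has no value.
missing_at = y[y.isna()].index
missing_dates = list(missing_at.strftime("%Y-%m-%d"))
print(missing_dates)

# One possible repair (an assumption, not a recommendation): fill each gap
# by linear interpolation between its neighbours.
y_filled = y.interpolate(method="linear")
has_missing = bool(y_filled.isna().any())
print(has_missing)
```

After a repair like this, the series no longer contains NaN values, so the missing-values condition documented above is no longer met.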

get_exog_dtypes

utils.validation.get_exog_dtypes(exog)

Extract and store the data types of exogenous variables.

This function returns a dictionary mapping column names to their data types. For Series, uses the series name as the key. For DataFrames, uses all column names.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| exog | Union[pd.Series, pd.DataFrame] | Exogenous variable/s (Series or DataFrame). | required |

Returns

| Type | Description |
| --- | --- |
| Dict[str, type] | Dictionary mapping variable names to their pandas dtypes. |

Examples

>>> import pandas as pd
>>> from spotforecast2.utils.validation import get_exog_dtypes
>>>
>>> # DataFrame with mixed types
>>> exog_df = pd.DataFrame({
...     "temp": pd.Series([20.5, 21.3, 22.1], dtype='float64'),
...     "day": pd.Series([1, 2, 3], dtype='int64'),
...     "is_weekend": pd.Series([False, False, True], dtype='bool')
... })
>>> dtypes = get_exog_dtypes(exog_df)
>>> dtypes['temp']
dtype('float64')
>>> dtypes['day']
dtype('int64')
>>>
>>> # Series
>>> exog_series = pd.Series([1.0, 2.0, 3.0], name="temperature", dtype='float64')
>>> dtypes = get_exog_dtypes(exog_series)
>>> dtypes
{'temperature': dtype('float64')}
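
The mapping described above can be reproduced with plain pandas, which may help when reading the returned dictionary. `sketch_get_exog_dtypes` is a hypothetical equivalent written from the documented behaviour (series name as key for a Series, column names for a DataFrame); it is an assumption, not the library's code.

```python
import pandas as pd


def sketch_get_exog_dtypes(exog):
    """Hypothetical equivalent of the documented behaviour, using plain pandas."""
    if isinstance(exog, pd.Series):
        # A Series has a single dtype, keyed by the series name.
        return {exog.name: exog.dtype}
    # A DataFrame exposes one dtype per column.
    return exog.dtypes.to_dict()


exog_df = pd.DataFrame({"temp": [20.5, 21.3], "day": [1, 2]})
df_dtypes = sketch_get_exog_dtypes(exog_df)
series_dtypes = sketch_get_exog_dtypes(
    pd.Series([1.0, 2.0], name="temperature")
)
print(df_dtypes)
print(series_dtypes)
```

One possible use of such a dictionary is to compare it with the dtypes of new data, so that a column whose type has changed is noticed before it reaches an estimator.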