utils.validation
Validation utilities for time series forecasting.
This module provides validation functions for time series data and exogenous variables.
Functions
| Name | Description |
|---|---|
| check_exog | Validate that exog is a pandas Series or DataFrame. |
| check_exog_dtypes | Check that exogenous variables have valid data types (int, float, category). |
| check_interval | Validate that a confidence interval specification is valid. |
| check_y | Validate that y is a pandas Series without missing values. |
| get_exog_dtypes | Extract and store the data types of exogenous variables. |
check_exog
utils.validation.check_exog(exog, allow_nan=True, series_id='`exog`')
Validate that exog is a pandas Series or DataFrame.
This function ensures that exogenous variables meet basic requirements:
- Must be a pandas Series or DataFrame
- If a Series, must have a name
- Optionally warns if NaN values are present
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| exog | Union[pd.Series, pd.DataFrame] | Exogenous variable/s included as predictor/s. | required |
| allow_nan | bool | If True, NaN values are allowed but a warning is issued when they are present. If False, no warning about NaN values is issued. | True |
| series_id | str | Identifier of the series used in error messages. Defaults to “exog”. | 'exog' |
Raises
| Type | Description |
|---|---|
| TypeError | If exog is not a pandas Series or DataFrame. |
| ValueError | If exog is a Series without a name. |
Warns
If allow_nan=True and exog contains NaN values.
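
The checks described above can be approximated with a few lines of pandas. This is a minimal sketch, not the library's implementation: `sketch_check_exog` is a hypothetical name, the error messages are modeled on the examples in this section, and `UserWarning` is an assumption because the warning class is not stated here.

```python
import warnings

import pandas as pd


def sketch_check_exog(exog, allow_nan=True, series_id="`exog`"):
    """Hypothetical re-implementation of the documented checks."""
    # Type check: only pandas Series and DataFrame are accepted.
    if not isinstance(exog, (pd.Series, pd.DataFrame)):
        raise TypeError(
            f"{series_id} must be a pandas Series or DataFrame. Got {type(exog)}."
        )
    # A Series must carry a name.
    if isinstance(exog, pd.Series) and exog.name is None:
        raise ValueError(
            f"When {series_id} is a pandas Series, it must have a name."
        )
    # As documented: warn when allow_nan=True and NaN values are present.
    # UserWarning is an assumption; the real warning class may differ.
    if allow_nan and exog.isna().to_numpy().any():
        warnings.warn(f"{series_id} has missing values.", UserWarning)


# Usage: record the warning instead of letting it print.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sketch_check_exog(pd.Series([1.0, None, 3.0], name="temp"))
nan_warnings = [str(w.message) for w in caught]
```

Recording warnings with `warnings.catch_warnings(record=True)` is a convenient way to assert on them in tests without changing the global warning filters.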
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.utils.validation import check_exog
>>>
>>> # Valid DataFrame
>>> exog_df = pd.DataFrame({"temp": [20, 21, 22], "humidity": [50, 55, 60]})
>>> check_exog(exog_df) # No error
>>>
>>> # Valid Series with name
>>> exog_series = pd.Series([1, 2, 3], name="temperature")
>>> check_exog(exog_series) # No error
>>>
>>> # Invalid: Series without name
>>> exog_no_name = pd.Series([1, 2, 3])
>>> try:
...     check_exog(exog_no_name)
... except ValueError as e:
...     print(f"Error: {e}")
Error: When `exog` is a pandas Series, it must have a name.
>>>
>>> # Invalid: not a Series/DataFrame
>>> try:
...     check_exog([1, 2, 3])
... except TypeError as e:
...     print(f"Error: {e}")
Error: `exog` must be a pandas Series or DataFrame. Got <class 'list'>.
check_exog_dtypes
utils.validation.check_exog_dtypes(
    exog,
    call_check_exog=True,
    series_id='`exog`',
)
Check that exogenous variables have valid data types (int, float, category).
This function validates that the exogenous variables (Series or DataFrame) contain only supported data types: integer, float, or category. It issues a warning if other types (like object/string) are found, as these may cause issues with some machine learning estimators.
It also strictly enforces that categorical columns must have integer categories.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| exog | Union[pd.Series, pd.DataFrame] | Exogenous variables to check. | required |
| call_check_exog | bool | If True, calls check_exog() first to ensure basic validity. Defaults to True. | True |
| series_id | str | Identifier used in warning/error messages. Defaults to “exog”. | 'exog' |
Raises
| Type | Description |
|---|---|
| TypeError | If categorical columns contain non-integer categories. |
Warns
If columns with unsupported data types (not int, float, category) are found.
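
To see which columns would fall under each rule before calling the check, the dtypes can be inspected directly with pandas. The sketch below is illustrative only: `classify_exog_columns` is a hypothetical helper, and it assumes that "int" and "float" mean whatever `pandas.api.types.is_integer_dtype` and `is_float_dtype` accept, which may not match the library's own test exactly.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def classify_exog_columns(exog: pd.DataFrame) -> dict:
    """Hypothetical helper: group columns by the rule they would violate."""
    unsupported = []
    non_integer_categories = []
    for name, dtype in exog.dtypes.items():
        if isinstance(dtype, pd.CategoricalDtype):
            # Categorical columns are accepted only with integer categories.
            if not is_integer_dtype(dtype.categories.dtype):
                non_integer_categories.append(name)
        elif not (is_integer_dtype(dtype) or is_float_dtype(dtype)):
            # Anything that is not int, float, or category (e.g. strings).
            unsupported.append(name)
    return {
        "unsupported": unsupported,
        "non_integer_categories": non_integer_categories,
    }


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": ["x", "y", "z"],
    "c": pd.Categorical(["low", "high", "low"]),
})
report = classify_exog_columns(df)
# report -> {"unsupported": ["b"], "non_integer_categories": ["c"]}

# One way to make a string-valued categorical compliant: replace the labels
# with their integer codes, then convert back to category.
df["c"] = df["c"].cat.codes.astype("category")
report_after = classify_exog_columns(df)
# report_after -> {"unsupported": ["b"], "non_integer_categories": []}
```

Encoding labels as integer codes discards the label text, so keep the original mapping if the labels are needed later.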
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.utils.validation import check_exog_dtypes
>>>
>>> # Valid types (float, int)
>>> df_valid = pd.DataFrame({
...     "a": [1.0, 2.0, 3.0],
...     "b": [1, 2, 3]
... })
>>> check_exog_dtypes(df_valid) # No warning
>>>
>>> # Invalid type (object/string)
>>> df_invalid = pd.DataFrame({
...     "a": [1, 2, 3],
...     "b": ["x", "y", "z"]
... })
>>> check_exog_dtypes(df_invalid) # Issues DataTypeWarning about column 'b'
>>>
>>> # Valid categorical (with integer categories)
>>> df_cat = pd.DataFrame({"a": [1, 2, 1]})
>>> df_cat["a"] = df_cat["a"].astype("category")
>>> check_exog_dtypes(df_cat) # No warning
check_interval
utils.validation.check_interval(
    interval=None,
    ensure_symmetric_intervals=False,
    quantiles=None,
    alpha=None,
    alpha_literal='alpha',
)
Validate that a confidence interval specification is valid.
This function checks that interval values are properly formatted and within valid ranges for confidence interval prediction.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| interval | Union[List[float], Tuple[float], None] | Confidence interval percentiles (0-100 inclusive). Should be [lower_bound, upper_bound]. Example: [2.5, 97.5] for 95% interval. | None |
| ensure_symmetric_intervals | bool | If True, ensure intervals are symmetric (lower + upper = 100). | False |
| quantiles | Union[List[float], Tuple[float], None] | Sequence of quantiles (0-1 inclusive). Currently not validated, reserved for future use. | None |
| alpha | Optional[float] | Confidence level (1-alpha). Currently not validated, reserved for future use. | None |
| alpha_literal | Optional[str] | Name used in error messages for alpha parameter. | 'alpha' |
Raises
| Type | Description |
|---|---|
| TypeError | If interval is not a list or tuple. |
| ValueError | If interval does not have exactly 2 values, if a value is outside the range 0-100, if lower >= upper, or if the interval is not symmetric when symmetry is required. |
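
The rules listed above can be written out as a short pure-Python sketch. It is not the library's code: `sketch_check_interval` is a hypothetical name, the error messages are invented for illustration, and both the early return for `None` and the use of `math.isclose` for the symmetry test are assumptions.

```python
import math


def sketch_check_interval(interval, ensure_symmetric_intervals=False):
    """Hypothetical re-implementation of the documented interval rules."""
    # Assumption: the default None means there is nothing to validate.
    if interval is None:
        return
    if not isinstance(interval, (list, tuple)):
        raise TypeError("`interval` must be a list or tuple.")
    if len(interval) != 2:
        raise ValueError("`interval` must contain exactly 2 values.")
    lower, upper = interval
    if not (0 <= lower <= 100 and 0 <= upper <= 100):
        raise ValueError("Both bounds must lie between 0 and 100.")
    if lower >= upper:
        raise ValueError("The lower bound must be smaller than the upper bound.")
    # math.isclose avoids spurious failures from floating-point rounding;
    # using a tolerance here is a choice made for this sketch.
    if ensure_symmetric_intervals and not math.isclose(lower + upper, 100.0):
        raise ValueError("The interval must be symmetric (lower + upper = 100).")


sketch_check_interval([2.5, 97.5], ensure_symmetric_intervals=True)  # passes
sketch_check_interval((10, 90))  # tuples are accepted as well
```

Checking the cheapest conditions first (type, then length) keeps the later comparisons safe, since unpacking into `lower` and `upper` only happens once the length is known to be 2.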
Examples
>>> from spotforecast2.utils.validation import check_interval
>>>
>>> # Valid 95% confidence interval
>>> check_interval(interval=[2.5, 97.5]) # No error
>>>
>>> # Valid symmetric interval
>>> check_interval(interval=[2.5, 97.5], ensure_symmetric_intervals=True) # No error
>>>
>>> # Invalid: not symmetric
>>> try:
...     check_interval(interval=[5, 90], ensure_symmetric_intervals=True)
... except ValueError as e:
...     print("Error: Interval not symmetric")
Error: Interval not symmetric
>>>
>>> # Invalid: wrong number of values
>>> try:
...     check_interval(interval=[2.5, 50, 97.5])
... except ValueError as e:
...     print("Error: Must have exactly 2 values")
Error: Must have exactly 2 values
>>>
>>> # Invalid: out of range
>>> try:
...     check_interval(interval=[-5, 105])
... except ValueError as e:
...     print("Error: Values out of range")
Error: Values out of range
check_y
utils.validation.check_y(y, series_id='`y`')
Validate that y is a pandas Series without missing values.
This function ensures that the input time series meets the basic requirements for forecasting: it must be a pandas Series and must not contain any NaN values.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| y | Any | Time series values to validate. | required |
| series_id | str | Identifier of the series used in error messages. Defaults to “y”. | 'y' |
Raises
| Type | Description |
|---|---|
| TypeError | If y is not a pandas Series. |
| ValueError | If y contains missing (NaN) values. |
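
When the missing-value check fails, it helps to know where the gaps are before deciding how to handle them. The following sketch uses only standard pandas calls; `missing_positions` is a hypothetical helper, and the two remedies shown are generic options rather than anything this module prescribes.

```python
import pandas as pd


def missing_positions(y: pd.Series) -> list:
    """Hypothetical helper: index labels at which y is NaN."""
    return y[y.isna()].index.tolist()


y_with_nan = pd.Series([1.0, 2.0, None, 4.0])
gaps = missing_positions(y_with_nan)  # [2]

# Two common remedies; which one is appropriate depends on the data.
y_dropped = y_with_nan.dropna()       # removes the row, leaving a gap in the index
y_filled = y_with_nan.interpolate()   # linear interpolation by default
```

Dropping rows changes the index spacing, which matters for time series; interpolation keeps the index intact but invents values.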
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.utils.validation import check_y
>>>
>>> # Valid series
>>> y = pd.Series([1, 2, 3, 4, 5])
>>> check_y(y) # No error
>>>
>>> # Invalid: not a Series
>>> try:
...     check_y([1, 2, 3])
... except TypeError as e:
...     print(f"Error: {e}")
Error: `y` must be a pandas Series with a DatetimeIndex or a RangeIndex. Found <class 'list'>.
>>>
>>> # Invalid: contains NaN
>>> y_with_nan = pd.Series([1, 2, np.nan, 4])
>>> try:
...     check_y(y_with_nan)
... except ValueError as e:
...     print(f"Error: {e}")
Error: `y` has missing values.
get_exog_dtypes
utils.validation.get_exog_dtypes(exog)
Extract and store the data types of exogenous variables.
This function returns a dictionary mapping column names to their data types. For Series, uses the series name as the key. For DataFrames, uses all column names.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| exog | Union[pd.Series, pd.DataFrame] | Exogenous variable/s (Series or DataFrame). | required |
Returns
| Type | Description |
|---|---|
| Dict[str, type] | Dictionary mapping variable names to their pandas dtypes. |
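
The mapping described above corresponds closely to attributes pandas already exposes. As a rough sketch, assuming nothing beyond the behavior documented here (`sketch_get_exog_dtypes` is a hypothetical name, not the library function):

```python
import pandas as pd


def sketch_get_exog_dtypes(exog):
    """Hypothetical equivalent built from plain pandas attributes."""
    if isinstance(exog, pd.Series):
        # A Series has a single dtype, keyed by its name.
        return {exog.name: exog.dtype}
    # A DataFrame exposes one dtype per column.
    return exog.dtypes.to_dict()


exog_df = pd.DataFrame({"temp": [20.5, 21.3], "day": [1, 2]})
dtypes = sketch_get_exog_dtypes(exog_df)

exog_series = pd.Series([1.0, 2.0, 3.0], name="temperature")
series_dtypes = sketch_get_exog_dtypes(exog_series)
```

A dictionary of this shape can be compared against the dtypes of a later input to detect columns whose type has changed.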
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.utils.validation import get_exog_dtypes
>>>
>>> # DataFrame with mixed types
>>> exog_df = pd.DataFrame({
...     "temp": pd.Series([20.5, 21.3, 22.1], dtype='float64'),
...     "day": pd.Series([1, 2, 3], dtype='int64'),
...     "is_weekend": pd.Series([False, False, True], dtype='bool')
... })
>>> dtypes = get_exog_dtypes(exog_df)
>>> dtypes['temp']
dtype('float64')
>>> dtypes['day']
dtype('int64')
>>>
>>> # Series
>>> exog_series = pd.Series([1.0, 2.0, 3.0], name="temperature", dtype='float64')
>>> dtypes = get_exog_dtypes(exog_series)
>>> dtypes
{'temperature': dtype('float64')}