preprocessing.time_series_visualization

Time series visualization.

Functions

Name Description
plot_forecast Plot model forecast against actuals and display CV metrics.
plot_predictions Plot actual values against one or more prediction series.
plot_seasonality Plot seasonal patterns (annual, weekly, daily) for a given target.
plot_zoomed_timeseries Plot a time series with a zoomed-in focus area.
visualize_ts_comparison Visualize time series with optional statistical overlays.
visualize_ts_plotly Visualize multiple time series datasets interactively with Plotly.

plot_forecast

preprocessing.time_series_visualization.plot_forecast(
    model,
    X,
    y,
    cv_results=None,
    title='Forecast',
    figsize=None,
    show=True,
    nrows=None,
    ncols=1,
    sharex=True,
)

Plot model forecast against actuals and display CV metrics.

Parameters

Name Type Description Default
model Any Fitted scikit-learn model. required
X pd.DataFrame Feature matrix (e.g., test set). required
y Union[pd.Series, pd.DataFrame] Target series or DataFrame (e.g., test set). required
cv_results Optional[Dict[str, Any]] Optional dictionary of cross-validation results from evaluate() or sklearn.model_selection.cross_validate(). None
title str Title of the plot. Defaults to “Forecast”. 'Forecast'
figsize Optional[tuple] Figure dimensions. None
show bool Whether to display the plot. Defaults to True. True
nrows Optional[int] Number of rows for subplots (multivariate). None
ncols int Number of columns for subplots (multivariate). 1
sharex bool Whether to share x-axis for subplots. Defaults to True. True
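
The cv_results parameter is described as the output of evaluate() or sklearn.model_selection.cross_validate(). The sketch below shows one way such a dictionary might be produced with scikit-learn. The synthetic data, the TimeSeriesSplit splitter, and the scoring metric are assumptions chosen for illustration; this reference does not state which keys plot_forecast reads from the dictionary.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_validate

# Synthetic daily data (illustrative only)
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=60, freq="D")
X = pd.DataFrame({"feat": np.arange(60, dtype=float)}, index=dates)
y = pd.Series(2.0 * X["feat"] + rng.normal(0.0, 1.0, 60), index=dates)

# TimeSeriesSplit keeps every training fold strictly before its test fold,
# which avoids leaking future observations into training
cv_results = cross_validate(
    LinearRegression(),
    X,
    y,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_absolute_error",
)

# cross_validate returns a dict of arrays with one entry per fold
print(sorted(cv_results))
print(cv_results["test_score"])
```

A dictionary built this way could then be passed as cv_results=cv_results together with a fitted model.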

Returns

Name Type Description
plt.Figure The matplotlib Figure object.

Examples

>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2.preprocessing.time_series_visualization import plot_forecast
>>> # Create sample data
>>> dates = pd.date_range("2023-01-01", periods=10, freq="D")
>>> X = pd.DataFrame({"feat": np.arange(10)}, index=dates)
>>> y = pd.Series(np.arange(10), index=dates)
>>> model = LinearRegression().fit(X, y)
>>> # Plot forecast
>>> fig = plot_forecast(model, X, y, show=False)
>>> plt.close(fig)

plot_predictions

preprocessing.time_series_visualization.plot_predictions(
    y_true,
    predictions,
    slice_seq=None,
    title='Predictions vs Actuals',
    figsize=None,
    show=True,
    nrows=None,
    ncols=1,
    sharex=True,
)

Plot actual values against one or more prediction series.

Allows visualizing model performance by overlaying predictions on top of actual data. Supports slicing to focus on a specific time range (e.g., the recent test set). Handles both univariate and multivariate targets by creating subplots for multiple targets.

Parameters

Name Type Description Default
y_true Union[pd.Series, pd.DataFrame] Series or DataFrame containing the actual target values. required
predictions Dict[str, Union[pd.Series, pd.DataFrame, np.ndarray]] Dictionary where keys are labels (e.g., model names) and values are the corresponding predictions. If arrays are provided, they must have the same length as the sliced y_true. required
slice_seq Optional[slice] Optional slice object to select a subset of the data. If None, the entire series is plotted. Example: slice(-96, None) to select the last 96 points. None
title str Title of the plot. Defaults to “Predictions vs Actuals”. 'Predictions vs Actuals'
figsize Optional[tuple] Tuple defining figure width and height. If None, automatically calculated based on number of subplots. None
show bool Whether to display the plot. Defaults to True. True
nrows Optional[int] Number of rows for subplots (multivariate). Defaults to n_targets. None
ncols int Number of columns for subplots (multivariate). Defaults to 1. 1
sharex bool Whether to share x-axis for subplots. Defaults to True. True
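
The interaction between slice_seq and array-valued predictions can be sketched with plain pandas. The example assumes the slice is applied positionally (as with .iloc); that is an assumption about the implementation, not something this reference confirms.

```python
import numpy as np
import pandas as pd

# Hourly series covering five days (120 points); values are illustrative
dates = pd.date_range("2023-01-01", periods=120, freq="h")
y_true = pd.Series(np.arange(120, dtype=float), index=dates, name="Target")

# slice(-96, None) selects the last 96 points
slice_seq = slice(-96, None)
y_sliced = y_true.iloc[slice_seq]  # positional slicing is an assumption

# A NumPy array carries no index, so it must already have the same
# length as the sliced actuals
pred_array = y_sliced.to_numpy() + 0.5
predictions = {"Model A": pred_array}

print(len(y_sliced), y_sliced.index[0])
```

Predictions supplied as a pd.Series or pd.DataFrame carry their own index, which is why the length requirement is stated only for arrays.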

Returns

Name Type Description
plt.Figure The matplotlib Figure object containing the plot.

Examples

>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.preprocessing.time_series_visualization import plot_predictions
>>> # Create sample data
>>> dates = pd.date_range("2023-01-01", periods=10, freq="D")
>>> y_true = pd.Series(np.arange(10), index=dates, name="Target")
>>> predictions = {"Model A": y_true + 0.5}
>>> # Plot predictions
>>> fig = plot_predictions(y_true, predictions, show=False)
>>> plt.close(fig)

plot_seasonality

preprocessing.time_series_visualization.plot_seasonality(
    data,
    target,
    figsize=(8, 5),
    show=True,
    logscale=False,
)

Plot seasonal patterns (annual, weekly, daily) for a given target.

Creates a 2x2 grid of plots:

1. Distribution by month (boxplot + median).
2. Distribution by day of week (boxplot + median).
3. Distribution by hour of day (boxplot + median).
4. Mean target value by day of week and hour.
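
The groupings behind the four panels can be approximated with pandas alone. This is a sketch of the aggregations the description implies, not the library's implementation; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Two weeks of hourly data (illustrative values)
dates = pd.date_range("2023-01-01", periods=24 * 14, freq="h")
df = pd.DataFrame(
    {"value": np.arange(1, len(dates) + 1, dtype=float)}, index=dates
)

idx = df.index

# Panels 1-3: group by calendar component, here summarized by the median
by_month = df.groupby(idx.month)["value"].median()
by_weekday = df.groupby(idx.dayofweek)["value"].median()
by_hour = df.groupby(idx.hour)["value"].median()

# Panel 4: mean target value for each (day of week, hour) pair
by_weekday_hour = df.groupby([idx.dayofweek, idx.hour])["value"].mean()

print(by_weekday)
print(by_weekday_hour.head())
```

In pandas, dayofweek runs from 0 (Monday) to 6 (Sunday), so two full weeks of hourly data yield 7 weekday groups and 168 weekday-hour groups.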

Parameters

Name Type Description Default
data pd.DataFrame DataFrame containing the time series data. Must have a DatetimeIndex or an index convertible to datetime. required
target str Name of the column to plot. required
figsize tuple[int, int] Figure dimensions (width, height). Defaults to (8, 5). (8, 5)
show bool Whether to display the plot immediately. Defaults to True. True
logscale Union[bool, list[bool]] Whether to use a log scale for the y-axis. Can be a single boolean (applies to all 4 plots) or a list of 4 booleans (applies to each plot individually). Defaults to False. False
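
Because logscale accepts either a single boolean or a list of four booleans, some normalization to one flag per panel is presumably needed internally. The helper below, normalize_logscale, is hypothetical and is not part of the library; the length check in particular is an assumption about how invalid input might be handled.

```python
from typing import List, Union


def normalize_logscale(
    logscale: Union[bool, List[bool]], n_plots: int = 4
) -> List[bool]:
    """Hypothetical helper: expand logscale to one flag per subplot."""
    if isinstance(logscale, bool):
        # A single boolean applies to every panel
        return [logscale] * n_plots
    flags = list(logscale)
    if len(flags) != n_plots:
        # Assumed behavior; the reference does not say what happens here
        raise ValueError(
            f"logscale must have {n_plots} entries, got {len(flags)}"
        )
    return flags


print(normalize_logscale(True))
print(normalize_logscale([True, False, False, False]))
```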

Returns

Name Type Description
plt.Figure The matplotlib Figure object.

Examples

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from spotforecast2.preprocessing.time_series_visualization import plot_seasonality
>>> # Create sample data
>>> dates = pd.date_range("2023-01-01", periods=1000, freq="h")
>>> df = pd.DataFrame({"value": range(1, 1001)}, index=dates)
>>> # Plot seasonality with log scale for all plots
>>> fig = plot_seasonality(data=df, target="value", logscale=True, show=False)
>>> plt.close(fig)
>>> # Plot seasonality with log scale for the first plot only
>>> fig = plot_seasonality(
...     data=df,
...     target="value",
...     logscale=[True, False, False, False],
...     show=False
... )
>>> plt.close(fig)

plot_zoomed_timeseries

preprocessing.time_series_visualization.plot_zoomed_timeseries(
    data,
    target,
    zoom,
    title=None,
    figsize=(8, 4),
    show=True,
)

Plot a time series with a zoomed-in focus area.

Creates a two-panel plot:

1. Top panel: Full time series with the zoom area highlighted.
2. Bottom panel: Zoomed-in view of the specified time range.
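
How a (start_date, end_date) pair selects the data for the zoomed panel can be sketched with pandas label-based slicing. Whether the library uses .loc in this way is an assumption; the example only illustrates the selection semantics.

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=100, freq="h")
df = pd.DataFrame({"value": range(100)}, index=dates)

zoom = ("2023-01-02 00:00", "2023-01-03 00:00")

# An index that is only convertible to datetime can be converted first
df.index = pd.to_datetime(df.index)

# Label-based slicing on a DatetimeIndex includes both endpoints
zoomed = df.loc[zoom[0]:zoom[1], "value"]

print(len(zoomed), zoomed.index.min(), zoomed.index.max())
```

With hourly data, a one-day zoom range therefore covers 25 observations rather than 24, since both the start and end timestamps are included.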

Parameters

Name Type Description Default
data pd.DataFrame DataFrame containing the time series data. Must have a DatetimeIndex or an index convertible to datetime. required
target str Name of the column to plot. required
zoom tuple[str, str] Tuple of (start_date, end_date) strings defining the zoom range. required
title Optional[str] Optional title for the plot. If None, defaults to target name. None
figsize tuple[int, int] Figure dimensions (width, height). Defaults to (8, 4). (8, 4)
show bool Whether to display the plot immediately. Defaults to True. True

Returns

Name Type Description
plt.Figure The matplotlib Figure object.

Examples

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from spotforecast2.preprocessing.time_series_visualization import plot_zoomed_timeseries
>>> # Create sample data
>>> dates = pd.date_range("2023-01-01", periods=100, freq="h")
>>> df = pd.DataFrame({"value": range(100)}, index=dates)
>>> # Plot with zoom
>>> fig = plot_zoomed_timeseries(
...     data=df,
...     target="value",
...     zoom=("2023-01-02 00:00", "2023-01-03 00:00"),
...     show=False
... )
>>> plt.close(fig)

visualize_ts_comparison

preprocessing.time_series_visualization.visualize_ts_comparison(
    dataframes,
    columns=None,
    title_suffix='',
    figsize=(1000, 500),
    template='plotly_white',
    colors=None,
    show_mean=False,
    **kwargs,
)

Visualize time series with optional statistical overlays.

Similar to visualize_ts_plotly but adds options for statistical overlays like mean values across all datasets.

Parameters

Name Type Description Default
dataframes Dict[str, pd.DataFrame] Dictionary mapping dataset names to pandas DataFrames. required
columns Optional[List[str]] List of column names to visualize. If None, all columns are used. Default: None. None
title_suffix str Suffix to append to column names. Default: ''. ''
figsize tuple[int, int] Figure size as (width, height) in pixels. Default: (1000, 500). (1000, 500)
template str Plotly template. Default: 'plotly_white'. 'plotly_white'
colors Optional[Dict[str, str]] Dictionary mapping dataset names to colors. Default: None. None
show_mean bool If True, overlay the mean of all datasets. Default: False. False
**kwargs Any Additional keyword arguments for go.Scatter(). {}

Returns

Name Type Description
None None. Displays Plotly figures.

Raises

Name Type Description
ValueError If dataframes is empty.
ImportError If plotly is not installed.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.preprocessing.time_series_visualization import visualize_ts_comparison
>>>
>>> # Create sample data
>>> np.random.seed(42)
>>> dates1 = pd.date_range('2024-01-01', periods=100, freq='h')
>>> dates2 = pd.date_range('2024-05-11', periods=100, freq='h')
>>>
>>> df1 = pd.DataFrame({
...     'temperature': np.random.normal(20, 5, 100)
... }, index=dates1)
>>>
>>> df2 = pd.DataFrame({
...     'temperature': np.random.normal(22, 5, 100)
... }, index=dates2)
>>>
>>> # Compare with mean overlay
>>> visualize_ts_comparison(
...     {'Dataset1': df1, 'Dataset2': df2},
...     show_mean=True
... )

visualize_ts_plotly

preprocessing.time_series_visualization.visualize_ts_plotly(
    dataframes,
    columns=None,
    title_suffix='',
    figsize=(1000, 500),
    template='plotly_white',
    colors=None,
    **kwargs,
)

Visualize multiple time series datasets interactively with Plotly.

Creates interactive Plotly scatter plots for specified columns across multiple datasets (e.g., train, validation, test splits). Each dataset is displayed as a separate line with a unique color and name in the legend.

Parameters

Name Type Description Default
dataframes Dict[str, pd.DataFrame] Dictionary mapping dataset names to pandas DataFrames with a datetime index. Example: {'Train': df_train, 'Validation': df_val, 'Test': df_test} required
columns Optional[List[str]] List of column names to visualize. If None, all columns are used. Default: None. None
title_suffix str Suffix to append to the column name in the title. Useful for adding units or descriptions. Default: ''. ''
figsize tuple[int, int] Figure size as (width, height) in pixels. Default: (1000, 500). (1000, 500)
template str Plotly template name for styling. Options include 'plotly_white', 'plotly_dark', 'plotly', 'ggplot2', etc. Default: 'plotly_white'. 'plotly_white'
colors Optional[Dict[str, str]] Dictionary mapping dataset names to colors. If None, uses Plotly default colors. Example: {'Train': 'blue', 'Validation': 'orange'}. Default: None. None
**kwargs Any Additional keyword arguments passed to go.Scatter() (e.g., mode='lines+markers', line=dict(dash='dash')). {}

Returns

Name Type Description
None None. Displays Plotly figures.

Raises

Name Type Description
ValueError If the dataframes dict is empty, contains no columns, or if the specified columns don't exist in all dataframes.
ImportError If plotly is not installed.
TypeError If dataframes parameter is not a dictionary.
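
The documented error conditions can be mirrored in a small standalone validator. The function validate_dataframes below is hypothetical and is not part of the library. Taking the default column list from the first DataFrame is an assumption, as are the error messages.

```python
from typing import Dict, List, Optional

import pandas as pd


def validate_dataframes(
    dataframes: Dict[str, pd.DataFrame],
    columns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical validator mirroring the documented Raises table."""
    if not isinstance(dataframes, dict):
        raise TypeError("dataframes must be a dictionary")
    if not dataframes:
        raise ValueError("dataframes is empty")
    if columns is None:
        # Assumption: default to the columns of the first DataFrame
        first = next(iter(dataframes.values()))
        columns = list(first.columns)
    if not columns:
        raise ValueError("no columns to visualize")
    for name, df in dataframes.items():
        missing = [c for c in columns if c not in df.columns]
        if missing:
            raise ValueError(f"columns {missing} not found in '{name}'")
    return list(columns)


train = pd.DataFrame({"temperature": [20.0, 21.0], "humidity": [60.0, 58.0]})
test = pd.DataFrame({"temperature": [25.0, 24.0], "humidity": [50.0, 52.0]})
print(validate_dataframes({"Train": train, "Test": test}))
```

Running a check like this before calling the plotting function surfaces a missing column as a single clear error instead of a failure partway through rendering.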

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2.preprocessing.time_series_visualization import visualize_ts_plotly
>>>
>>> # Create sample time series data
>>> np.random.seed(42)
>>> dates_train = pd.date_range('2024-01-01', periods=100, freq='h')
>>> dates_val = pd.date_range('2024-05-11', periods=50, freq='h')
>>> dates_test = pd.date_range('2024-07-01', periods=30, freq='h')
>>>
>>> data_train = pd.DataFrame({
...     'temperature': np.random.normal(20, 5, 100),
...     'humidity': np.random.normal(60, 10, 100)
... }, index=dates_train)
>>>
>>> data_val = pd.DataFrame({
...     'temperature': np.random.normal(22, 5, 50),
...     'humidity': np.random.normal(55, 10, 50)
... }, index=dates_val)
>>>
>>> data_test = pd.DataFrame({
...     'temperature': np.random.normal(25, 5, 30),
...     'humidity': np.random.normal(50, 10, 30)
... }, index=dates_test)
>>>
>>> # Visualize all datasets
>>> dataframes = {
...     'Train': data_train,
...     'Validation': data_val,
...     'Test': data_test
... }
>>> visualize_ts_plotly(dataframes)

Single dataset example:

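>>> # Visualize a single dataset (kept in its own variable so that
>>> # `dataframes` still holds the Train/Validation/Test splits below)
>>> single_dataset = {'Data': data_train}
>>> visualize_ts_plotly(single_dataset, columns=['temperature'])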

Custom styling:

>>> visualize_ts_plotly(
...     dataframes,
...     columns=['temperature'],
...     template='plotly_dark',
...     colors={'Train': 'blue', 'Validation': 'green', 'Test': 'red'},
...     mode='lines+markers'
... )