plots.outlier_plots

Functions

| Name | Description |
| --- | --- |
| visualize_outliers_hist | Visualize outliers in a DataFrame using stacked histograms. |
| visualize_outliers_plotly_scatter | Visualize outliers in time series using Plotly scatter plots. |

visualize_outliers_hist

plots.outlier_plots.visualize_outliers_hist(
    data,
    data_original,
    columns=None,
    contamination=0.01,
    random_state=1234,
    figsize=(10, 5),
    bins=50,
    **kwargs,
)

Visualize outliers in a DataFrame using stacked histograms.

Creates a histogram for each specified column, displaying both regular data and detected outliers in different colors. Uses IsolationForest for outlier detection.
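
As a rough sketch of the idea (not the library's actual implementation), the snippet below fits scikit-learn's IsolationForest on a single column, splits the values into regular points and flagged outliers, and bins both groups on shared edges so they can be stacked. The helper name `split_for_stacked_hist` is hypothetical, and fitting one model per column is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def split_for_stacked_hist(series, contamination=0.01, random_state=1234, bins=50):
    """Hypothetical helper: per-bin counts of regular values and outliers."""
    values = series.dropna().to_numpy()
    model = IsolationForest(contamination=contamination, random_state=random_state)
    # fit_predict returns -1 for outliers and 1 for inliers
    labels = model.fit_predict(values.reshape(-1, 1))
    is_outlier = labels == -1
    # Shared edges keep the two histograms aligned so they can be stacked
    edges = np.histogram_bin_edges(values, bins=bins)
    regular_counts, _ = np.histogram(values[~is_outlier], bins=edges)
    outlier_counts, _ = np.histogram(values[is_outlier], bins=edges)
    return values[is_outlier], regular_counts, outlier_counts, edges


rng = np.random.default_rng(0)
temperature = pd.Series(np.concatenate([rng.normal(20.0, 2.0, 28), [60.0, 65.0]]))
outliers, regular_counts, outlier_counts, edges = split_for_stacked_hist(
    temperature, contamination=0.07, bins=15
)
print(sorted(outliers))
```

The two count arrays could then be drawn as bars, with the outlier counts placed on top of the regular counts.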

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame | The DataFrame with cleaned data (outliers may be NaN). | required |
| data_original | pd.DataFrame | The original DataFrame before outlier detection. | required |
| columns | Optional[list[str]] | List of column names to visualize. If None, all columns are used. Default: None. | None |
| contamination | float | The estimated proportion of outliers in the dataset. Default: 0.01. | 0.01 |
| random_state | int | Random seed for reproducibility. Default: 1234. | 1234 |
| figsize | tuple[int, int] | Figure size as (width, height). Default: (10, 5). | (10, 5) |
| bins | int | Number of histogram bins. Default: 50. | 50 |
| **kwargs | Any | Additional keyword arguments passed to plt.hist() (e.g., color, alpha, edgecolor, etc.). | {} |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | None | None. Displays matplotlib figures. |

Raises

| Name | Type | Description |
| --- | --- | --- |
| | ValueError | If data or data_original is empty, or if specified columns don’t exist. |
| | ImportError | If matplotlib is not installed. |
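
The ValueError conditions listed above can be pictured with a small validation routine. This is only a sketch under assumptions: `check_inputs` is a hypothetical name, and the real function may order its checks or word its messages differently.

```python
import pandas as pd


def check_inputs(data, data_original, columns=None):
    """Hypothetical validation mirroring the documented ValueError cases."""
    if data.empty or data_original.empty:
        raise ValueError("data and data_original must not be empty")
    if columns is None:
        # No selection given: fall back to every column of data
        return list(data.columns)
    missing = [
        c for c in columns
        if c not in data.columns or c not in data_original.columns
    ]
    if missing:
        raise ValueError(f"Columns not found: {missing}")
    return list(columns)


df = pd.DataFrame({"temperature": [20.1, 19.8, 60.0]})
try:
    check_inputs(df, df, columns=["humidity"])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Wrapping a call in `try`/`except ValueError` in the same way lets a pipeline skip plotting for frames that turn out to be empty.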

Examples

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for doc rendering
import numpy as np
import pandas as pd
from spotforecast2.plots.outlier_plots import visualize_outliers_hist

rng = np.random.default_rng(0)
normal_vals = rng.normal(loc=20.0, scale=2.0, size=28)
outlier_vals = [60.0, 65.0]  # two obvious outliers
data_original = pd.DataFrame(
    {"temperature": np.concatenate([normal_vals, outlier_vals])}
)
data_cleaned = data_original.copy()

# Renders a stacked histogram; outliers shown in red
visualize_outliers_hist(
    data_cleaned,
    data_original,
    columns=["temperature"],
    contamination=0.07,
    figsize=(6, 3),
    bins=15,
    alpha=0.7,
)
print("visualize_outliers_hist completed without error")
visualize_outliers_hist completed without error

visualize_outliers_plotly_scatter

plots.outlier_plots.visualize_outliers_plotly_scatter(
    data,
    data_original,
    columns=None,
    contamination=0.01,
    random_state=1234,
    **kwargs,
)

Visualize outliers in time series using Plotly scatter plots.

Creates an interactive time series plot for each specified column, showing regular data as a line and detected outliers as scatter points. Uses IsolationForest for outlier detection.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame | The DataFrame with cleaned data (outliers may be NaN). | required |
| data_original | pd.DataFrame | The original DataFrame before outlier detection. | required |
| columns | Optional[list[str]] | List of column names to visualize. If None, all columns are used. Default: None. | None |
| contamination | float | The estimated proportion of outliers in the dataset. Default: 0.01. | 0.01 |
| random_state | int | Random seed for reproducibility. Default: 1234. | 1234 |
| **kwargs | Any | Additional keyword arguments passed to go.Figure.update_layout() (e.g., template, height, etc.). | {} |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | None | None. Displays Plotly figures. |

Raises

| Name | Type | Description |
| --- | --- | --- |
| | ValueError | If data or data_original is empty, or if specified columns don’t exist. |
| | ImportError | If plotly is not installed. |
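
The ImportError case suggests that the plotting backend is treated as an optional dependency. One common way to implement such a guard is sketched below. The helper `require` and its message are hypothetical and not taken from the library.

```python
import importlib


def require(module_name, purpose):
    """Hypothetical helper: import an optional dependency or fail with a clear message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for {purpose}; install it first."
        ) from exc


try:
    require("not_a_real_plotting_backend", "interactive outlier plots")
except ImportError as exc:
    message = str(exc)
    print(message)
```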

Examples

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from spotforecast2_safe.preprocessing.outlier import get_outliers
from spotforecast2.plots.outlier_plots import visualize_outliers_plotly_scatter

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=30, freq="h")
normal_vals = rng.normal(loc=20.0, scale=2.0, size=28)
outlier_vals_arr = [60.0, 65.0]  # two obvious outliers
data_original = pd.DataFrame(
    {"temperature": np.concatenate([normal_vals, outlier_vals_arr])},
    index=dates,
)
data_cleaned = data_original.copy()

# Verify that get_outliers detects the planted outliers before plotting
detected = get_outliers(
    data_original, data_original=data_original, contamination=0.07
)
assert len(detected["temperature"]) >= 1, "Expected at least one outlier"

# Renders an interactive Plotly time series with outliers marked in red
visualize_outliers_plotly_scatter(
    data_cleaned,
    data_original,
    columns=["temperature"],
    contamination=0.07,
)
print(f"Detected {len(detected['temperature'])} outlier(s) in 'temperature'")
Detected 3 outlier(s) in 'temperature'
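
The output reports three outliers although only two were planted. A plausible explanation, assuming `get_outliers` relies on scikit-learn's IsolationForest thresholding, is that `contamination` places the score cutoff at the corresponding percentile of the training scores. With 30 rows and `contamination=0.07`, the three lowest-scoring points fall below that cutoff. The scores below are synthetic stand-ins, not real model output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic anomaly scores for 30 rows (lower means more anomalous)
scores = rng.permutation(np.linspace(-0.8, -0.4, 30))


def n_flagged(scores, contamination):
    # Cutoff at the contamination-th percentile of the scores (assumed rule)
    cutoff = np.percentile(scores, 100.0 * contamination)
    return int(np.sum(scores < cutoff))


print(n_flagged(scores, 0.07))  # 3
print(n_flagged(scores, 0.01))  # 1
```

On small datasets, `contamination` therefore acts as a quota rather than a detector of "true" outliers, so the flagged count can exceed the number of obviously anomalous points.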