tuple[pd.DataFrame, np.ndarray]: A tuple containing the modified dataset with outliers marked as NaN and the outlier labels.
Examples
import numpy as np
import pandas as pd
from spotforecast2_safe.preprocessing.outlier import mark_outliers

rng = np.random.default_rng(0)
# 50 normal values plus two clear outliers (1000, -1000)
values = np.concatenate([rng.normal(loc=10.0, scale=1.0, size=50), [1000.0, -1000.0]])
data = pd.DataFrame({"load": values})
cleaned_data, outlier_labels = mark_outliers(
    data, contamination=0.05, random_state=42, verbose=True
)
n_nan = cleaned_data["load"].isna().sum()
print(f"Outliers marked as NaN: {n_nan}")
assert n_nan >= 2, "Expected at least the two injected extreme outliers to be marked"
Column 'load': Marked 5.7692% of data points as outliers.
Outliers marked as NaN: 3
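
The percentage in the verbose message can be checked against the NaN count by hand. The sketch below only redoes that arithmetic and does not call the library. The names `n_total`, `n_marked`, and `share` are illustrative, not part of the package.

```python
# Illustrative arithmetic only: reproduces the percentage in the verbose output.
n_total = 50 + 2   # 50 normal samples plus the two injected outliers
n_marked = 3       # NaN count reported by the example

# Share of marked points, formatted with four decimal places like the log line.
share = f"{n_marked / n_total:.4%}"
print(f"Marked {share} of data points as outliers.")
```

Three of the 52 rows were marked, which is where the 5.7692% figure comes from.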