Guide for identifying and handling outliers in time series data.
Overview
Outlier detection is crucial for time series forecasting because extreme values can distort both model training and predictions. This module detects outliers and marks them so that they can be handled before modeling.
Key Functions
API Reference
Full API documentation for mark_outliers and the outlier module is auto-generated in the Preprocessing Reference.
mark_outliers
The mark_outliers function identifies and flags outliers in a time series DataFrame. See Preprocessing Reference for the full signature, parameters, and return values.
Examples
import pandas as pd
from spotforecast2_safe.preprocessing.outlier import mark_outliers

# Create sample time series data
data = pd.DataFrame({
    'value': [1, 2, 100, 4, 5, 6, 7, 8, 9, 10],  # 100 is an outlier
})

# Mark outliers
result_data, outlier_labels = mark_outliers(
    data=data.copy(),   # Use a copy to preserve original data
    contamination=0.1,  # Expect 10% contamination
)

outlier_count = (outlier_labels == -1).sum()
print(f"Outliers marked: {outlier_count} records")

# Compare the values before and after detection, side by side
pd.DataFrame({
    'value (before detection)': data['value'],
    'value (after detection)': result_data.squeeze(),
    'outlier_label': outlier_labels,
    'is_outlier': outlier_labels == -1,
})
Outliers marked: 1 records
   value (before detection)  value (after detection)  outlier_label  is_outlier
0                         1                      1.0              1       False
1                         2                      2.0              1       False
2                       100                      NaN             -1        True
3                         4                      4.0              1       False
4                         5                      5.0              1       False
5                         6                      6.0              1       False
6                         7                      7.0              1       False
7                         8                      8.0              1       False
8                         9                      9.0              1       False
9                        10                     10.0              1       False
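In the output above, the flagged value appears as NaN after detection. One possible follow-up is to fill such gaps before modeling. The sketch below assumes linear interpolation with pandas is acceptable for the data; it is not something this module is documented to do. The series is hand-written to mimic the marked output, so the example does not call mark_outliers at all.

```python
import numpy as np
import pandas as pd

# Hand-written stand-in for the marked output above (assumption: the
# outlier at position 2 has been replaced with NaN).
marked = pd.Series(
    [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    name="value",
)

# Linear interpolation fills each gap from its neighbouring valid values.
# limit_direction="both" also fills gaps at the start or end of the series.
filled = marked.interpolate(method="linear", limit_direction="both")

print(filled.tolist())
```

Linear interpolation is only one option. Forward filling or leaving the NaN in place for a model that tolerates missing values may suit other data better.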
Detection Methods
This module uses isolation forest and other statistical methods to detect: