preprocessing.outlier.manual_outlier_removal(
    data,
    column,
    lower_threshold=None,
    upper_threshold=None,
    verbose=False,
)
Mark values in a column that fall outside manually chosen thresholds as outliers by replacing them with NaN.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| data | pd.DataFrame | The input dataset. | required |
| column | str | The column name in which to perform manual outlier removal. | required |
| lower_threshold | float \| None | The lower threshold below which values are considered outliers. If None, no lower threshold is applied. | None |
| upper_threshold | float \| None | The upper threshold above which values are considered outliers. If None, no upper threshold is applied. | None |
| verbose | bool | Whether to print additional information. | False |
Returns
| Type | Description |
|---|---|
| tuple[pd.DataFrame, int] | A tuple containing the modified dataset with outliers marked as NaN and the number of outliers marked. |
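
The behavior described by the parameters and return value above can be approximated with plain pandas. The following is a minimal sketch, not the library's implementation: the helper name `mark_outliers_sketch` is hypothetical, and copying the input frame before modifying it is an assumption, since the documentation does not say whether `data` is changed in place.

```python
import numpy as np
import pandas as pd


def mark_outliers_sketch(data, column, lower_threshold=None, upper_threshold=None):
    """Hypothetical stand-in that mimics the documented behavior."""
    result = data.copy()  # assumption: leave the caller's frame untouched
    mask = pd.Series(False, index=result.index)
    if lower_threshold is not None:
        # Values strictly below the lower threshold count as outliers.
        mask |= result[column] < lower_threshold
    if upper_threshold is not None:
        # Values strictly above the upper threshold count as outliers.
        mask |= result[column] > upper_threshold
    result.loc[mask, column] = np.nan  # mark outliers as NaN, keep the rows
    return result, int(mask.sum())


frame = pd.DataFrame({"ABC": [10.0, 120.0, 350.0, 800.0]})
cleaned, n_marked = mark_outliers_sketch(
    frame, "ABC", lower_threshold=50, upper_threshold=700
)
print(cleaned["ABC"].tolist(), n_marked)  # [nan, 120.0, 350.0, nan] 2
```

Because the rows are kept and only the values are replaced, the index of the result still matches the input, which matters for time-indexed data. Passing only one threshold in this sketch leaves the other side unbounded, mirroring the `None` defaults.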
Examples
import numpy as np
import pandas as pd

from spotforecast2_safe.preprocessing.outlier import manual_outlier_removal

rng = np.random.default_rng(0)

# 20 normal values with two injected boundary violations
values = np.concatenate([rng.uniform(low=100.0, high=600.0, size=20), [10.0, 800.0]])
data = pd.DataFrame({"ABC": values})

cleaned_data, n_outliers = manual_outlier_removal(
    data,
    column="ABC",
    lower_threshold=50,
    upper_threshold=700,
    verbose=True,
)
print(f"Outliers removed: {n_outliers}")

assert n_outliers >= 2, "Expected the two injected boundary violations to be removed"
assert cleaned_data["ABC"].isna().sum() == n_outliers
Manually marked 2 values > 700 or < 50 as outliers in ABC.
Outliers removed: 2