inspection.importance

Functions

Name Description
generate_imp Generates permutation importances from a RandomForestRegressor.
generate_mdi Generates a DataFrame with Gini importances from a RandomForestRegressor.
plot_feature_importances Generate and plot feature importances using MDI and permutation importance.
plot_feature_scatter_matrix Generate scatter plot matrix for the most important features.
plot_importances Plots the impurity-based and permutation-based feature importances for a fitted model.

generate_imp

inspection.importance.generate_imp(
    X_train,
    X_test,
    y_train,
    y_test,
    random_state=42,
    n_repeats=10,
    use_test=True,
)

Generates permutation importances from a RandomForestRegressor.

Parameters

Name Type Description Default
X_train pd.DataFrame or np.ndarray The training feature set. required
X_test pd.DataFrame or np.ndarray The test feature set. required
y_train pd.Series or np.ndarray The training target variable. required
y_test pd.Series or np.ndarray The test target variable. required
random_state int Random state for the RandomForestRegressor. Defaults to 42. 42
n_repeats int Number of repeats for permutation importance. Defaults to 10. 10
use_test bool If True, computes permutation importance on the test set. If False, uses the training set. Defaults to True. True

Returns

Name Type Description
permutation_importance permutation_importance Permutation importances object.

Examples

>>> from spotoptim.inspection.importance import generate_imp
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_train, X_test = X[:80], X[80:]
>>> y_train, y_test = y[:80], y[80:]
>>> X_train_df = pd.DataFrame(X_train)
>>> X_test_df = pd.DataFrame(X_test)
>>> y_train_series = pd.Series(y_train)
>>> y_test_series = pd.Series(y_test)
>>> perm_imp = generate_imp(X_train_df, X_test_df, y_train_series, y_test_series)
>>> print(perm_imp)
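
The return value is described only as a "permutation importances object". Assuming it has the same layout as the result of scikit-learn's `sklearn.inspection.permutation_importance` (an assumption this page does not confirm), its fields could be inspected as in the sketch below. The sketch calls scikit-learn directly rather than `generate_imp`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Assumption: a default RandomForestRegressor stands in for the model
# that generate_imp fits internally.
rf = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Permute each feature n_repeats times on the test set and record the score drop.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

# result.importances has shape (n_features, n_repeats);
# importances_mean and importances_std summarize it per feature.
order = np.argsort(result.importances_mean)[::-1]
for idx in order:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"feature {idx}: {mean:.3f} +/- {std:.3f}")
```

Passing the training arrays instead of `X_test` and `y_test` would correspond to what the `use_test=False` option is described as doing.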

generate_mdi

inspection.importance.generate_mdi(X, y, feature_names=None, random_state=42)

Generates a DataFrame with Gini importances from a RandomForestRegressor.

Notes

Impurity-based feature importances have two limitations:

- They are biased towards high-cardinality features.
- They are computed on training-set statistics and therefore do not reflect how useful a feature is for making predictions that generalize to the test set.

Permutation importances can mitigate the second limitation, because they can be computed on the test set.
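
To make the limitations noted above concrete, the following sketch appends a pure-noise column to a synthetic dataset and prints both kinds of importance for it. It uses scikit-learn directly; the dataset, the column names, and the split are illustrative assumptions, not part of this module. Because impurity-based importances are non-negative and sum to one, the noise column typically receives some share, while its test-set permutation importance can be close to zero or negative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_regression(
    n_samples=300, n_features=3, n_informative=3, noise=10.0, random_state=0
)
# Append a column of random values that carries no information about y.
X = np.column_stack([X, rng.normal(size=X.shape[0])])
names = ["x0", "x1", "x2", "noise"]  # illustrative names

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# MDI is derived from impurity decreases observed on the training data.
mdi = rf.feature_importances_
# Permutation importance is measured here on held-out data.
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)

for name, m, p in zip(names, mdi, perm.importances_mean):
    print(f"{name:>5}  MDI={m:.3f}  permutation(test)={p:.3f}")
```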

Parameters

Name Type Description Default
X pd.DataFrame or np.ndarray The feature set. required
y pd.Series or np.ndarray The target variable. required
feature_names list List of feature names for labeling. Defaults to None. None
random_state int Random state for the RandomForestRegressor. Defaults to 42. 42

Returns

Name Type Description
pd.DataFrame pd.DataFrame DataFrame with ‘Feature’ and ‘Importance’ columns.

Examples

>>> from spotoptim.inspection.importance import generate_mdi
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_df = pd.DataFrame(X)
>>> y_series = pd.Series(y)
>>> result = generate_mdi(X_df, y_series)
>>> print(result)
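
For readers who want to see how such a table can be assembled, here is a minimal sketch that builds a DataFrame with 'Feature' and 'Importance' columns from a fitted forest's `feature_importances_` attribute. The helper `mdi_frame` is hypothetical and written only for illustration; it is not the library's implementation, and the sorting step is an assumption.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def mdi_frame(X, y, feature_names=None, random_state=42):
    """Hypothetical helper for illustration; not the library's implementation."""
    rf = RandomForestRegressor(random_state=random_state).fit(X, y)
    if feature_names is None:
        # Fall back to generic labels when no names are supplied.
        feature_names = [f"x{i}" for i in range(rf.n_features_in_)]
    df = pd.DataFrame(
        {"Feature": feature_names, "Importance": rf.feature_importances_}
    )
    # Assumption: largest importance first.
    return df.sort_values("Importance", ascending=False).reset_index(drop=True)


X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
names = [f"feature_{i}" for i in range(X.shape[1])]
df_mdi = mdi_frame(X, y, feature_names=names)
print(df_mdi.head(3))  # the three rows with the largest importance
```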

plot_feature_importances

inspection.importance.plot_feature_importances(
    X,
    y,
    feature_names,
    target_names,
    target_index,
    n_top_features=10,
    figsize=(6, 6),
)

Generate and plot feature importances using MDI and permutation importance.

Parameters

Name Type Description Default
X np.ndarray Input features array required
y np.ndarray Target array required
feature_names list List of feature names required
target_names list List of target names required
target_index int Index of target variable to analyze required
n_top_features int Number of top features to show 10
figsize tuple Size of the figure (6, 6)

Returns

Name Type Description
tuple tuple (top_features, importance_df)

Examples

>>> from spotoptim.inspection.importance import plot_feature_importances
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> feature_names = [f"feature_{i}" for i in range(X.shape[1])]
>>> target_names = ["target"]
>>> top_features, imp_df = plot_feature_importances(X, y, feature_names, target_names, target_index=0)
>>> print("Top features:", top_features)

plot_feature_scatter_matrix

inspection.importance.plot_feature_scatter_matrix(
    X,
    y,
    feature_names,
    target_names,
    top_features,
    target_index,
    figsize=(6, 6),
)

Generate scatter plot matrix for the most important features.

Parameters

Name Type Description Default
X np.ndarray Input features array required
y np.ndarray Target array required
feature_names list List of feature names required
target_names list List of target names required
top_features list List of top feature names to include required
target_index int Index of target variable to analyze required
figsize tuple Size of the figure (6, 6)

Returns

Name Type Description
None None

Examples

>>> from spotoptim.inspection.importance import plot_feature_scatter_matrix
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> feature_names = [f"feature_{i}" for i in range(X.shape[1])]
>>> target_names = ["target"]
>>> top_features = ["feature_0", "feature_1", "feature_2"]
>>> plot_feature_scatter_matrix(X, y, feature_names, target_names, top_features, target_index=0)
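
A comparable figure can be sketched with `pandas.plotting.scatter_matrix`. This is an illustration under assumptions, not a description of how `plot_feature_scatter_matrix` is implemented: the selected feature columns and the target are placed in one DataFrame and plotted pairwise.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use

import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
top_features = ["feature_0", "feature_1", "feature_2"]

# Keep only the selected columns and append the target as an extra column.
df = pd.DataFrame(X, columns=feature_names)[top_features].assign(target=y)

# One row and one column of panels per DataFrame column, histograms on the diagonal.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist", alpha=0.7)
```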

plot_importances

inspection.importance.plot_importances(
    df_mdi,
    perm_imp,
    X_test,
    target_name=None,
    feature_names=None,
    k=10,
    figsize=(12, 8),
    show=True,
)

Plots the impurity-based and permutation-based feature importances for a fitted model.

Parameters

Name Type Description Default
df_mdi pd.DataFrame DataFrame with Gini importances. required
perm_imp object Permutation importances object. required
X_test pd.DataFrame The test feature set for permutation importance. required
target_name str Name of the target variable for labeling. Defaults to None. None
feature_names list List of feature names for labeling. Defaults to None. None
k int Number of top features to display based on importance. Default is 10. 10
figsize tuple Size of the figure (width, height) in inches. Default is (12, 8). (12, 8)
show bool If True, displays the plot immediately. Default is True. True

Returns

Name Type Description
None None

Examples

>>> from spotoptim.inspection.importance import generate_mdi, generate_imp, plot_importances
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_train, X_test = X[:80], X[80:]
>>> y_train, y_test = y[:80], y[80:]
>>> X_train_df = pd.DataFrame(X_train)
>>> X_test_df = pd.DataFrame(X_test)
>>> y_train_series = pd.Series(y_train)
>>> y_test_series = pd.Series(y_test)
>>> df_mdi = generate_mdi(X_train_df, y_train_series)
>>> perm_imp = generate_imp(X_train_df, X_test_df, y_train_series, y_test_series)
>>> plot_importances(df_mdi, perm_imp, X_test_df, figsize=(15, 10))
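
The sketch below shows one way a side-by-side comparison of the two importance types might be drawn with matplotlib. The layout, titles, and the use of horizontal bars with error bars are assumptions made for illustration; the figure produced by `plot_importances` may look different.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
names = np.array([f"feature_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = X[:80], X[80:], y[:80], y[80:]

rf = RandomForestRegressor(random_state=42).fit(X_train, y_train)
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

k = 3  # number of top features to show in each panel
# argsort is ascending, so the last k indices are the k largest values.
top_mdi = np.argsort(rf.feature_importances_)[-k:]
top_perm = np.argsort(perm.importances_mean)[-k:]

fig, (ax_mdi, ax_perm) = plt.subplots(1, 2, figsize=(12, 8))
ax_mdi.barh(names[top_mdi], rf.feature_importances_[top_mdi])
ax_mdi.set_title("MDI (training set)")
ax_perm.barh(
    names[top_perm],
    perm.importances_mean[top_perm],
    xerr=perm.importances_std[top_perm],  # spread across the repeats
)
ax_perm.set_title("Permutation importance (test set)")
fig.tight_layout()
```

Calling `plt.show()` afterwards would display the figure in an interactive session.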