inspection.importance
Functions
| Name | Description |
|---|---|
| generate_imp | Generates permutation importances from a RandomForestRegressor. |
| generate_mdi | Generates a DataFrame with Gini importances from a RandomForestRegressor. |
| plot_feature_importances | Generate and plot feature importances using MDI and permutation importance. |
| plot_feature_scatter_matrix | Generate scatter plot matrix for the most important features. |
| plot_importances | Plots the impurity-based and permutation-based feature importances for a given classifier. |
generate_imp
inspection.importance.generate_imp(
X_train,
X_test,
y_train,
y_test,
random_state=42,
n_repeats=10,
use_test=True,
)

Generates permutation importances from a RandomForestRegressor.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X_train | pd.DataFrame or np.ndarray | The training feature set. | required |
| X_test | pd.DataFrame or np.ndarray | The test feature set. | required |
| y_train | pd.Series or np.ndarray | The training target variable. | required |
| y_test | pd.Series or np.ndarray | The test target variable. | required |
| random_state | int | Random state for the RandomForestRegressor. Defaults to 42. | 42 |
| n_repeats | int | Number of repeats for permutation importance. Defaults to 10. | 10 |
| use_test | bool | If True, computes permutation importance on the test set. If False, uses the training set. Defaults to True. | True |
Returns
| Name | Type | Description |
|---|---|---|
| permutation_importance | permutation_importance | Permutation importances object. |
Examples
>>> from spotoptim.sensitivity.importance import generate_imp
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_train, X_test = X[:80], X[80:]
>>> y_train, y_test = y[:80], y[80:]
>>> X_train_df = pd.DataFrame(X_train)
>>> X_test_df = pd.DataFrame(X_test)
>>> y_train_series = pd.Series(y_train)
>>> y_test_series = pd.Series(y_test)
>>> perm_imp = generate_imp(X_train_df, X_test_df, y_train_series, y_test_series)
>>> print(perm_imp)
generate_mdi
inspection.importance.generate_mdi(X, y, feature_names=None, random_state=42)

Generates a DataFrame with Gini importances from a RandomForestRegressor.
Notes
There are two limitations of impurity-based feature importances:

- impurity-based importances are biased towards high-cardinality features;
- impurity-based importances are computed on training set statistics and therefore do not reflect the ability of a feature to make predictions that generalize to the test set.

Permutation importances can mitigate the second limitation, because they can be computed on the test set.
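The contrast between the two limitations above can be illustrated directly with scikit-learn (a sketch, independent of spotoptim): MDI is read from the fitted model's training-set statistics, while permutation importance can be evaluated on a held-out set, where an uninformative feature scores near zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=42)
# Append a continuous (high-cardinality) random feature: it carries no signal,
# yet MDI tends to assign it a non-zero importance on the training set.
rng = np.random.default_rng(42)
X = np.hstack([X, rng.normal(size=(X.shape[0], 1))])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

mdi = model.feature_importances_  # impurity-based, from training-set statistics
perm = permutation_importance(    # permutation-based, evaluated on the test set
    model, X_test, y_test, n_repeats=10, random_state=42
)
print("MDI:               ", mdi.round(3))
print("Permutation (test):", perm.importances_mean.round(3))
```

The last entry of `perm.importances_mean` (the random feature) stays close to zero, whereas its MDI value does not have to.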
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame or np.ndarray | The feature set. | required |
| y | pd.Series or np.ndarray | The target variable. | required |
| feature_names | list | List of feature names for labeling. Defaults to None. | None |
| random_state | int | Random state for the RandomForestRegressor. Defaults to 42. | 42 |
Returns
| Name | Type | Description |
|---|---|---|
| pd.DataFrame | pd.DataFrame | DataFrame with 'Feature' and 'Importance' columns. |
Examples
>>> from spotoptim.sensitivity.importance import generate_mdi
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_df = pd.DataFrame(X)
>>> y_series = pd.Series(y)
>>> result = generate_mdi(X_df, y_series)
>>> print(result)
plot_feature_importances
inspection.importance.plot_feature_importances(
X,
y,
feature_names,
target_names,
target_index,
n_top_features=10,
figsize=(6, 6),
)

Generate and plot feature importances using MDI and permutation importance.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | np.ndarray | Input features array | required |
| y | np.ndarray | Target array | required |
| feature_names | list | List of feature names | required |
| target_names | list | List of target names | required |
| target_index | int | Index of target variable to analyze | required |
| n_top_features | int | Number of top features to show | 10 |
| figsize | tuple | Size of the figure | (6, 6) |
Returns
| Name | Type | Description |
|---|---|---|
| tuple | tuple | (top_features, importance_df) |
Examples
>>> from spotoptim.sensitivity import plot_feature_importances
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> feature_names = [f"feature_{i}" for i in range(X.shape[1])]
>>> target_names = ["target"]
>>> top_features, imp_df = plot_feature_importances(X, y, feature_names, target_names, target_index=0)
>>> print("Top features:", top_features)
plot_feature_scatter_matrix
inspection.importance.plot_feature_scatter_matrix(
X,
y,
feature_names,
target_names,
top_features,
target_index,
figsize=(6, 6),
)

Generate scatter plot matrix for the most important features.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | np.ndarray | Input features array | required |
| y | np.ndarray | Target array | required |
| feature_names | list | List of feature names | required |
| target_names | list | List of target names | required |
| top_features | list | List of top feature names to include | required |
| target_index | int | Index of target variable to analyze | required |
| figsize | tuple | Size of the figure | (6, 6) |
Returns
| Name | Type | Description |
|---|---|---|
| None | None | Displays the plot; returns None. |
Examples
>>> from spotoptim.sensitivity import plot_feature_scatter_matrix
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> feature_names = [f"feature_{i}" for i in range(X.shape[1])]
>>> target_names = ["target"]
>>> top_features = ["feature_0", "feature_1", "feature_2"]
>>> plot_feature_scatter_matrix(X, y, feature_names, target_names, top_features, target_index=0)
plot_importances
inspection.importance.plot_importances(
df_mdi,
perm_imp,
X_test,
target_name=None,
feature_names=None,
k=10,
figsize=(12, 8),
show=True,
)

Plots the impurity-based and permutation-based feature importances for a given model.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| df_mdi | pd.DataFrame | DataFrame with Gini importances. | required |
| perm_imp | object | Permutation importances object. | required |
| X_test | pd.DataFrame | The test feature set for permutation importance. | required |
| target_name | str | Name of the target variable for labeling. Defaults to None. | None |
| feature_names | list | List of feature names for labeling. Defaults to None. | None |
| k | int | Number of top features to display based on importance. Default is 10. | 10 |
| figsize | tuple | Size of the figure (width, height) in inches. Default is (12, 8). | (12, 8) |
| show | bool | If True, displays the plot immediately. Default is True. | True |
Returns
| Name | Type | Description |
|---|---|---|
| None | None | Displays the plot; returns None. |
Examples
>>> from spotoptim.sensitivity.importance import generate_mdi, generate_imp, plot_importances
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_train, X_test = X[:80], X[80:]
>>> y_train, y_test = y[:80], y[80:]
>>> X_train_df = pd.DataFrame(X_train)
>>> X_test_df = pd.DataFrame(X_test)
>>> y_train_series = pd.Series(y_train)
>>> y_test_series = pd.Series(y_test)
>>> df_mdi = generate_mdi(X_train_df, y_train_series)
>>> perm_imp = generate_imp(X_train_df, X_test_df, y_train_series, y_test_series)
>>> plot_importances(df_mdi, perm_imp, X_test_df, figsize=(15, 10))