inspection.importance.generate_mdi
inspection.importance.generate_mdi(X, y, feature_names=None, random_state=42)

Generates a DataFrame of impurity-based feature importances (mean decrease in impurity, MDI, also known as Gini importance) from a RandomForestRegressor.
Notes
Impurity-based feature importances have two limitations:

- They are biased towards high-cardinality features.
- They are computed from training-set statistics and therefore do not reflect how useful a feature is for making predictions that generalize to the test set.

Permutation importances can mitigate the second limitation because they can be computed on the test set.
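To illustrate the second limitation, the following sketch compares MDI values with permutation importances measured on a held-out split. It uses scikit-learn directly rather than `generate_mdi`; the synthetic data, the split ratio, and the `x0`...`x4` labels are assumptions chosen only for illustration.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: stands in for a real dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# MDI: derived from split statistics gathered on the training set
mdi = model.feature_importances_

# Permutation importance: mean drop in score when a feature is shuffled,
# here evaluated on the held-out test set
perm = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)

comparison = pd.DataFrame(
    {
        "Feature": [f"x{i}" for i in range(X.shape[1])],  # hypothetical labels
        "MDI": mdi,
        "Permutation": perm.importances_mean,
    }
).sort_values("MDI", ascending=False)
print(comparison)
```

The two columns are on different scales (MDI values are normalized to sum to 1, permutation values are score differences), so it is the ranking of features that is worth comparing, not the raw numbers.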
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame or np.ndarray | The feature set. | required |
| y | pd.Series or np.ndarray | The target variable. | required |
| feature_names | list | List of feature names for labeling. Defaults to None. | None |
| random_state | int | Random state for the RandomForestRegressor. Defaults to 42. | 42 |
Returns
| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | DataFrame with ‘Feature’ and ‘Importance’ columns. |
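For orientation, a function with this signature and return shape could be approximated as below. This is a minimal sketch and not the library's implementation: the name `mdi_importance_sketch`, the fallback labels used when `feature_names` is None, and the row order are all assumptions, since the page does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def mdi_importance_sketch(X, y, feature_names=None, random_state=42):
    """Hypothetical stand-in for generate_mdi; not the library's code."""
    if feature_names is None:
        # Assumption: use DataFrame column names, else generic x0, x1, ...
        if isinstance(X, pd.DataFrame):
            feature_names = [str(c) for c in X.columns]
        else:
            feature_names = [f"x{i}" for i in range(np.asarray(X).shape[1])]

    model = RandomForestRegressor(random_state=random_state)
    model.fit(X, y)

    # feature_importances_ holds the impurity-based (MDI) importances
    return pd.DataFrame(
        {"Feature": list(feature_names), "Importance": model.feature_importances_}
    )


X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
sketch = mdi_importance_sketch(X, y, feature_names=["a", "b", "c", "d", "e"])
print(sketch)
```

Rows here follow the input column order; whether `generate_mdi` sorts its output is not stated on this page.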
Examples
>>> from spotoptim.inspection.importance import generate_mdi
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_df = pd.DataFrame(X)
>>> y_series = pd.Series(y)
>>> result = generate_mdi(X_df, y_series)
>>> print(result)