inspection.importance.generate_mdi

inspection.importance.generate_mdi(X, y, feature_names=None, random_state=42)

Generates a DataFrame of impurity-based feature importances (mean decrease in impurity, MDI, also known as Gini importance) from a RandomForestRegressor fitted on X and y.

Notes

Impurity-based feature importances have two limitations:

- They are biased towards high-cardinality features.
- They are computed from training-set statistics and therefore do not reflect how useful a feature is for predictions that generalize to the test set.

Permutation importances can mitigate the second limitation, because they can be computed on the test set.
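The second limitation can be illustrated by computing both kinds of importance for the same forest. The sketch below is not part of this module: it uses scikit-learn directly, and the train/test split, the `x0`..`x4` labels, and the `n_repeats` value are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic regression data, split so that a held-out set is available.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# MDI: derived from the training data while the trees are built.
mdi = forest.feature_importances_

# Permutation importance: drop in score when a feature is shuffled,
# measured here on the held-out test set.
perm = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=42
)

comparison = pd.DataFrame(
    {
        "Feature": [f"x{i}" for i in range(X.shape[1])],  # illustrative labels
        "MDI": mdi,
        "Permutation": perm.importances_mean,
    }
)
print(comparison)
```

Features that rank high under MDI but near zero under test-set permutation importance are candidates for having been overrated by the training-set statistics.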

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | pd.DataFrame or np.ndarray | The feature set. | required |
| y | pd.Series or np.ndarray | The target variable. | required |
| feature_names | list | List of feature names for labeling. Defaults to None. | None |
| random_state | int | Random state for the RandomForestRegressor. Defaults to 42. | 42 |

Returns

| Name | Type | Description |
|------|------|-------------|
| | pd.DataFrame | DataFrame with ‘Feature’ and ‘Importance’ columns. |
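To make the shape of the return value concrete, here is a minimal sketch of how a DataFrame with these two columns could be assembled from a fitted forest. It is not the library's implementation: `mdi_sketch` is a hypothetical name, and the fallback feature labels and the absence of sorting are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def mdi_sketch(X, y, feature_names=None, random_state=42):
    """Hypothetical stand-in that mirrors the documented signature."""
    if feature_names is None:
        if hasattr(X, "columns"):
            # Assumption: reuse DataFrame column labels when available.
            feature_names = list(X.columns)
        else:
            # Assumption: generic labels for plain arrays.
            feature_names = [f"x{i}" for i in range(np.asarray(X).shape[1])]

    forest = RandomForestRegressor(random_state=random_state)
    forest.fit(X, y)

    # feature_importances_ holds the normalized impurity-based importances.
    return pd.DataFrame(
        {"Feature": feature_names, "Importance": forest.feature_importances_}
    )


X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=0.1, random_state=42
)
sketch_result = mdi_sketch(X_demo, y_demo)
print(sketch_result)
```

Each row pairs one feature label with its importance, and the importances of a fitted forest sum to 1.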

Examples

>>> from spotoptim.inspection.importance import generate_mdi
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_df = pd.DataFrame(X)
>>> y_series = pd.Series(y)
>>> result = generate_mdi(X_df, y_series)
>>> print(result)