inspection.importance.generate_mdi

inspection.importance.generate_mdi(X, y, feature_names=None, random_state=42)

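Fits a RandomForestRegressor to `X` and `y` and returns its impurity-based feature importances (mean decrease in impurity, MDI, often called Gini importances) as a DataFrame.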
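The following is a minimal sketch of what an MDI computation of this kind can look like with scikit-learn. It assumes that the function fits a default `RandomForestRegressor` and reads its `feature_importances_` attribute. The name `mdi_sketch`, the `x0`, `x1`, … fallback labels, and the descending sort are hypothetical choices, not the documented behavior of `generate_mdi`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def mdi_sketch(X, y, feature_names=None, random_state=42):
    """Hypothetical stand-in: fit a forest and return its MDI importances."""
    # Assumption: use DataFrame column labels, or positional labels, when no names are given.
    if feature_names is None:
        if isinstance(X, pd.DataFrame):
            feature_names = [str(c) for c in X.columns]
        else:
            feature_names = [f"x{i}" for i in range(np.asarray(X).shape[1])]
    model = RandomForestRegressor(random_state=random_state)
    model.fit(X, y)
    # feature_importances_ holds the mean decrease in impurity, normalized to sum to 1.
    return (
        pd.DataFrame({"Feature": feature_names, "Importance": model.feature_importances_})
        .sort_values("Importance", ascending=False)  # assumption: most important first
        .reset_index(drop=True)
    )


X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
result = mdi_sketch(X, y)
print(result)
```

Because scikit-learn normalizes `feature_importances_`, the `Importance` column sums to 1, so the values are best read as relative shares rather than absolute effects.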

Notes

Impurity-based feature importances have two limitations:

- they are biased towards high-cardinality features;
- they are computed from training-set statistics and therefore do not reflect how useful a feature is for making predictions that generalize to the test set.

Permutation importances can mitigate the second limitation because they can be computed on the test set.
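The contrast described in these notes can be observed directly. The sketch below is independent of `generate_mdi`: it fits a forest on a synthetic training split and compares MDI with scikit-learn's `permutation_importance` evaluated on the held-out split. The data, the column names `signal_1`, `signal_2` and `noise`, and the seeds are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Two informative features plus a pure-noise feature whose values are all unique (high cardinality).
X = pd.DataFrame({
    "signal_1": rng.normal(size=n),
    "signal_2": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
y = 3.0 * X["signal_1"] + X["signal_2"] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# MDI: derived from impurity reductions on the training data only.
mdi = pd.Series(model.feature_importances_, index=X.columns)

# Permutation importance: drop in the test-set score (R^2 by default) when one column is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm_mean = pd.Series(perm.importances_mean, index=X.columns)

print(pd.DataFrame({"MDI (train)": mdi, "Permutation (test)": perm_mean}))
```

Fully grown trees still split on the `noise` column, so MDI assigns it a nonzero share, whereas shuffling it on the test set should barely change the score.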

Parameters

Name	Type	Description	Default
X	pd.DataFrame or np.ndarray	The feature set.	required
y	pd.Series or np.ndarray	The target variable.	required
feature_names	list	List of feature names for labeling. Defaults to None.	`None`
random_state	int	Random state for the RandomForestRegressor. Defaults to 42.	`42`

Returns

Name	Type	Description
	pd.DataFrame	DataFrame with 'Feature' and 'Importance' columns.

Examples

>>> from spotoptim.sensitivity.importance import generate_mdi
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
>>> X_df = pd.DataFrame(X)
>>> y_series = pd.Series(y)
>>> result = generate_mdi(X_df, y_series)
>>> print(result)