utils.pca.get_pca_topk

utils.pca.get_pca_topk(pca, feature_names, k=10)

Identify the top k features that have the strongest influence on PC1 and PC2.

This function analyzes the loading scores (coefficients) of the first two principal components to determine which original features contribute most strongly to these components. The absolute values of the loading scores are used to rank feature importance.
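The ranking described above can be sketched in a few lines of plain Python. This is an illustration only, not the spotpython implementation: the helper name `topk_by_abs_loading` is hypothetical, and it assumes the loadings arrive as a two-dimensional sequence whose first two rows are PC1 and PC2 (the layout of `pca.components_`).

```python
def topk_by_abs_loading(components, feature_names, k=10):
    """Hypothetical helper: rank features by absolute loading on PC1 and PC2."""

    def top(row):
        # Sort feature indices by the absolute value of their loading, largest first.
        ranked = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
        return [feature_names[i] for i in ranked[:k]]

    # Row 0 holds the PC1 loadings, row 1 the PC2 loadings.
    return top(components[0]), top(components[1])


# Made-up loadings for four features, for illustration only.
components = [
    [0.1, -0.8, 0.5, 0.3],   # PC1
    [0.7, 0.2, -0.6, 0.1],   # PC2
]
top_pc1, top_pc2 = topk_by_abs_loading(components, ["a", "b", "c", "d"], k=2)
print(top_pc1)  # ['b', 'c']
print(top_pc2)  # ['a', 'c']
```

The sign of a loading is ignored on purpose: a feature with loading -0.8 influences the component as strongly as one with +0.8, only in the opposite direction.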

Parameters

Name	Type	Description	Default
pca	`sklearn.decomposition.PCA`	Fitted PCA object containing the components_ attribute with the principal components.	required
feature_names	list-like	Names of the original features, in the same order as the features used to fit the PCA.	required
k	int	Number of top features to select for each principal component. Defaults to 10.	`10`

Returns

Name	Type	Description
tuple	tuple	A tuple of two lists: (1) list[str]: names of the k features most influential on PC1; (2) list[str]: names of the k features most influential on PC2.

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.decomposition import PCA
>>> from sklearn.preprocessing import StandardScaler
>>> from spotpython.utils.pca import get_pca_topk
>>>
>>> # Load the iris dataset and standardize the features
>>> iris = load_iris()
>>> X = StandardScaler().fit_transform(iris.data)
>>> feature_names = iris.feature_names
>>>
>>> # Fit PCA on the standardized data
>>> pca = PCA().fit(X)
>>>
>>> # Get top 2 most influential features for PC1 and PC2
>>> top_pc1, top_pc2 = get_pca_topk(pca,
...                                 feature_names=feature_names,
...                                 k=2)
>>> print("Top PC1 features:", top_pc1)
>>> print("Top PC2 features:", top_pc2)

Note

- The function assumes that the PCA has been fitted on standardized data.
- The length of feature_names must match the number of features in the PCA input.
- k should not exceed the total number of features.
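A caller who wants to enforce the last two conditions before calling the function could use a small check like the one below. This is a sketch under stated assumptions: `check_topk_inputs` is a hypothetical helper, not part of spotpython, and `n_features` is assumed to be taken from the fitted PCA (for example `pca.components_.shape[1]`).

```python
def check_topk_inputs(n_features, feature_names, k):
    """Hypothetical pre-check for the conditions listed in the note."""
    # feature_names must have one entry per feature seen during fitting.
    if len(feature_names) != n_features:
        raise ValueError(
            f"expected {n_features} feature names, got {len(feature_names)}"
        )
    # k cannot exceed the number of available features.
    if k > n_features:
        raise ValueError(f"k={k} exceeds the number of features ({n_features})")


# Passes silently: four names for four features, and k is within range.
check_topk_inputs(4, ["a", "b", "c", "d"], k=2)
```

Whether the data was standardized cannot be verified from the fitted PCA object alone, so that condition remains the caller's responsibility.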