utils.pca.get_pca

utils.pca.get_pca(df, n_components=3)

Select the numeric columns of df, scale them, and perform PCA on the scaled data. Non-numeric columns are ignored.
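The steps described above can be sketched roughly as follows. This is a minimal illustration, not the library's source: the name get_pca_sketch is hypothetical, and it assumes that "scale" means standardization with scikit-learn's StandardScaler and that the PCA object is sklearn.decomposition.PCA.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def get_pca_sketch(df: pd.DataFrame, n_components: int = 3):
    """Hypothetical stand-in for get_pca; not the library's implementation."""
    # Keep only numeric columns; string/object columns are dropped.
    numeric_df = df.select_dtypes(include=[np.number])
    # Assumption: scaling means zero mean and unit variance per column.
    scaled_data = StandardScaler().fit_transform(numeric_df)
    # Fit the PCA on the scaled data and project the samples.
    pca = PCA(n_components=n_components)
    pca_data = pca.fit_transform(scaled_data)
    return pca, scaled_data, numeric_df.columns, numeric_df.index, pca_data


df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [4.0, 3.5, 6.0, 5.0],
        "C": [0.5, 0.1, 0.9, 0.3],
        "label": ["w", "x", "y", "z"],  # non-numeric, dropped by the sketch
    }
)
pca, scaled_data, feature_names, sample_names, pca_data = get_pca_sketch(df)
print(list(feature_names))
print(scaled_data.shape, pca_data.shape)
```

Note that scikit-learn's PCA requires n_components to be no larger than min(n_samples, n_features), which is why the sketch uses four samples and three numeric columns with the default of 3.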

Parameters

Name          Type          Description                                 Default
df            pd.DataFrame  Input DataFrame.                            required
n_components  int           Number of principal components to compute.  3

Returns

Name Type Description
tuple tuple A 5-tuple containing, in order:
- pca (PCA): Fitted PCA object.
- scaled_data (np.ndarray): Scaled numeric data.
- feature_names (pd.Index): Names of the numeric features.
- sample_names (pd.Index): Index of the samples.
- pca_data (np.ndarray): PCA-transformed data.
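One way the returned values might be combined is to label the scores and loadings with the sample and feature names. The sketch below builds stand-in values with scikit-learn instead of calling get_pca, so it runs on its own; the names scores, loadings, and explained are illustrative, and it assumes the returned pca object is a fitted sklearn.decomposition.PCA.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for the tuple returned by get_pca(df, n_components=2).
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [4.0, 3.5, 6.0, 5.0],
        "C": [0.5, 0.1, 0.9, 0.3],
    },
    index=["s1", "s2", "s3", "s4"],
)
feature_names, sample_names = df.columns, df.index
scaled_data = StandardScaler().fit_transform(df)
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

pc_labels = [f"PC{i + 1}" for i in range(pca.n_components_)]

# Scores: one row per sample, one column per principal component.
scores = pd.DataFrame(pca_data, index=sample_names, columns=pc_labels)

# Loadings: one row per feature, one column per principal component.
loadings = pd.DataFrame(pca.components_.T, index=feature_names, columns=pc_labels)

# Fraction of total variance captured by each component.
explained = pd.Series(pca.explained_variance_ratio_, index=pc_labels)

print(scores.shape, loadings.shape)
print(explained.round(3))
```

Keeping feature_names and sample_names alongside the arrays is what makes this relabeling possible, since the NumPy arrays themselves carry no row or column labels.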

Examples

>>> import pandas as pd
>>> from spotpython.utils.pca import get_pca
>>> df = pd.DataFrame({
...     "A": [1, 2, 3],
...     "B": [4, 5, 6],
...     "C": ["x", "y", "z"]  # Non-numeric column will be ignored
... })
>>> # Two numeric columns allow at most two components, so request 2 explicitly
>>> pca, scaled_data, feature_names, sample_names, pca_data = get_pca(df, n_components=2)
>>> print(feature_names)
Index(['A', 'B'], dtype='object')
>>> print(pca_data.shape)
(3, 2)