factor_analyzer.factor_analyzer.FactorAnalyzer

factor_analyzer.factor_analyzer.FactorAnalyzer(
    n_factors=3,
    rotation='promax',
    method='minres',
    use_smc=True,
    is_corr_matrix=False,
    bounds=(0.005, 1),
    impute='median',
    svd_method='randomized',
    rotation_kwargs=None,
)

The main exploratory factor analysis class.

This class:

  1. Fits a factor analysis model using minres, maximum likelihood, or principal factor extraction, and returns the loading matrix.

  2. Optionally performs a rotation, using one of the following methods:

    1. varimax (orthogonal rotation)
    2. promax (oblique rotation)
    3. oblimin (oblique rotation)
    4. oblimax (orthogonal rotation)
    5. quartimin (oblique rotation)
    6. quartimax (orthogonal rotation)
    7. equamax (orthogonal rotation)
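
The difference between the orthogonal and oblique rotations listed above can be made concrete with a small sketch. The code below is a textbook-style varimax iteration written with NumPy only; it is not the library's implementation (for instance, it applies no Kaiser normalization), and the function name `varimax_sketch` is hypothetical. It illustrates that an orthogonal rotation multiplies the loadings by an orthogonal matrix and therefore leaves each variable's sum of squared loadings unchanged. The input loadings are the rounded values printed in the Examples section of this page.

```python
import numpy as np


def varimax_sketch(loadings, max_iter=100, tol=1e-6):
    """Return (rotated_loadings, rotation_matrix) for a basic varimax rotation."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the gradient of the varimax criterion.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # product of orthogonal matrices, hence orthogonal
        current = s.sum()
        # Stop once the criterion no longer improves noticeably.
        if previous != 0.0 and current < previous * (1.0 + tol):
            break
        previous = current
    return loadings @ rotation, rotation


# Unrotated loadings as printed (rounded) in the Examples section.
unrotated = np.array([
    [-0.13, 0.16, 0.74],
    [0.04, 0.05, 0.01],
    [0.35, 0.61, -0.07],
    [0.45, 0.72, -0.08],
    [0.37, 0.44, -0.02],
    [0.74, -0.15, 0.30],
    [0.74, -0.16, -0.21],
    [0.83, -0.21, 0.05],
    [0.76, -0.24, -0.12],
    [0.82, -0.12, 0.18],
])

rotated, rotation = varimax_sketch(unrotated)
print(np.round(rotated, 2))
```

Oblique rotations such as promax relax the orthogonality constraint, which is why they can produce correlated factors.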

Parameters

Name Type Description Default
n_factors int The number of factors to select. Defaults to 3. 3
rotation str The type of rotation to perform after fitting the factor analysis model. If set to None, no rotation will be performed, nor will any associated Kaiser normalization. Possible values include: (a) varimax (orthogonal rotation) (b) promax (oblique rotation) (c) oblimin (oblique rotation) (d) oblimax (orthogonal rotation) (e) quartimin (oblique rotation) (f) quartimax (orthogonal rotation) (g) equamax (orthogonal rotation) Defaults to ‘promax’. 'promax'
method str The fitting method to use, either ‘minres’, ‘ml’, or ‘principal’. Defaults to ‘minres’. 'minres'
use_smc bool Whether to use squared multiple correlation as starting guesses for factor analysis. Defaults to True. True
bounds tuple The lower and upper bounds on the variables for “L-BFGS-B” optimization. Defaults to (0.005, 1). (0.005, 1)
impute str How to handle missing values, if any, in the data: (a) use list-wise deletion (‘drop’), (b) impute the column median (‘median’), or (c) impute the column mean (‘mean’). Defaults to ‘median’. 'median'
is_corr_matrix bool Set to True if the data is the correlation matrix. Defaults to False. False
svd_method str The SVD method to use when method is ‘principal’. If ‘lapack’, use standard SVD from scipy.linalg. If ‘randomized’, use faster randomized_svd function from scikit-learn. Defaults to ‘randomized’. 'randomized'
rotation_kwargs dict Dictionary containing keyword arguments for the rotation method. Defaults to None. None
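
The three `impute` options in the table above can be illustrated with a short NumPy sketch. This is an assumption about what list-wise deletion and column-wise median or mean imputation amount to, not a copy of the library's code; the helper name `handle_missing` is hypothetical.

```python
import numpy as np


def handle_missing(X, how="median"):
    """Illustrate 'drop', 'median', and 'mean' handling of NaN values."""
    X = np.asarray(X, dtype=float)
    if how == "drop":
        # List-wise deletion: discard every row that contains a NaN.
        return X[~np.isnan(X).any(axis=1)]
    if how == "median":
        fill = np.nanmedian(X, axis=0)
    elif how == "mean":
        fill = np.nanmean(X, axis=0)
    else:
        raise ValueError(f"unknown option: {how!r}")
    out = X.copy()
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = fill[cols]  # replace each NaN with its column statistic
    return out


data = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, np.nan],
    [5.0, 8.0],
])

filled_median = handle_missing(data, "median")
filled_mean = handle_missing(data, "mean")
dropped = handle_missing(data, "drop")
```

Note that 'drop' changes the number of rows, while the two imputation options keep the shape of the input.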

Attributes

Name Type Description
loadings_ numpy.ndarray The factor loadings matrix. None, if fit() has not been called.
corr_ numpy.ndarray The original correlation matrix. None, if fit() has not been called.
rotation_matrix_ numpy.ndarray The rotation matrix, if a rotation has been performed. None otherwise.
structure_ numpy.ndarray or None The structure loading matrix. This only exists if rotation is ‘promax’.
phi_ numpy.ndarray or None The factor correlations matrix. This only exists if the rotation is oblique.

Notes

This code was partly derived from the excellent R package psych.

References

[1] https://github.com/cran/psych/blob/master/R/fa.R

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> fa = fa.fit(df_features)
>>> np.round(fa.loadings_, 2)
array([[-0.13,  0.16,  0.74],
       [ 0.04,  0.05,  0.01],
       [ 0.35,  0.61, -0.07],
       [ 0.45,  0.72, -0.08],
       [ 0.37,  0.44, -0.02],
       [ 0.74, -0.15,  0.3 ],
       [ 0.74, -0.16, -0.21],
       [ 0.83, -0.21,  0.05],
       [ 0.76, -0.24, -0.12],
       [ 0.82, -0.12,  0.18]])
>>> np.round(fa.get_communalities(), 2)
array([0.59, 0.  , 0.5 , 0.73, 0.33, 0.66, 0.62, 0.73, 0.65, 0.71])

Methods

Name Description
fit Fit factor analysis model using either MINRES, ML, or principal factor analysis.
get_communalities Calculate the communalities, given the factor loading matrix.
get_eigenvalues Calculate the eigenvalues, given the factor correlation matrix.
get_factor_variance Calculate factor variance information.
get_uniquenesses Calculate the uniquenesses, given the factor loading matrix.
sufficiency Perform the sufficiency test.
transform Get factor scores for a new data set.

fit

factor_analyzer.factor_analyzer.FactorAnalyzer.fit(X, y=None)

Fit factor analysis model using either MINRES, ML, or principal factor analysis.

By default, squared multiple correlations (SMC) are used as starting guesses.
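
For readers unfamiliar with the SMC, the following sketch uses the standard definition: the SMC of a variable is one minus the reciprocal of the corresponding diagonal entry of the inverse correlation matrix. The function name and the small correlation matrix are illustrative assumptions, not part of the library.

```python
import numpy as np


def squared_multiple_correlations(corr):
    """SMC of each variable with all others: 1 - 1 / diag(inv(R))."""
    corr = np.asarray(corr, dtype=float)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))


# A small, positive-definite correlation matrix made up for illustration.
corr = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.5],
    [0.3, 0.5, 1.0],
])

smc = squared_multiple_correlations(corr)
print(np.round(smc, 3))
```

For two variables with correlation r, this definition reduces to r squared for both variables, which is a convenient sanity check.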

Parameters

Name Type Description Default
X array-like The data to analyze. required
y ignored Ignored. None

Returns

Name Type Description
self FactorAnalyzer The fitted factor analyzer object.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.loadings_, 2)
array([[-0.13,  0.16,  0.74],
       [ 0.04,  0.05,  0.01],
       [ 0.35,  0.61, -0.07],
       [ 0.45,  0.72, -0.08],
       [ 0.37,  0.44, -0.02],
       [ 0.74, -0.15,  0.3 ],
       [ 0.74, -0.16, -0.21],
       [ 0.83, -0.21,  0.05],
       [ 0.76, -0.24, -0.12],
       [ 0.82, -0.12,  0.18]])

get_communalities

factor_analyzer.factor_analyzer.FactorAnalyzer.get_communalities()

Calculate the communalities, given the factor loading matrix.

Returns

Name Type Description
communalities numpy.ndarray The communalities from the factor loading matrix.
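
As a worked illustration, a communality is conventionally the sum of squared loadings across factors for one variable. The sketch below applies that definition to the rounded loadings printed in the class example; because the inputs are rounded to two decimals, the results agree with the documented communalities only to within about 0.01.

```python
import numpy as np

# Loadings as printed (rounded) in the class example.
loadings = np.array([
    [-0.13, 0.16, 0.74],
    [0.04, 0.05, 0.01],
    [0.35, 0.61, -0.07],
    [0.45, 0.72, -0.08],
    [0.37, 0.44, -0.02],
    [0.74, -0.15, 0.30],
    [0.74, -0.16, -0.21],
    [0.83, -0.21, 0.05],
    [0.76, -0.24, -0.12],
    [0.82, -0.12, 0.18],
])

# One communality per variable: row-wise sum of squared loadings.
communalities = (loadings**2).sum(axis=1)
print(np.round(communalities, 2))
```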

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.get_communalities(), 2)
array([0.59, 0.  , 0.5 , 0.73, 0.33, 0.66, 0.62, 0.73, 0.65, 0.71])

get_eigenvalues

factor_analyzer.factor_analyzer.FactorAnalyzer.get_eigenvalues()

Calculate the eigenvalues, given the factor correlation matrix.

Returns

Name Type Description
original_eigen_values numpy.ndarray The original eigenvalues.
common_factor_eigen_values numpy.ndarray The common factor eigenvalues.
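
A minimal sketch of the first return value, assuming that the "original eigenvalues" are the eigenvalues of the correlation matrix sorted in descending order. The common factor eigenvalues are not reproduced here, since this page does not state how they are derived. The correlation matrix is made up for illustration.

```python
import numpy as np

# A small correlation matrix made up for illustration.
corr = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.5],
    [0.3, 0.5, 1.0],
])

# eigvalsh is appropriate for symmetric matrices and returns ascending order,
# so reverse the result to list the largest eigenvalue first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
print(np.round(eigenvalues, 3))
```

The eigenvalues of a correlation matrix sum to the number of variables, because the trace of the matrix equals that number.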

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> ev, v = fa.get_eigenvalues()
>>> np.round(ev, 1)
array([...])
>>> np.round(v, 1)
array([...])

get_factor_variance

factor_analyzer.factor_analyzer.FactorAnalyzer.get_factor_variance()

Calculate factor variance information.

The factor variance information includes the variance, proportional variance, and cumulative variance for each factor.

Returns

Name Type Description
variance numpy.ndarray The factor variances.
proportional_variance numpy.ndarray The proportional factor variances.
cumulative_variances numpy.ndarray The cumulative factor variances.
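
The three quantities can be sketched from a loading matrix using the conventional definitions: the variance of a factor is the column-wise sum of squared loadings, the proportional variance divides that by the number of variables, and the cumulative variance is the running total of the proportions. The loadings below are the rounded values printed in the class example, so the results match the documented output only up to rounding.

```python
import numpy as np

# Loadings as printed (rounded) in the class example.
loadings = np.array([
    [-0.13, 0.16, 0.74],
    [0.04, 0.05, 0.01],
    [0.35, 0.61, -0.07],
    [0.45, 0.72, -0.08],
    [0.37, 0.44, -0.02],
    [0.74, -0.15, 0.30],
    [0.74, -0.16, -0.21],
    [0.83, -0.21, 0.05],
    [0.76, -0.24, -0.12],
    [0.82, -0.12, 0.18],
])

n_variables = loadings.shape[0]
variance = (loadings**2).sum(axis=0)             # sum of squared loadings per factor
proportional_variance = variance / n_variables   # share of total standardized variance
cumulative_variance = np.cumsum(proportional_variance)

print(np.round(variance, 2))
print(np.round(proportional_variance, 2))
print(np.round(cumulative_variance, 2))
```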

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> var, prop_var, cum_var = fa.get_factor_variance()
>>> np.round(var, 2)
array([3.51, 1.28, 0.74])
>>> np.round(prop_var, 2)
array([0.35, 0.13, 0.07])
>>> np.round(cum_var, 2)
array([0.35, 0.48, 0.55])

get_uniquenesses

factor_analyzer.factor_analyzer.FactorAnalyzer.get_uniquenesses()

Calculate the uniquenesses, given the factor loading matrix.

Returns

Name Type Description
uniquenesses numpy.ndarray The uniquenesses from the factor loading matrix.
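
Under the usual definition, a variable's uniqueness is one minus its communality. The sketch below checks that relationship against the rounded communalities reported for the same data elsewhere on this page; it is an illustration, not the library's code.

```python
import numpy as np

# Communalities as reported (rounded) for the same data set.
communalities = np.array([0.59, 0.0, 0.5, 0.73, 0.33, 0.66, 0.62, 0.73, 0.65, 0.71])

# Uniqueness: the share of each variable's variance not explained by the factors.
uniquenesses = 1.0 - communalities
print(np.round(uniquenesses, 2))
```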

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.get_uniquenesses(), 2)
array([0.41, 1.  , 0.5 , 0.27, 0.67, 0.34, 0.38, 0.27, 0.35, 0.29])

sufficiency

factor_analyzer.factor_analyzer.FactorAnalyzer.sufficiency(num_observations)

Perform the sufficiency test.

The test calculates statistics under the null hypothesis that the selected number of factors is sufficient.

Parameters

Name Type Description Default
num_observations int The number of observations in the input data used to fit this factor analyzer. required

Returns

Name Type Description
statistic float The test statistic.
degrees int The degrees of freedom.
pvalue float The p-value of the test.
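
The degrees of freedom can be sketched with the classical formula for the maximum likelihood factor model, ((p - k)^2 - (p + k)) / 2, where p is the number of variables and k the number of factors. That the method uses exactly this formula is an assumption; the function name is hypothetical.

```python
def sufficiency_degrees_of_freedom(n_variables, n_factors):
    """Degrees of freedom of the test that n_factors factors are sufficient."""
    return ((n_variables - n_factors) ** 2 - (n_variables + n_factors)) // 2


# With 40 variables and 3 factors: (37**2 - 43) // 2
degrees = sufficiency_degrees_of_freedom(40, 3)
print(degrees)
```

With 40 variables and 3 factors the formula gives 663, the value shown in the example for this method. This would be consistent with a 40-column input, which is an inference rather than something this page states.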

References

[1] Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method. Second edition. Butterworths. P. 36.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test01.csv')
>>> fa = FactorAnalyzer(n_factors=3, rotation=None, method="ml")
>>> _ = fa.fit(df_features)
>>> stat, df, p = fa.sufficiency(df_features.shape[0])
>>> float(np.round(stat, 2))
1475.88
>>> df
663
>>> bool(p < 0.05)
True

transform

factor_analyzer.factor_analyzer.FactorAnalyzer.transform(X)

Get factor scores for a new data set.

Parameters

Name Type Description Default
X array-like The data to score using the fitted factor model. Shape should be (n_samples, n_features). required

Returns

Name Type Description
scores numpy.ndarray The latent variables of X. Shape is (n_samples, n_factors).
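
One common way to obtain factor scores is the regression method: standardize the data, solve the correlation matrix against the loadings to get scoring weights, and multiply. Whether `transform` uses exactly this method is an assumption, as this page does not say. The sketch below uses synthetic data, and principal-component loadings stand in for fitted factor loadings; all names are hypothetical.

```python
import numpy as np


def regression_factor_scores(X, loadings):
    """Regression-method scores: standardized data times solve(R, loadings)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    corr = np.corrcoef(X, rowvar=False)
    weights = np.linalg.solve(corr, loadings)  # shape (n_features, n_factors)
    return Z @ weights


# Synthetic data: two latent factors driving six observed variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
pattern = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.1],
    [0.0, 0.9], [0.1, 0.7], [0.0, 0.6],
])
X = latent @ pattern.T + 0.5 * rng.normal(size=(200, 6))

# Stand-in loadings: top-2 principal components of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

scores = regression_factor_scores(X, loadings)
print(scores.shape)
```

The result has one row per sample and one column per factor, matching the documented return shape.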

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.transform(df_features), 2)
array([[-1.05,  0.58,  0.17],
       [-1.6 ,  0.9 ,  0.04],
       [-1.22, -1.16,  0.57],
       ...,
       [ 0.14,  0.04,  0.29],
       [ 1.87, -0.35, -0.68],
       [ 0.86,  0.18, -0.79]], shape=(1678, 3))