factor_analyzer.factor_analyzer
Factor analysis using MINRES or ML, with optional rotation using Varimax or Promax.
Classes
| Name | Description |
|---|---|
| FactorAnalyzer | The main exploratory factor analysis class. |
FactorAnalyzer
factor_analyzer.factor_analyzer.FactorAnalyzer(
n_factors=3,
rotation='promax',
method='minres',
use_smc=True,
is_corr_matrix=False,
bounds=(0.005, 1),
impute='median',
svd_method='randomized',
rotation_kwargs=None,
)
The main exploratory factor analysis class.
This class fits a factor analysis model using MINRES, maximum likelihood, or principal factor extraction and returns the loading matrix. Optionally, it performs a rotation, with methods including:
- varimax (orthogonal rotation)
- promax (oblique rotation)
- oblimin (oblique rotation)
- oblimax (orthogonal rotation)
- quartimin (oblique rotation)
- quartimax (orthogonal rotation)
- equamax (orthogonal rotation)
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| n_factors | int | The number of factors to select. Defaults to 3. | 3 |
| rotation | str | The type of rotation to perform after fitting the factor analysis model. If set to None, no rotation will be performed, nor will any associated Kaiser normalization. Possible values include: (a) varimax (orthogonal rotation) (b) promax (oblique rotation) (c) oblimin (oblique rotation) (d) oblimax (orthogonal rotation) (e) quartimin (oblique rotation) (f) quartimax (orthogonal rotation) (g) equamax (orthogonal rotation) Defaults to ‘promax’. | 'promax' |
| method | str | The fitting method to use, either ‘minres’, ‘ml’, or ‘principal’. Defaults to ‘minres’. | 'minres' |
| use_smc | bool | Whether to use squared multiple correlation as starting guesses for factor analysis. Defaults to True. | True |
| bounds | tuple | The lower and upper bounds on the variables for “L-BFGS-B” optimization. Defaults to (0.005, 1). | (0.005, 1) |
| impute | str | How to handle missing values, if any, in the data: (a) use list-wise deletion ('drop'), (b) impute the column median ('median'), or (c) impute the column mean ('mean'). Defaults to 'median'. | 'median' |
| is_corr_matrix | bool | Set to True if the data is the correlation matrix. Defaults to False. | False |
| svd_method | str | The SVD method to use when method is ‘principal’. If ‘lapack’, use standard SVD from scipy.linalg. If ‘randomized’, use faster randomized_svd function from scikit-learn. Defaults to ‘randomized’. | 'randomized' |
| rotation_kwargs | dict | Dictionary containing keyword arguments for the rotation method. Defaults to None. | None |
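For intuition about what the orthogonal rotation options do, here is a minimal pure-NumPy sketch of the classic varimax criterion. This is illustrative only: the package's own implementation (including its Kaiser normalization) may differ in detail.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Rotate a loading matrix with the classic varimax criterion (sketch)."""
    p, k = loadings.shape
    rotation = np.eye(k)  # accumulated orthogonal rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD of the gradient of the varimax criterion
        col_norms = np.diag((rotated**2).sum(axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated**3 - (gamma / p) * rotated @ col_norms)
        )
        rotation = u @ vt
        d_new = s.sum()
        if d_old != 0 and d_new / d_old < 1 + tol:
            break
        d_old = d_new
    return loadings @ rotation, rotation
```

Because the rotation matrix is orthogonal, the communality of each variable (row-wise sum of squared loadings) is unchanged by the rotation; only the distribution of loading across factors changes.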
Attributes
| Name | Type | Description |
|---|---|---|
| loadings_ | numpy.ndarray | The factor loadings matrix. None, if fit() has not been called. |
| corr_ | numpy.ndarray | The original correlation matrix. None, if fit() has not been called. |
| rotation_matrix_ | numpy.ndarray | The rotation matrix, if a rotation has been performed. None otherwise. |
| structure_ | numpy.ndarray or None | The structure loading matrix. This only exists if rotation is ‘promax’. |
| phi_ | numpy.ndarray or None | The factor correlations matrix. This only exists if an oblique rotation (e.g. 'promax', 'oblimin', or 'quartimin') was performed. |
Notes
This code was partly derived from the excellent R package psych.
References
[1] https://github.com/cran/psych/blob/master/R/fa.R
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> fa = fa.fit(df_features)
>>> np.round(fa.loadings_, 2)
array([[-0.13, 0.16, 0.74],
[ 0.04, 0.05, 0.01],
[ 0.35, 0.61, -0.07],
[ 0.45, 0.72, -0.08],
[ 0.37, 0.44, -0.02],
[ 0.74, -0.15, 0.3 ],
[ 0.74, -0.16, -0.21],
[ 0.83, -0.21, 0.05],
[ 0.76, -0.24, -0.12],
[ 0.82, -0.12, 0.18]])
>>> np.round(fa.get_communalities(), 2)
array([0.59, 0. , 0.5 , 0.73, 0.33, 0.66, 0.62, 0.73, 0.65, 0.71])
Methods
| Name | Description |
|---|---|
| fit | Fit a factor analysis model using MINRES, ML, or principal factor analysis. |
| get_communalities | Calculate the communalities, given the factor loading matrix. |
| get_eigenvalues | Calculate the eigenvalues, given the factor correlation matrix. |
| get_factor_variance | Calculate factor variance information. |
| get_uniquenesses | Calculate the uniquenesses, given the factor loading matrix. |
| sufficiency | Perform the sufficiency test. |
| transform | Get factor scores for a new data set. |
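As a sanity check on how `get_communalities` and `get_uniquenesses` relate, both quantities can be recomputed directly from the loading matrix. The sketch below uses the (rounded) unrotated loadings from the example above; because those loadings are rounded to two decimals, the recomputed values can differ from the method output in the last decimal.

```python
import numpy as np

# Unrotated loadings from the example above, rounded to two decimals
loadings = np.array([
    [-0.13,  0.16,  0.74],
    [ 0.04,  0.05,  0.01],
    [ 0.35,  0.61, -0.07],
    [ 0.45,  0.72, -0.08],
    [ 0.37,  0.44, -0.02],
    [ 0.74, -0.15,  0.30],
    [ 0.74, -0.16, -0.21],
    [ 0.83, -0.21,  0.05],
    [ 0.76, -0.24, -0.12],
    [ 0.82, -0.12,  0.18],
])

# Communality: row-wise sum of squared loadings (variance shared with the factors)
communalities = (loadings**2).sum(axis=1)
# Uniqueness: the complement, 1 - communality
uniquenesses = 1.0 - communalities
```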
fit
factor_analyzer.factor_analyzer.FactorAnalyzer.fit(X, y=None)
Fit a factor analysis model using MINRES, ML, or principal factor analysis.
By default, squared multiple correlations (SMC) are used as starting guesses.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like | The data to analyze. | required |
| y | ignored | Ignored. | None |
Returns
| Name | Type | Description |
|---|---|---|
| self | FactorAnalyzer | The fitted factor analyzer object. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.loadings_, 2)
array([[-0.13, 0.16, 0.74],
[ 0.04, 0.05, 0.01],
[ 0.35, 0.61, -0.07],
[ 0.45, 0.72, -0.08],
[ 0.37, 0.44, -0.02],
[ 0.74, -0.15, 0.3 ],
[ 0.74, -0.16, -0.21],
[ 0.83, -0.21, 0.05],
[ 0.76, -0.24, -0.12],
[ 0.82, -0.12, 0.18]])
get_communalities
factor_analyzer.factor_analyzer.FactorAnalyzer.get_communalities()
Calculate the communalities, given the factor loading matrix.
Returns
| Name | Type | Description |
|---|---|---|
| communalities | numpy.ndarray | The communalities from the factor loading matrix. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.get_communalities(), 2)
array([0.59, 0. , 0.5 , 0.73, 0.33, 0.66, 0.62, 0.73, 0.65, 0.71])
get_eigenvalues
factor_analyzer.factor_analyzer.FactorAnalyzer.get_eigenvalues()
Calculate the eigenvalues, given the factor correlation matrix.
Returns
| Name | Type | Description |
|---|---|---|
| original_eigen_values | numpy.ndarray | The original eigenvalues. |
| common_factor_eigen_values | numpy.ndarray | The common factor eigenvalues. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> ev, v = fa.get_eigenvalues()
>>> np.round(ev, 1)
array([...])
>>> np.round(v, 1)
array([...])
get_factor_variance
factor_analyzer.factor_analyzer.FactorAnalyzer.get_factor_variance()
Calculate factor variance information.
This includes the variance, the proportional variance, and the cumulative variance for each factor.
Returns
| Name | Type | Description |
|---|---|---|
| variance | numpy.ndarray | The factor variances. |
| proportional_variance | numpy.ndarray | The proportional factor variances. |
| cumulative_variances | numpy.ndarray | The cumulative factor variances. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> var, prop_var, cum_var = fa.get_factor_variance()
>>> np.round(var, 2)
array([3.51, 1.28, 0.74])
>>> np.round(prop_var, 2)
array([0.35, 0.13, 0.07])
>>> np.round(cum_var, 2)
array([0.35, 0.48, 0.55])
get_uniquenesses
factor_analyzer.factor_analyzer.FactorAnalyzer.get_uniquenesses()
Calculate the uniquenesses, given the factor loading matrix.
Returns
| Name | Type | Description |
|---|---|---|
| uniquenesses | numpy.ndarray | The uniquenesses from the factor loading matrix. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.get_uniquenesses(), 2)
array([0.41, 1. , 0.5 , 0.27, 0.67, 0.34, 0.38, 0.27, 0.35, 0.29])
sufficiency
factor_analyzer.factor_analyzer.FactorAnalyzer.sufficiency(num_observations)
Perform the sufficiency test.
The test calculates statistics under the null hypothesis that the selected number of factors is sufficient.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| num_observations | int | The number of observations in the input data that this factor analyzer was fit using. | required |
Returns
| Name | Type | Description |
|---|---|---|
| statistic | float | The test statistic. |
| degrees | int | The degrees of freedom. |
| pvalue | float | The p-value of the test. |
References
[1] Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method. Second edition. Butterworths. P. 36.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test01.csv')
>>> fa = FactorAnalyzer(n_factors=3, rotation=None, method="ml")
>>> _ = fa.fit(df_features)
>>> stat, df, p = fa.sufficiency(df_features.shape[0])
>>> float(np.round(stat, 2))
1475.88
>>> df
663
>>> bool(p < 0.05)
True
transform
factor_analyzer.factor_analyzer.FactorAnalyzer.transform(X)
Get factor scores for a new data set.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like | The data to score using the fitted factor model. Shape should be (n_samples, n_features). | required |
Returns
| Name | Type | Description |
|---|---|---|
| scores | numpy.ndarray | The latent variables of X. Shape is (n_samples, n_components). |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.transform(df_features), 2)
array([[-1.05, 0.58, 0.17],
[-1.6 , 0.9 , 0.04],
[-1.22, -1.16, 0.57],
...,
[ 0.14, 0.04, 0.29],
[ 1.87, -0.35, -0.68],
[ 0.86, 0.18, -0.79]], shape=(1678, 3))
Functions
| Name | Description |
|---|---|
| calculate_bartlett_sphericity | Compute the Bartlett sphericity test. |
| calculate_kmo | Calculate the Kaiser-Meyer-Olkin criterion for items and overall. |
calculate_bartlett_sphericity
factor_analyzer.factor_analyzer.calculate_bartlett_sphericity(x)
Compute the Bartlett sphericity test.
H0: The matrix of population correlations is equal to I. H1: The matrix of population correlations is not equal to I.
The formula for Bartlett's sphericity test is:
$$ \chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\big(\det(R)\big) $$
where det(R) is the determinant of the correlation matrix R, n is the number of observations, and p is the number of variables.
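The formula above translates directly into NumPy. The sketch below is illustrative; the packaged `calculate_bartlett_sphericity` may differ in detail (e.g. in how it handles non-numeric input).

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(x):
    """Sketch of Bartlett's sphericity test for an (n_samples, n_features) array."""
    n, p = x.shape
    R = np.corrcoef(x, rowvar=False)  # sample correlation matrix
    # chi-square statistic from the formula above
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    degrees = p * (p - 1) // 2        # degrees of freedom of the test
    p_value = chi2.sf(statistic, degrees)
    return statistic, p_value
```

A small p-value rejects the null hypothesis that the population correlation matrix is the identity, i.e. it suggests the data are factorable.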
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | The array for which to calculate sphericity. | required |
Returns
| Name | Type | Description |
|---|---|---|
| statistic | float | The chi-square value. |
| p_value | float | The associated p-value for the test. |
calculate_kmo
factor_analyzer.factor_analyzer.calculate_kmo(x)
Calculate the Kaiser-Meyer-Olkin criterion for items and overall.
This statistic represents the degree to which each observed variable is predicted, without error, by the other variables in the dataset. In general, a KMO < 0.6 is considered inadequate.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | The array from which to calculate KMOs. | required |
Returns
| Name | Type | Description |
|---|---|---|
| kmo_per_variable | numpy.ndarray | The KMO score per item. |
| kmo_total | float | The overall KMO score. |
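For reference, the KMO statistic compares the squared correlations between variables to the squared partial (anti-image) correlations obtained from the inverse correlation matrix. A minimal pure-NumPy sketch of this computation follows; the packaged `calculate_kmo` may differ in detail.

```python
import numpy as np

def kmo_sketch(x):
    """Sketch of the KMO computation for an (n_samples, n_features) array."""
    R = np.corrcoef(x, rowvar=False)
    inv_R = np.linalg.inv(R)
    # Anti-image (partial) correlations from the inverse correlation matrix
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)
    np.fill_diagonal(partial, 0.0)
    # Off-diagonal squared correlations vs. squared partial correlations
    r2 = R**2
    np.fill_diagonal(r2, 0.0)
    p2 = partial**2
    kmo_per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    kmo_total = r2.sum() / (r2.sum() + p2.sum())
    return kmo_per_variable, kmo_total
```

Both the per-variable and overall scores lie in [0, 1]; as noted above, an overall KMO below about 0.6 is generally considered inadequate for factor analysis.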