factor_analyzer.factor_analyzer
Factor analysis using MINRES or ML, with optional rotation using Varimax or Promax.
Classes
| Name | Description |
|---|---|
| FactorAnalyzer | The main exploratory factor analysis class. |
FactorAnalyzer
factor_analyzer.factor_analyzer.FactorAnalyzer(
n_factors=3,
rotation='promax',
method='minres',
use_smc=True,
is_corr_matrix=False,
bounds=(0.005, 1),
impute='median',
svd_method='randomized',
rotation_kwargs=None,
)
The main exploratory factor analysis class.
This class fits a factor analysis model using MINRES, maximum likelihood, or principal factor extraction and returns the loading matrix. Optionally, it performs a rotation, with methods including:
- varimax (orthogonal rotation)
- promax (oblique rotation)
- oblimin (oblique rotation)
- oblimax (orthogonal rotation)
- quartimin (oblique rotation)
- quartimax (orthogonal rotation)
- equamax (orthogonal rotation)
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| n_factors | int | The number of factors to select. Defaults to 3. | 3 |
| rotation | str | The type of rotation to perform after fitting the factor analysis model. If set to None, no rotation will be performed, nor will any associated Kaiser normalization. Possible values include: (a) varimax (orthogonal rotation) (b) promax (oblique rotation) (c) oblimin (oblique rotation) (d) oblimax (orthogonal rotation) (e) quartimin (oblique rotation) (f) quartimax (orthogonal rotation) (g) equamax (orthogonal rotation) Defaults to ‘promax’. | 'promax' |
| method | str | The fitting method to use, either ‘minres’, ‘ml’, or ‘principal’. Defaults to ‘minres’. | 'minres' |
| use_smc | bool | Whether to use squared multiple correlation as starting guesses for factor analysis. Defaults to True. | True |
| bounds | tuple | The lower and upper bounds on the variables for “L-BFGS-B” optimization. Defaults to (0.005, 1). | (0.005, 1) |
| impute | str | How to handle missing values, if any, in the data: (a) use list-wise deletion ('drop'), (b) impute the column median ('median'), or (c) impute the column mean ('mean'). Defaults to 'median'. | 'median' |
| is_corr_matrix | bool | Set to True if the data is the correlation matrix. Defaults to False. | False |
| svd_method | str | The SVD method to use when method is ‘principal’. If ‘lapack’, use standard SVD from scipy.linalg. If ‘randomized’, use faster randomized_svd function from scikit-learn. Defaults to ‘randomized’. | 'randomized' |
| rotation_kwargs | dict | Dictionary containing keyword arguments for the rotation method. Defaults to None. | None |
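For intuition about what the orthogonal rotation options do, here is a minimal pure-NumPy sketch of the classic varimax criterion. This is illustrative only: the package's own implementation (including its Kaiser normalization) may differ in detail.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Rotate a loading matrix with the classic varimax criterion (sketch)."""
    p, k = loadings.shape
    rotation = np.eye(k)  # accumulated orthogonal rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD of the gradient of the varimax criterion
        col_norms = np.diag((rotated**2).sum(axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated**3 - (gamma / p) * rotated @ col_norms)
        )
        rotation = u @ vt
        d_new = s.sum()
        if d_old != 0 and d_new / d_old < 1 + tol:
            break
        d_old = d_new
    return loadings @ rotation, rotation
```

Because the rotation matrix is orthogonal, the communality of each variable (row-wise sum of squared loadings) is unchanged by the rotation; only the distribution of loading across factors changes.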
Attributes
| Name | Type | Description |
|---|---|---|
| loadings_ | numpy.ndarray | The factor loadings matrix. None, if fit() has not been called. |
| corr_ | numpy.ndarray | The original correlation matrix. None, if fit() has not been called. |
| rotation_matrix_ | numpy.ndarray | The rotation matrix, if a rotation has been performed. None otherwise. |
| structure_ | numpy.ndarray or None | The structure loading matrix. This only exists if rotation is ‘promax’. |
| phi_ | numpy.ndarray or None | The factor correlations matrix. This only exists if an oblique rotation (e.g. 'promax', 'oblimin', or 'quartimin') was performed. |
Notes
This code was partly derived from the excellent R package psych.
References
[1] https://github.com/cran/psych/blob/master/R/fa.R
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> fa = fa.fit(df_features)
>>> np.round(fa.loadings_, 2)
array([[-0.13, 0.16, 0.74],
[ 0.04, 0.05, 0.01],
[ 0.35, 0.61, -0.07],
[ 0.45, 0.72, -0.08],
[ 0.37, 0.44, -0.02],
[ 0.74, -0.15, 0.3 ],
[ 0.74, -0.16, -0.21],
[ 0.83, -0.21, 0.05],
[ 0.76, -0.24, -0.12],
[ 0.82, -0.12, 0.18]])
>>> np.round(fa.get_communalities(), 2)
array([0.59, 0. , 0.5 , 0.73, 0.33, 0.66, 0.62, 0.73, 0.65, 0.71])
Methods
| Name | Description |
|---|---|
| fit | Fit a factor analysis model using MINRES, ML, or principal factor analysis. |
| get_communalities | Calculate the communalities, given the factor loading matrix. |
| get_eigenvalues | Calculate the eigenvalues, given the factor correlation matrix. |
| get_factor_variance | Calculate factor variance information. |
| get_uniquenesses | Calculate the uniquenesses, given the factor loading matrix. |
| sufficiency | Perform the sufficiency test. |
| transform | Get factor scores for a new data set. |
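As a sanity check on how `get_communalities` and `get_uniquenesses` relate, both quantities can be recomputed directly from the loading matrix. The sketch below uses the (rounded) unrotated loadings from the example above; because those loadings are rounded to two decimals, the recomputed values can differ from the method output in the last decimal.

```python
import numpy as np

# Unrotated loadings from the example above, rounded to two decimals
loadings = np.array([
    [-0.13,  0.16,  0.74],
    [ 0.04,  0.05,  0.01],
    [ 0.35,  0.61, -0.07],
    [ 0.45,  0.72, -0.08],
    [ 0.37,  0.44, -0.02],
    [ 0.74, -0.15,  0.30],
    [ 0.74, -0.16, -0.21],
    [ 0.83, -0.21,  0.05],
    [ 0.76, -0.24, -0.12],
    [ 0.82, -0.12,  0.18],
])

# Communality: row-wise sum of squared loadings (variance shared with the factors)
communalities = (loadings**2).sum(axis=1)
# Uniqueness: the complement, 1 - communality
uniquenesses = 1.0 - communalities
```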
fit
factor_analyzer.factor_analyzer.FactorAnalyzer.fit(X, y=None)
Fit a factor analysis model using MINRES, ML, or principal factor analysis.
By default, squared multiple correlations (SMC) are used as starting guesses.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like | The data to analyze. | required |
| y | ignored | Ignored. | None |
Returns
| Name | Type | Description |
|---|---|---|
| self | FactorAnalyzer | The fitted factor analyzer object. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.loadings_, 2)
array([[-0.13, 0.16, 0.74],
[ 0.04, 0.05, 0.01],
[ 0.35, 0.61, -0.07],
[ 0.45, 0.72, -0.08],
[ 0.37, 0.44, -0.02],
[ 0.74, -0.15, 0.3 ],
[ 0.74, -0.16, -0.21],
[ 0.83, -0.21, 0.05],
[ 0.76, -0.24, -0.12],
[ 0.82, -0.12, 0.18]])
get_communalities
factor_analyzer.factor_analyzer.FactorAnalyzer.get_communalities()
Calculate the communalities, given the factor loading matrix.
Returns
| Name | Type | Description |
|---|---|---|
| communalities | numpy.ndarray | The communalities from the factor loading matrix. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.get_communalities(), 2)
array([0.59, 0. , 0.5 , 0.73, 0.33, 0.66, 0.62, 0.73, 0.65, 0.71])
get_eigenvalues
factor_analyzer.factor_analyzer.FactorAnalyzer.get_eigenvalues()
Calculate the eigenvalues, given the factor correlation matrix.
Returns
| Name | Type | Description |
|---|---|---|
| original_eigen_values | numpy.ndarray | The original eigenvalues. |
| common_factor_eigen_values | numpy.ndarray | The common factor eigenvalues. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> ev, v = fa.get_eigenvalues()
>>> np.round(ev, 1)
array([...])
>>> np.round(v, 1)
array([...])
get_factor_variance
factor_analyzer.factor_analyzer.FactorAnalyzer.get_factor_variance()
Calculate factor variance information.
This includes the variance, the proportional variance, and the cumulative variance for each factor.
Returns
| Name | Type | Description |
|---|---|---|
| variance | numpy.ndarray | The factor variances. |
| proportional_variance | numpy.ndarray | The proportional factor variances. |
| cumulative_variances | numpy.ndarray | The cumulative factor variances. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> var, prop_var, cum_var = fa.get_factor_variance()
>>> np.round(var, 2)
array([3.51, 1.28, 0.74])
>>> np.round(prop_var, 2)
array([0.35, 0.13, 0.07])
>>> np.round(cum_var, 2)
array([0.35, 0.48, 0.55])
get_uniquenesses
factor_analyzer.factor_analyzer.FactorAnalyzer.get_uniquenesses()
Calculate the uniquenesses, given the factor loading matrix.
Returns
| Name | Type | Description |
|---|---|---|
| uniquenesses | numpy.ndarray | The uniquenesses from the factor loading matrix. |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.get_uniquenesses(), 2)
array([0.41, 1. , 0.5 , 0.27, 0.67, 0.34, 0.38, 0.27, 0.35, 0.29])
sufficiency
factor_analyzer.factor_analyzer.FactorAnalyzer.sufficiency(num_observations)
Perform the sufficiency test.
The test calculates statistics under the null hypothesis that the selected number of factors is sufficient.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| num_observations | int | The number of observations in the input data that this factor analyzer was fit using. | required |
Returns
| Name | Type | Description |
|---|---|---|
| statistic | float | The test statistic. |
| degrees | int | The degrees of freedom. |
| pvalue | float | The p-value of the test. |
References
[1] Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method. Second edition. Butterworths. P. 36.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test01.csv')
>>> fa = FactorAnalyzer(n_factors=3, rotation=None, method="ml")
>>> _ = fa.fit(df_features)
>>> stat, df, p = fa.sufficiency(df_features.shape[0])
>>> float(np.round(stat, 2))
1475.88
>>> df
663
>>> bool(p < 0.05)
True
transform
factor_analyzer.factor_analyzer.FactorAnalyzer.transform(X)
Get factor scores for a new data set.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like | The data to score using the fitted factor model. Shape should be (n_samples, n_features). | required |
Returns
| Name | Type | Description |
|---|---|---|
| scores | numpy.ndarray | The latent variables of X. Shape is (n_samples, n_components). |
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from spotoptim.factor_analyzer import FactorAnalyzer
>>> df_features = pd.read_csv('src/spotoptim/datasets/test02.csv')
>>> fa = FactorAnalyzer(rotation=None)
>>> _ = fa.fit(df_features)
>>> np.round(fa.transform(df_features), 2)
array([[-1.05, 0.58, 0.17],
[-1.6 , 0.9 , 0.04],
[-1.22, -1.16, 0.57],
...,
[ 0.14, 0.04, 0.29],
[ 1.87, -0.35, -0.68],
[ 0.86, 0.18, -0.79]], shape=(1678, 3))
Functions
| Name | Description |
|---|---|
| calculate_bartlett_sphericity | Compute the Bartlett sphericity test. |
| calculate_kmo | Calculate the Kaiser-Meyer-Olkin criterion for items and overall. |
calculate_bartlett_sphericity
factor_analyzer.factor_analyzer.calculate_bartlett_sphericity(x)
Compute the Bartlett sphericity test.
H0: The matrix of population correlations is equal to I. H1: The matrix of population correlations is not equal to I.
The formula for Bartlett's sphericity test is:
$$ \chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\big(\det(R)\big) $$
where det(R) is the determinant of the correlation matrix R, n is the number of observations, and p is the number of variables.
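The formula above translates directly into NumPy. The sketch below is illustrative; the packaged `calculate_bartlett_sphericity` may differ in detail (e.g. in how it handles non-numeric input).

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(x):
    """Sketch of Bartlett's sphericity test for an (n_samples, n_features) array."""
    n, p = x.shape
    R = np.corrcoef(x, rowvar=False)  # sample correlation matrix
    # chi-square statistic from the formula above
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    degrees = p * (p - 1) // 2        # degrees of freedom of the test
    p_value = chi2.sf(statistic, degrees)
    return statistic, p_value
```

A small p-value rejects the null hypothesis that the population correlation matrix is the identity, i.e. it suggests the data are factorable.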
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | The array for which to calculate sphericity. | required |
Returns
| Name | Type | Description |
|---|---|---|
| statistic | float | The chi-square value. |
| p_value | float | The associated p-value for the test. |
calculate_kmo
factor_analyzer.factor_analyzer.calculate_kmo(x)
Calculate the Kaiser-Meyer-Olkin criterion for items and overall.
This statistic represents the degree to which each observed variable is predicted, without error, by the other variables in the dataset. In general, a KMO < 0.6 is considered inadequate.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | The array from which to calculate KMOs. | required |
Returns
| Name | Type | Description |
|---|---|---|
| kmo_per_variable | numpy.ndarray | The KMO score per item. |
| kmo_total | float | The overall KMO score. |
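For reference, the KMO statistic compares the squared correlations between variables to the squared partial (anti-image) correlations obtained from the inverse correlation matrix. A minimal pure-NumPy sketch of this computation follows; the packaged `calculate_kmo` may differ in detail.

```python
import numpy as np

def kmo_sketch(x):
    """Sketch of the KMO computation for an (n_samples, n_features) array."""
    R = np.corrcoef(x, rowvar=False)
    inv_R = np.linalg.inv(R)
    # Anti-image (partial) correlations from the inverse correlation matrix
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)
    np.fill_diagonal(partial, 0.0)
    # Off-diagonal squared correlations vs. squared partial correlations
    r2 = R**2
    np.fill_diagonal(r2, 0.0)
    p2 = partial**2
    kmo_per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    kmo_total = r2.sum() / (r2.sum() + p2.sum())
    return kmo_per_variable, kmo_total
```

Both the per-variable and overall scores lie in [0, 1]; as noted above, an overall KMO below about 0.6 is generally considered inadequate for factor analysis.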