factor_analyzer.confirmatory_factor_analyzer

Confirmatory factor analysis using machine learning methods.

Classes

Name Description
ConfirmatoryFactorAnalyzer Fit a confirmatory factor analysis model using maximum likelihood.
ModelSpecification Encapsulate the model specification for CFA.
ModelSpecificationParser Generate the model specification for CFA.

ConfirmatoryFactorAnalyzer

factor_analyzer.confirmatory_factor_analyzer.ConfirmatoryFactorAnalyzer(
    specification=None,
    n_obs=None,
    is_cov_matrix=False,
    bounds=None,
    max_iter=200,
    tol=None,
    impute='median',
    disp=True,
)

Fit a confirmatory factor analysis model using maximum likelihood.

Parameters

Name Type Description Default
specification ModelSpecification A model specification. This must be a ModelSpecification object or None. If None, a ModelSpecification object will be generated assuming that n_factors == n_variables and that all variables load on all factors. Note that the resulting factor model may not be identified, in which case the optimization could fail. Defaults to None. None
n_obs int The number of observations in the original data set. If this is not passed and is_cov_matrix is True, then an error will be raised. Defaults to None. None
is_cov_matrix bool Whether the input X is a covariance matrix. If False, assume it is the full data set. Defaults to False. False
bounds list of tuples A list of (minimum, maximum) boundaries, one pair for each element of the input array x0, which is built from your parsed and combined model specification. The length of bounds must equal the length of x0, which is: (n_factors * n_variables) + n_variables + n_factors + (((n_factors * n_factors) - n_factors) // 2). If None, nothing will be bounded. Defaults to None. None
max_iter int The maximum number of iterations for the optimization routine. Defaults to 200. 200
tol float The tolerance for convergence. Defaults to None. None
disp bool Whether to print the scipy optimization fmin message to standard output. Defaults to True. True
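
The length formula given for bounds can be checked before fitting. The following is a minimal sketch and not part of the library: the helper names expected_x0_length and check_bounds are hypothetical, and the helper raises ValueError for illustration, whereas fit() is documented to raise AssertionError when the lengths differ.

```python
def expected_x0_length(n_factors, n_variables):
    """Length of x0 according to the formula in the bounds description."""
    n_lower_diag = (n_factors * n_factors - n_factors) // 2
    return (n_factors * n_variables) + n_variables + n_factors + n_lower_diag


def check_bounds(bounds, n_factors, n_variables):
    """Hypothetical pre-check: one (min, max) pair per element of x0."""
    expected = expected_x0_length(n_factors, n_variables)
    if len(bounds) != expected:
        raise ValueError(f"expected {expected} bounds, got {len(bounds)}")
    for low, high in bounds:
        # None is treated here as "no limit on this side" (an assumption).
        if low is not None and high is not None and low > high:
            raise ValueError(f"lower bound {low} exceeds upper bound {high}")
    return expected


# Two factors and eight variables, as in the examples on this page.
print(expected_x0_length(2, 8))  # 16 + 8 + 2 + 1 = 27
```

With two factors and eight variables, a bounds list would therefore need 27 pairs.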

Raises

Name Type Description
ValueError If is_cov_matrix is True, and n_obs is not provided.

Attributes

Name Type Description
model ModelSpecification The model specification object.
loadings_ numpy.ndarray The factor loadings matrix. None, if fit() has not been called.
error_vars_ numpy.ndarray The error variance matrix.
factor_varcovs_ numpy.ndarray The factor covariance matrix.
log_likelihood_ float The log likelihood from the optimization routine.
aic_ float The Akaike information criterion.
bic_ float The Bayesian information criterion.
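
For orientation, the textbook definitions of the two information criteria can be sketched as follows. This is an illustration only: the attribute table does not say how the parameter count is obtained or whether constants are dropped, so these functions are not guaranteed to reproduce aic_ and bic_. The function names and the input values are made up for the example.

```python
import math


def aic(log_likelihood, n_params):
    """Textbook Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Textbook Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Made-up values, purely to show the arithmetic.
print(aic(-100.0, 5))       # 210.0
print(bic(-100.0, 5, 100))  # 5 * ln(100) + 200
```

Lower values indicate a better trade-off between fit and complexity under both criteria; BIC penalizes additional parameters more strongly once n_obs exceeds about 7.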

Examples

import numpy as np
import pandas as pd
import os
from spotoptim.factor_analyzer import (ConfirmatoryFactorAnalyzer,
                                      ModelSpecificationParser)
from spotoptim.utils import get_internal_datasets_folder
X = pd.read_csv(os.path.join(get_internal_datasets_folder(), 'test11.csv'))
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa = cfa.fit(X.values)
print(np.round(cfa.loadings_, 2))
# array([[0.99, 0.  ],
#        [0.46, 0.  ],
#        [0.35, 0.  ],
#        [0.58, 0.  ],
#        [0.  , 0.99],
#        [0.  , 0.73],
#        [0.  , 0.38],
#        [0.  , 0.5 ]])

print(np.round(cfa.factor_varcovs_, 2))
# array([[1.  , 0.17],
#        [0.17, 1.  ]])

loadings_se, variances_se = cfa.get_standard_errors()
print(np.round(loadings_se, 2))
# array([[0.07, 0.  ],
#        [0.04, 0.  ],
#        [0.04, 0.  ],
#        [0.05, 0.  ],
#        [0.  , 0.06],
#        [0.  , 0.05],
#        [0.  , 0.04],
#        [0.  , 0.04]])

print(np.round(variances_se, 2))
# array([0.12, 0.05, 0.05, 0.06, 0.1 , 0.07, 0.05, 0.05])

print(np.round(cfa.transform(X.values), 2))
# array([[-0.47, -1.09],
#        [ 2.59,  1.2 ],
#        [-0.47,  2.66],
#        ...,
#        [-1.59, -0.92],
#        [ 0.19,  0.88],
#        [-0.28, -0.77]])

Methods

Name Description
fit Perform confirmatory factor analysis.
get_model_implied_cov Get the model-implied covariance matrix (sigma) for an estimated model.
get_standard_errors Get standard errors from the implied covariance matrix and implied means.
transform Get the factor scores for a new data set.
fit
factor_analyzer.confirmatory_factor_analyzer.ConfirmatoryFactorAnalyzer.fit(
    X,
    y=None,
)

Perform confirmatory factor analysis.

Parameters
Name Type Description Default
X array-like The data to use for confirmatory factor analysis. If this is just a covariance matrix, make sure is_cov_matrix was set to True. required
y ignored Ignored. None
Returns
Name Type Description
self ConfirmatoryFactorAnalyzer The fitted confirmatory factor analyzer object.
Raises
Name Type Description
ValueError If specification is neither None nor a ModelSpecification object.
AssertionError If is_cov_matrix was True and the matrix is not square.
AssertionError If len(bounds) != len(x0).
Examples
import numpy as np
import pandas as pd
import os
from spotoptim.factor_analyzer import (ConfirmatoryFactorAnalyzer,
                                      ModelSpecificationParser)
from spotoptim.utils import get_internal_datasets_folder
X = pd.read_csv(os.path.join(get_internal_datasets_folder(), 'test11.csv'))
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa = cfa.fit(X.values)
print(np.round(cfa.loadings_, 2))
# array([[0.99, 0.  ],
#        [0.46, 0.  ],
#        [0.35, 0.  ],
#        [0.58, 0.  ],
#        [0.  , 0.99],
#        [0.  , 0.73],
#        [0.  , 0.38],
#        [0.  , 0.5 ]])
get_model_implied_cov
factor_analyzer.confirmatory_factor_analyzer.ConfirmatoryFactorAnalyzer.get_model_implied_cov(
)

Get the model-implied covariance matrix (sigma) for an estimated model.

Returns
Name Type Description
model_implied_cov numpy.ndarray The model-implied covariance matrix.
Examples
import numpy as np
import pandas as pd
import os
from spotoptim.factor_analyzer import (ConfirmatoryFactorAnalyzer,
                                      ModelSpecificationParser)
from spotoptim.utils import get_internal_datasets_folder
X = pd.read_csv(os.path.join(get_internal_datasets_folder(), 'test11.csv'))
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa = cfa.fit(X.values)
print(np.round(cfa.get_model_implied_cov(), 2))
# array([[2.08, 0.46, 0.35, 0.58, 0.17, 0.13, 0.06, 0.09],
#        [0.46, 1.17, 0.16, 0.27, 0.08, 0.06, 0.03, 0.04],
#        [0.35, 0.16, 1.07, 0.2 , 0.06, 0.04, 0.02, 0.03],
#        [0.58, 0.27, 0.2 , 1.29, 0.1 , 0.07, 0.04, 0.05],
#        [0.17, 0.08, 0.06, 0.1 , 2.04, 0.72, 0.37, 0.49],
#        [0.13, 0.06, 0.04, 0.07, 0.72, 1.48, 0.28, 0.37],
#        [0.06, 0.03, 0.02, 0.04, 0.37, 0.28, 1.12, 0.19],
#        [0.09, 0.04, 0.03, 0.05, 0.49, 0.37, 0.19, 1.29]])
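
For intuition, in the standard CFA formulation the model-implied covariance is the loadings matrix times the factor covariance matrix times the transposed loadings, plus a diagonal matrix of error variances. The sketch below assumes that textbook formulation; it is not taken from the library's source, and the helper names are hypothetical. It uses the rounded loadings_ and factor_varcovs_ values from the class-level example. The error variances are placeholders, since they are not printed on this page, so only off-diagonal entries are comparable with the matrix in the example above.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def implied_cov(loadings, factor_cov, error_vars):
    """Sigma = Lambda * Phi * Lambda^T + diag(error_vars)."""
    sigma = matmul(matmul(loadings, factor_cov), transpose(loadings))
    for i, theta in enumerate(error_vars):
        sigma[i][i] += theta
    return sigma


# Rounded estimates copied from the class-level example.
loadings = [[0.99, 0.0], [0.46, 0.0], [0.35, 0.0], [0.58, 0.0],
            [0.0, 0.99], [0.0, 0.73], [0.0, 0.38], [0.0, 0.5]]
factor_cov = [[1.0, 0.17], [0.17, 1.0]]
error_vars = [1.0] * 8  # placeholders, NOT the fitted error variances

sigma = implied_cov(loadings, factor_cov, error_vars)
# Off-diagonal entries (V1,V2), (V1,V5), and (V5,V6):
print(round(sigma[0][1], 2), round(sigma[0][4], 2), round(sigma[4][5], 2))
# 0.46 0.17 0.72
```

Variables on different factors covary only through the factor covariance (0.17 here), which is why the cross-factor block of the matrix is small.
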
get_standard_errors
factor_analyzer.confirmatory_factor_analyzer.ConfirmatoryFactorAnalyzer.get_standard_errors(
)

Get standard errors from the implied covariance matrix and implied means.

Returns
Name Type Description
tuple Tuple[np.ndarray, np.ndarray] A pair (loadings_se, error_vars_se), where loadings_se (numpy.ndarray) holds the standard errors for the factor loadings and error_vars_se (numpy.ndarray) holds the standard errors for the error variances.
Examples
import numpy as np
import pandas as pd
import os
from spotoptim.factor_analyzer import (ConfirmatoryFactorAnalyzer,
                                      ModelSpecificationParser)
from spotoptim.utils import get_internal_datasets_folder
X = pd.read_csv(os.path.join(get_internal_datasets_folder(), 'test11.csv'))
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa = cfa.fit(X.values)
loadings_se, variances_se = cfa.get_standard_errors()
print(np.round(loadings_se, 2))
# array([[0.07, 0.  ],
#        [0.04, 0.  ],
#        [0.04, 0.  ],
#        [0.05, 0.  ],
#        [0.  , 0.06],
#        [0.  , 0.05],
#        [0.  , 0.04],
#        [0.  , 0.04]])

print(np.round(variances_se, 2))
# array([0.12, 0.05, 0.05, 0.06, 0.1 , 0.07, 0.05, 0.05])
transform
factor_analyzer.confirmatory_factor_analyzer.ConfirmatoryFactorAnalyzer.transform(
    X,
)

Get the factor scores for a new data set.

Parameters
Name Type Description Default
X array-like The data to score using the fitted factor model, shape (n_samples, n_features). required
Returns
Name Type Description
scores numpy.ndarray The latent variables of X, shape (n_samples, n_components).
Examples
import numpy as np
import pandas as pd
import os
from spotoptim.factor_analyzer import (ConfirmatoryFactorAnalyzer,
                                      ModelSpecificationParser)
from spotoptim.utils import get_internal_datasets_folder
X = pd.read_csv(os.path.join(get_internal_datasets_folder(), 'test11.csv'))
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa = cfa.fit(X.values)
print(np.round(cfa.transform(X.values), 2))
# array([[-0.47, -1.09],
#        [ 2.59,  1.2 ],
#        [-0.47,  2.66],
#        ...,
#        [-1.59, -0.92],
#        [ 0.19,  0.88],
#        [-0.28, -0.77]])
References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157408/

ModelSpecification

factor_analyzer.confirmatory_factor_analyzer.ModelSpecification(
    loadings,
    n_factors,
    n_variables,
    factor_names=None,
    variable_names=None,
)

Encapsulate the model specification for CFA.

This class contains a number of specification properties that are used in the CFA procedure.

Parameters

Name Type Description Default
loadings array-like The factor loadings specification. required
n_factors int The number of factors. required
n_variables int The number of variables. required
factor_names list of str A list of factor names, if available. Defaults to None. None
variable_names list of str A list of variable names, if available. Defaults to None. None

Attributes

Name Description
error_vars Get the error variance specification.
error_vars_free Get the indices of “free” error variance parameters.
factor_covs Get the factor covariance specification.
factor_covs_free Get the indices of “free” factor covariance parameters.
factor_names Get list of factor names, if available.
loadings Get the factor loadings specification.
loadings_free Get the indices of “free” factor loading parameters.
n_factors Get the number of factors.
n_lower_diag Get the lower diagonal of the factor covariance matrix.
n_variables Get the number of variables.
variable_names Get list of variable names, if available.

Methods

Name Description
copy Return a copy of the model specification.
get_model_specification_as_dict Get the model specification as a dictionary.
copy
factor_analyzer.confirmatory_factor_analyzer.ModelSpecification.copy()

Return a copy of the model specification.

get_model_specification_as_dict
factor_analyzer.confirmatory_factor_analyzer.ModelSpecification.get_model_specification_as_dict(
)

Get the model specification as a dictionary.

Returns
Name Type Description
model_specification dict The model specification keys and values, as a dictionary.

ModelSpecificationParser

factor_analyzer.confirmatory_factor_analyzer.ModelSpecificationParser()

Generate the model specification for CFA.

This class includes two static methods to generate a ModelSpecification object from either a dictionary or a numpy array.

Methods

Name Description
parse_model_specification_from_array Generate the model specification from a numpy array.
parse_model_specification_from_dict Generate the model specification from a dictionary.
parse_model_specification_from_array
factor_analyzer.confirmatory_factor_analyzer.ModelSpecificationParser.parse_model_specification_from_array(
    X,
    specification=None,
)

Generate the model specification from a numpy array.

The columns should correspond to the factors, and the rows should correspond to the variables. If this method is used to create the ModelSpecification object, then no factor names or variable names will be added as properties to that object.

Parameters
Name Type Description Default
X array-like The data set that will be used for CFA. required
specification array-like An array with the loading details. If None, the matrix will be created assuming all variables load on all factors. Defaults to None. None
Returns
Name Type Description
ModelSpecification ModelSpecification A model specification object.
Raises
Name Type Description
ValueError If specification is not in the expected format.
Examples
import pandas as pd
import numpy as np
import os
from spotoptim.factor_analyzer import (ConfirmatoryFactorAnalyzer,
                                      ModelSpecificationParser)
from spotoptim.utils import get_internal_datasets_folder
X = pd.read_csv(os.path.join(get_internal_datasets_folder(), 'test11.csv'))
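# Rows correspond to the eight variables and columns to the two factors,
# so the 2 x 8 pattern is transposed to shape (8, 2).
model_array = np.array([[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]).T
model_spec = ModelSpecificationParser.parse_model_specification_from_array(X,
                                                                             model_array)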
parse_model_specification_from_dict
factor_analyzer.confirmatory_factor_analyzer.ModelSpecificationParser.parse_model_specification_from_dict(
    X,
    specification=None,
)

Generate the model specification from a dictionary.

The keys in the dictionary should be the factor names, and the values should be the names of the variables (features) that load on each factor. If this method is used to create the ModelSpecification object, then factor names and variable names will be added as properties to that object.

Parameters
Name Type Description Default
X array-like The data set that will be used for CFA. required
specification dict A dictionary with the loading details. If None, the matrix will be created assuming all variables load on all factors. Defaults to None. None
Returns
Name Type Description
ModelSpecification ModelSpecification A model specification object.
Raises
Name Type Description
ValueError If specification is not in the expected format.
Examples
import pandas as pd
import os
from spotoptim.factor_analyzer import (ConfirmatoryFactorAnalyzer,
                                      ModelSpecificationParser)
from spotoptim.utils import get_internal_datasets_folder
X = pd.read_csv(os.path.join(get_internal_datasets_folder(), 'test11.csv'))
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
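
The dictionary form and the array form describe the same loading pattern. The sketch below shows the correspondence for the dictionary used in the examples, with rows as variables and columns as factors. It is illustrative only: dict_to_pattern is a hypothetical helper, not a library function, and the column names V1 to V8 are assumed from the keys of model_dict rather than read from test11.csv.

```python
def dict_to_pattern(variable_names, model_dict):
    """Build a 0/1 pattern with one row per variable and one column per factor."""
    factor_names = list(model_dict)
    pattern = [[0] * len(factor_names) for _ in variable_names]
    for col, factor in enumerate(factor_names):
        for var in model_dict[factor]:
            # A 1 marks a loading that is free to be estimated.
            pattern[variable_names.index(var)][col] = 1
    return pattern


variable_names = ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"]  # assumed
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}

pattern = dict_to_pattern(variable_names, model_dict)
for name, row in zip(variable_names, pattern):
    print(name, row)
```

The result has eight rows and two columns, which is the orientation that parse_model_specification_from_array describes for its specification argument.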