vbdp

`affinity_propagation_features(X)`

Clusters the features of a dataframe using Affinity Propagation.

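This function takes a dataframe with features, computes the pairwise Manhattan distances between its rows, and clusters the rows with the Affinity Propagation algorithm. It prints the estimated number of clusters, adds the labels to the input dataframe as a new `cluster` column (modifying it in place), and returns that dataframe.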

Parameters:

Name	Type	Description	Default
`X`	`DataFrame`	A dataframe with features.	required

Returns:

Type	Description
`DataFrame`	The input dataframe with the original features and a new `cluster` column holding the cluster labels.

Examples:

>>> import pandas as pd
>>> from spotPython.data.vbdp import affinity_propagation_features
>>> df = pd.DataFrame({"a": [True, False, True], "b": [True, True, False], "c": [False, False, True]})
>>> df
    a      b      c
0  True   True   False
1  False  True   False
2  True   False  True
>>> affinity_propagation_features(df)
Estimated number of clusters: 3
    a      b      c  cluster
0  True   True   False       0
1  False  True   False       1
2  True   False  True        2
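
The clustering starts from the pairwise Manhattan distances between rows. As a minimal sketch of that first step only, the snippet below computes the distance matrix for the example dataframe in plain Python. The helper name `manhattan_distance_matrix` is hypothetical and not part of spotPython, which calls `manhattan_distances` instead.

```python
def manhattan_distance_matrix(rows):
    """Return the pairwise Manhattan (L1) distances between rows.

    Hypothetical helper for illustration; booleans are treated as 0/1.
    """
    return [
        [sum(abs(int(a) - int(b)) for a, b in zip(r, s)) for s in rows]
        for r in rows
    ]


# Rows of the example dataframe, columns a, b, c
rows = [
    [True, True, False],   # row 0
    [False, True, False],  # row 1
    [True, False, True],   # row 2
]

D = manhattan_distance_matrix(rows)
print(D)  # [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
```

Rows 0 and 1 differ in one column, while rows 1 and 2 differ in all three, so they are the farthest apart.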

Source code in spotPython/data/vbdp.py

def affinity_propagation_features(X: pd.DataFrame) -> pd.DataFrame:
    """Clusters the features of a dataframe using Affinity Propagation.

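    This function takes a dataframe with features, computes the pairwise
    Manhattan distances between its rows, and clusters the rows with the
    Affinity Propagation algorithm. It prints the estimated number of clusters,
    adds the labels to the input dataframe as a new `cluster` column (modifying
    it in place), and returns that dataframe.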

    Args:
        X (pd.DataFrame):
            A dataframe with features.

    Returns:
        (pd.DataFrame):
            A dataframe with the original features and a new cluster feature.

    Examples:
        >>> import pandas as pd
        >>> from spotPython.data.vbdp import affinity_propagation_features
        >>> df = pd.DataFrame({"a": [True, False, True], "b": [True, True, False], "c": [False, False, True]})
        >>> df
            a      b      c
        0  True   True   False
        1  False  True   False
        2  True   False  True
        >>> affinity_propagation_features(df)
        Estimated number of clusters: 3
            a      b      c  cluster
        0  True   True   False       0
        1  False  True   False       1
        2  True   False  True        2
    """
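    # Pairwise Manhattan (L1) distances between the rows of X
    D = manhattan_distances(X)
    # Note: with affinity="precomputed", scikit-learn interprets the matrix as
    # similarities; D holds distances and is passed unchanged
    af = AffinityPropagation(random_state=0, affinity="precomputed").fit(D)
    cluster_centers_indices = af.cluster_centers_indices_
    n_clusters_ = len(cluster_centers_indices)
    print("Estimated number of clusters: %d" % n_clusters_)
    # Adds the labels to the caller's dataframe in place
    X["cluster"] = af.labels_
    return X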
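
Because the function above writes the `cluster` column into the dataframe it receives, callers who want to keep their input untouched may prefer a variant that works on a copy. The sketch below is one possible way to do that. It is not the spotPython implementation: the name `cluster_rows_copy` is hypothetical, and it makes the additional assumption that the distances should be negated before fitting, since scikit-learn treats a precomputed matrix as similarities. The labels it produces may therefore differ from those of `affinity_propagation_features`.

```python
import pandas as pd
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import manhattan_distances


def cluster_rows_copy(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical non-mutating variant; not part of spotPython."""
    # Pairwise Manhattan distances between rows (booleans cast to 0.0/1.0)
    D = manhattan_distances(X.astype(float))
    # Assumption: negate distances so that closer rows count as more similar
    S = -D
    af = AffinityPropagation(random_state=0, affinity="precomputed").fit(S)
    # assign() returns a new dataframe and leaves X unchanged
    return X.assign(cluster=af.labels_)


df = pd.DataFrame(
    {"a": [True, False, True], "b": [True, True, False], "c": [False, False, True]}
)
out = cluster_rows_copy(df)
print(list(df.columns))   # the input keeps its original columns
print(list(out.columns))  # the result has an extra "cluster" column
```

On very small inputs such as this one, Affinity Propagation may not converge, in which case scikit-learn emits a warning and the labels should be treated with caution.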

`cluster_features(X)`

Clusters the features of a dataframe based on similarity.

This function takes a dataframe with features and clusters them based on similarity. The resulting dataframe contains the original features as well as new features representing the clusters.