vbdp

affinity_propagation_features(X)

Clusters the rows of a dataframe using Affinity Propagation.

This function takes a dataframe of features, computes the pairwise Manhattan distances between its rows, and clusters the rows with the Affinity Propagation algorithm. The cluster label of each row is stored in a new column named cluster, and the estimated number of clusters is printed. The input dataframe is modified in place and returned.

Parameters:

Name  Type       Description                 Default
X     DataFrame  A dataframe with features.  required

Returns:

Type       Description
DataFrame  A dataframe with the original features and a new cluster feature.

Examples:

>>> df = pd.DataFrame({"a": [True, False, True], "b": [True, True, False], "c": [False, False, True]})
>>> df
       a      b      c
0   True   True  False
1  False   True  False
2   True  False   True
>>> affinity_propagation_features(df)
Estimated number of clusters: 3
       a      b      c  cluster
0   True   True  False        0
1  False   True  False        1
2   True  False   True        2
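To make the distance step concrete, the following is a minimal pure-Python sketch of the pairwise Manhattan distances between the three example rows. The helper name manhattan is an illustrative assumption; the function itself relies on scikit-learn's manhattan_distances, which is not reproduced here.

```python
# The three example rows (columns a, b, c), one list per row.
rows = [
    [True, True, False],   # row 0
    [False, True, False],  # row 1
    [True, False, True],   # row 2
]


def manhattan(u, v):
    """Sum of absolute coordinate differences between two rows (hypothetical helper)."""
    return sum(abs(int(a) - int(b)) for a, b in zip(u, v))


# Symmetric matrix with zeros on the diagonal; entry [i][j] compares row i and row j.
D = [[manhattan(u, v) for v in rows] for u in rows]

for line in D:
    print(line)
```

For boolean columns, each entry simply counts the columns in which two rows differ.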
Source code in spotpython/data/vbdp.py
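import pandas as pd
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import manhattan_distances


def affinity_propagation_features(X: pd.DataFrame) -> pd.DataFrame: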
    """Clusters the features of a dataframe using Affinity Propagation.

    This function takes a dataframe with features and clusters them using the
    Affinity Propagation algorithm. The resulting dataframe contains the original
    features as well as a new feature representing the cluster labels.

    Args:
        X (pd.DataFrame):
            A dataframe with features.

    Returns:
        (pd.DataFrame):
            A dataframe with the original features and a new cluster feature.

    Examples:
        >>> df = pd.DataFrame({"a": [True, False, True], "b": [True, True, False], "c": [False, False, True]})
        >>> df
            a      b      c
        0  True   True   False
        1  False  True   False
        2  True   False  True
        >>> affinity_propagation_features(df)
        Estimated number of clusters: 3
            a      b      c  cluster
        0  True   True   False       0
        1  False  True   False       1
        2  True   False  True        2
    """
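    # Pairwise Manhattan distances between the rows of X.
    D = manhattan_distances(X)
    # Note: with affinity="precomputed", scikit-learn interprets the matrix as
    # similarities (larger means more alike), whereas D holds distances.
    af = AffinityPropagation(random_state=0, affinity="precomputed").fit(D)
    cluster_centers_indices = af.cluster_centers_indices_
    n_clusters_ = len(cluster_centers_indices)
    print(f"Estimated number of clusters: {n_clusters_}")
    # Adds the labels as a new column; this modifies the caller's dataframe in place.
    X["cluster"] = af.labels_
    return X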

cluster_features(X)

Groups the features of a dataframe by keywords in their column names.

This function selects the columns whose names contain "pain", "inflammation", "bleed", or "skin" and sums each group row-wise into the new columns c_0, c_1, c_2, and c_3, respectively. A keyword that matches no column yields a column of zeros. The input dataframe is modified in place and returned.

Parameters:

Name  Type       Description                 Default
X     DataFrame  A dataframe with features.  required

Returns:

Type       Description
DataFrame  A dataframe with the original features and new cluster features.

Examples:

>>> df = pd.DataFrame({"a": [True, False, True], "b": [True, True, False], "c": [False, False, True]})
>>> df
       a      b      c
0   True   True  False
1  False   True  False
2   True  False   True
>>> cluster_features(df)
       a      b      c  c_0  c_1  c_2  c_3
0   True   True  False    0    0    0    0
1  False   True  False    0    0    0    0
2   True  False   True    0    0    0    0
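The columns a, b, and c above match none of the keywords, so every new column is zero. The sketch below shows the same keyword grouping on column names that do match. The column names and values are hypothetical, chosen only for illustration, and the loop works on a copy instead of modifying the input, which differs from the function itself.

```python
import pandas as pd

# Hypothetical column names chosen to match the keywords; not taken from the library's data.
df = pd.DataFrame(
    {
        "abdominal_pain": [1, 0, 1],
        "joint_pain": [1, 1, 0],
        "skin_rash": [0, 1, 1],
    }
)

keywords = ["pain", "inflammation", "bleed", "skin"]

out = df.copy()  # work on a copy so df itself stays unchanged
for i, keyword in enumerate(keywords):
    # Columns of the original frame whose names contain the keyword.
    matching = df.columns[df.columns.str.contains(keyword)]
    # Row-wise sum over the matching columns; an empty selection sums to zero.
    out[f"c_{i}"] = df[matching].sum(axis=1)

print(out)
```

Here c_0 counts the two pain columns per row, c_3 mirrors skin_rash, and c_1 and c_2 stay at zero because no column name contains "inflammation" or "bleed".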
Source code in spotpython/data/vbdp.py
import pandas as pd


def cluster_features(X: pd.DataFrame) -> pd.DataFrame:
    """Clusters the features of a dataframe based on similarity.

    This function takes a dataframe with features and clusters them based on similarity.
    The resulting dataframe contains the original features as well as new features representing the clusters.

    Args:
        X (pd.DataFrame): A dataframe with features.

    Returns:
        (pd.DataFrame): A dataframe with the original features and new cluster features.

    Examples:
        >>> df = pd.DataFrame({"a": [True, False, True], "b": [True, True, False], "c": [False, False, True]})
        >>> df
            a      b      c
        0  True   True  False
        1 False   True  False
        2  True  False   True
        >>> cluster_features(df)
            a      b      c  c_0  c_1  c_2  c_3
        0  True   True  False    0    0    0    0
        1 False   True  False    0    0    0    0
        2  True  False   True    0    0    0    0
    """
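    # Select the columns whose names contain each keyword.
    c_0 = X.columns[X.columns.str.contains("pain")]
    c_1 = X.columns[X.columns.str.contains("inflammation")]
    c_2 = X.columns[X.columns.str.contains("bleed")]
    c_3 = X.columns[X.columns.str.contains("skin")]
    # Row-wise sums per keyword group; an empty group sums to zero for every row.
    # The new columns are added to the caller's dataframe in place.
    X["c_0"] = X[c_0].sum(axis=1)
    X["c_1"] = X[c_1].sum(axis=1)
    X["c_2"] = X[c_2].sum(axis=1)
    X["c_3"] = X[c_3].sum(axis=1)
    return X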