bike_sharing

get_bike_sharing_data(train_size=0.6)

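Fetches the Bike Sharing Demand dataset (version 2) from OpenML, normalizes the count column by its maximum, merges the heavy_rain weather category into rain, and splits the rows sequentially into training and test sets.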

Parameters:

    train_size (float):
        The proportion of the dataset to include in the training set. Defaults to 0.6.
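
How train_size maps to a row count can be sketched as follows. The helper name training_rows is hypothetical and not part of spotriver; it only mirrors the int(n * train_size) computation shown in the source listing, which truncates rather than rounds.

```python
def training_rows(n_rows: int, train_size: float) -> int:
    """Hypothetical helper: number of rows assigned to the training set."""
    # int() truncates toward zero, so a fractional row count is rounded down
    return int(n_rows * train_size)


print(training_rows(10, 0.6))  # 6
print(training_rows(11, 0.6))  # 6, because 6.6 is truncated
print(training_rows(10, 1.0))  # 10, which leaves the test set empty
```

The remaining n_rows - k rows form the test set, so the two parts always cover the full dataset without overlap.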

Returns:

    (tuple): tuple containing:
        df (pd.DataFrame): The full dataset.
        train (pd.DataFrame): The training set.
        test (pd.DataFrame): The test set.

Examples:

>>> from spotriver.data.bike_sharing import get_bike_sharing_data
>>> df, train, test = get_bike_sharing_data(train_size=0.6)
Source code in spotriver/data/bike_sharing.py
from sklearn.datasets import fetch_openml


def get_bike_sharing_data(train_size=0.6):
    """
    Fetches the Bike Sharing Demand dataset from OpenML and splits it into training and test sets.

    Args:
        train_size (float):
            The proportion of rows (between 0 and 1) to include in the training set.
            Defaults to 0.6.

    Returns:
        (tuple): tuple containing:
            df (pd.DataFrame): The full dataset.
            train (pd.DataFrame): The training set.
            test (pd.DataFrame): The test set.

    Examples:
        >>> from spotriver.data.bike_sharing import get_bike_sharing_data
        >>> df, train, test = get_bike_sharing_data(train_size=0.6)
    """

    bike_sharing = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas")
    df = bike_sharing.frame
    # Normalize the count column
    df["count"] = df["count"] / df["count"].max()
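    # Merge the "heavy_rain" category into "rain"; assign the result back
    # instead of calling replace(inplace=True) on a column selection, which
    # does not reliably modify df in recent pandas versions
    df["weather"] = df["weather"].astype(object).replace(to_replace="heavy_rain", value="rain").astype("category")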
    n = df.shape[0]
    # Calculate the number of rows in the training set
    k = int(n * train_size)
    # Split the data into training and test sets
    train = df.iloc[:k]
    test = df.iloc[k:]
    return df, train, test
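
The preprocessing and split steps in the listing can be tried without a network connection on a small stand-in frame. This is a minimal sketch under stated assumptions: the toy data is invented, and preprocess_and_split is a hypothetical helper that mirrors the steps of get_bike_sharing_data rather than the library's own code.

```python
import pandas as pd

# Toy stand-in for the OpenML frame; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "weather": pd.Categorical(["clear", "rain", "heavy_rain", "misty", "clear"]),
        "count": [10, 40, 5, 20, 25],
    }
)


def preprocess_and_split(df, train_size=0.6):
    """Hypothetical helper mirroring the steps of get_bike_sharing_data."""
    df = df.copy()  # work on a copy so the toy frame is left unchanged
    # Scale the target into [0, 1] by dividing by its maximum
    df["count"] = df["count"] / df["count"].max()
    # Merge the "heavy_rain" level into "rain" and restore the category dtype
    df["weather"] = (
        df["weather"].astype(object).replace("heavy_rain", "rain").astype("category")
    )
    # Sequential split: the first k rows are for training, the rest for testing
    k = int(len(df) * train_size)
    return df, df.iloc[:k], df.iloc[k:]


df, train, test = preprocess_and_split(toy, train_size=0.6)
print(len(train), len(test))
print(sorted(df["weather"].cat.categories))
```

Because the split is positional and not shuffled, the test set always consists of the last rows of the frame, which matters when the rows are ordered in time.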