spotriver

bike_sharing

`get_bike_sharing_data(train_size=0.6)`

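Fetches the Bike Sharing Demand dataset from OpenML, scales the `count` column by its maximum, merges the `heavy_rain` weather category into `rain`, and splits the rows sequentially (without shuffling) into training and test sets.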

Parameters:

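| Name | Type | Description | Default |
|---|---|---|---|
| `train_size` | `float` | Proportion of rows assigned to the training set. The first `int(n * train_size)` rows (where `n` is the number of rows) form the training set; the remaining rows form the test set. | `0.6` |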

Returns:

| Type | Description |
|---|---|
| `tuple` | A tuple `(df, train, test)`: `df` (`pd.DataFrame`) is the full dataset after preprocessing, `train` (`pd.DataFrame`) is the training set, and `test` (`pd.DataFrame`) is the test set. |
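The split involves no randomness: the first `int(n * train_size)` rows become the training set and everything after them becomes the test set. The sketch below reproduces that slicing on a small synthetic DataFrame so it runs without downloading anything. The helper `sequential_split` and the toy data are hypothetical; they only mirror the slicing in the source listing on this page and are not part of spotriver.

```python
import pandas as pd


def sequential_split(df: pd.DataFrame, train_size: float = 0.6):
    """Hypothetical helper: leading rows go to train, trailing rows to test.

    Mirrors the slicing in get_bike_sharing_data: no shuffling, and the
    number of training rows is int(n * train_size), i.e. truncated.
    """
    n = df.shape[0]
    k = int(n * train_size)  # truncation: int(5 * 0.5) == 2
    return df.iloc[:k], df.iloc[k:n]


# Toy stand-in for the hourly bike-sharing data (hypothetical values).
toy = pd.DataFrame({"hour": range(10), "count": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]})
train, test = sequential_split(toy, train_size=0.6)

print(len(train), len(test))       # 6 4
print(train["hour"].tolist())      # [0, 1, 2, 3, 4, 5]
print(test["hour"].tolist())       # [6, 7, 8, 9]
# Concatenating the two parts in order gives back the original frame.
print(pd.concat([train, test]).equals(toy))  # True
```

Because the rows keep their original order, the test set always lies after the training set in time, which suits evaluating models on data they have not yet seen.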

Examples:

>>> from spotriver.data.bike_sharing import get_bike_sharing_data
>>> df, train, test = get_bike_sharing_data(train_size=0.6)

Source code in spotriver/data/bike_sharing.py

```python
from sklearn.datasets import fetch_openml


def get_bike_sharing_data(train_size=0.6):
    """
    Fetches the Bike Sharing Demand dataset from OpenML and splits it into training and test sets.

    Args:
        train_size (float):
            The proportion of the dataset to include in the training set. Default value: 0.6

    Returns:
        (tuple): tuple containing:
            df (pd.DataFrame): The full dataset.
            train (pd.DataFrame): The training set.
            test (pd.DataFrame): The test set.

    Examples:
        >>> from spotriver.data.bike_sharing import get_bike_sharing_data
        >>> df, train, test = get_bike_sharing_data(train_size=0.6)
    """

    bike_sharing = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas")
    df = bike_sharing.frame
    # Scale the count column by its maximum (computed over the full dataset,
    # including the rows that later form the test set)
    df["count"] = df["count"] / df["count"].max()
    # Merge the rare "heavy_rain" category into "rain". The result is assigned
    # back instead of calling replace(..., inplace=True) on df["weather"],
    # which is chained assignment and may not update df under Copy-on-Write.
    df["weather"] = (
        df["weather"].astype(object).replace(to_replace="heavy_rain", value="rain").astype("category")
    )
    n = df.shape[0]
    # Number of training rows; int() truncates toward zero
    k = int(n * train_size)
    # Sequential (unshuffled) split: first k rows for training, the rest for testing
    train = df.iloc[:k]
    test = df.iloc[k:n]
    return df, train, test
```
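The two preprocessing steps can be tried without fetching the OpenML data. The following is a minimal sketch on a hypothetical four-row frame holding only the `weather` and `count` columns; it assumes `weather` is stored as a pandas categorical. The category merge goes through `object` dtype and assigns the result back, which avoids both `Categorical.replace` semantics and chained in-place modification.

```python
import pandas as pd

# Hypothetical toy rows standing in for the OpenML frame; only the two
# columns touched by the preprocessing are included.
toy = pd.DataFrame(
    {
        "weather": pd.Categorical(["clear", "rain", "heavy_rain", "misty"]),
        "count": [20, 5, 1, 40],
    }
)

# Step 1: divide "count" by its maximum, so the largest value becomes 1.0.
toy["count"] = toy["count"] / toy["count"].max()

# Step 2: merge "heavy_rain" into "rain" via object dtype, then rebuild the
# categorical so "heavy_rain" is no longer a category.
toy["weather"] = (
    toy["weather"]
    .astype(object)
    .replace(to_replace="heavy_rain", value="rain")
    .astype("category")
)

print(toy["count"].tolist())                  # [0.5, 0.125, 0.025, 1.0]
print(sorted(toy["weather"].cat.categories))  # ['clear', 'misty', 'rain']
```

Note that the maximum used in step 1 comes from every row, so in the real function the scaling constant also depends on rows that end up in the test set.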