data_conversion
compare_two_tree_models(model1, model2, headers=['Parameter', 'Default', 'Spot'])
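Compares two tree models and returns a table of the differences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model1 | Pipeline | A river model pipeline. | required |
model2 | Pipeline | A river model pipeline. | required |
Returns:
Type | Description |
---|---|
str | A table of the differences between the two models. |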
Source code in spotriver/utils/data_conversion.py
convert_to_df(dataset, target_column='y', n_total=None)
Converts a river dataset into a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Dataset | The river dataset to be converted. | required |
target_column | str | The name of the target column in the resulting DataFrame. | 'y' |
n_total | int | The number of samples to convert. If None or inf, the full dataset is converted. | None |
Returns:
Type | Description |
---|---|
DataFrame | A pandas DataFrame representation of the given dataset. |
Examples:
>>> from river import datasets
>>> from spotriver.utils.data_conversion import convert_to_df
>>> dataset = datasets.TrumpApproval()
>>> target_column = "Approval"
>>> df = convert_to_df(dataset, target_column)
>>> # Normalize the column names
>>> df = df.rename(columns={
...     'date': 'ordinal_date',
...     'Gallup': 'gallup',
...     'Ipsos': 'ipsos',
...     'Morning Consult': 'morning_consult',
...     'Rasmussen': 'rasmussen',
...     'YouGov': 'you_gov'})
>>> # Split the data into train and test sets
>>> train = df[:500]
>>> test = df[500:]
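For readers without river installed, the conversion idea can be sketched on a plain Python stream. This is a minimal illustration, not the spotriver implementation: `stream_to_df` is a hypothetical helper, and the list of `(features_dict, target)` pairs is an assumed stand-in for a river dataset, which yields pairs of the same shape when iterated.

```python
from itertools import islice
import math

import pandas as pd


def stream_to_df(stream, target_column="y", n_total=None):
    """Hypothetical helper: collect (features_dict, target) pairs into a DataFrame."""
    # None or infinity means "take the whole stream".
    if n_total is None or math.isinf(n_total):
        limited = stream
    else:
        limited = islice(stream, int(n_total))
    rows = []
    for x, y in limited:
        row = dict(x)              # copy the feature dict
        row[target_column] = y     # append the target under the chosen name
        rows.append(row)
    return pd.DataFrame(rows)


# Assumed stand-in for a river dataset: an iterable of (dict, target) pairs.
stream = [
    ({"a": 1.0, "b": 2.0}, 10),
    ({"a": 3.0, "b": 4.0}, 20),
    ({"a": 5.0, "b": 6.0}, 30),
]
df = stream_to_df(stream, target_column="target", n_total=2)
print(df)
```

Limiting the stream with `islice` avoids materializing more samples than requested, which matters for long or unbounded streams.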
rename_df_to_xy(df, target_column='y')
Renames the columns of a DataFrame to x1, x2, …, xn, y.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame to be renamed. | required |
target_column | str | The name of the target column. | 'y' |
Returns:
Type | Description |
---|---|
DataFrame | The renamed DataFrame. |
Examples:
>>> import pandas as pd
>>> from spotriver.utils.data_conversion import rename_df_to_xy
>>> df = pd.DataFrame({
...     "feature1": [1, 2, 3],
...     "feature2": [4, 5, 6],
...     "target": [7, 8, 9]
... })
>>> df = rename_df_to_xy(df, "target")
>>> print(df)
   x1  x2  y
0   1   4  7
1   2   5  8
2   3   6  9
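The renaming scheme itself can be sketched in a few lines of pandas. This is an illustration under assumptions, not the library's code: `rename_to_xy` is a hypothetical name, and moving the target to the last position is a choice made here, which the original function may or may not share.

```python
import pandas as pd


def rename_to_xy(df, target_column="y"):
    """Hypothetical sketch: rename features to x1..xn and the target to y."""
    if target_column not in df.columns:
        raise KeyError(f"target column {target_column!r} not found")
    # Every column except the target is a feature, in its original order.
    features = [c for c in df.columns if c != target_column]
    mapping = {c: f"x{i}" for i, c in enumerate(features, start=1)}
    mapping[target_column] = "y"
    renamed = df.rename(columns=mapping)
    # Assumption of this sketch: place the target column last.
    return renamed[[mapping[c] for c in features] + ["y"]]


frame = pd.DataFrame({
    "target": [7, 8, 9],
    "feature1": [1, 2, 3],
    "feature2": [4, 5, 6],
})
print(rename_to_xy(frame, "target").columns.tolist())
```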
split_df(dataset, test_size, seed, stratify, shuffle=True, target_type=None)
Split a pandas DataFrame into a training and a test set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The input data set. | required |
test_size | float | The share of the data set to use as the test set. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25. | required |
seed | int | The seed for the random number generator. | required |
stratify | ArrayLike | The array of target values. | required |
shuffle | bool | Whether to shuffle the data before splitting. | True |
target_type | str | The type of the target column. Can be "int", "float" or None. If None, the type of the target column is not changed. Otherwise, the target column is converted to the specified type. | None |
Returns:
Name | Type | Description |
---|---|---|
tuple | tuple | The tuple (train, test, n_samples). |
Examples:
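>>> import pandas as pd
>>> from spotriver.utils.data_conversion import split_df
>>> df = pd.DataFrame({
...     "feature1": [1, 2, 3],
...     "feature2": [4, 5, 6],
...     "target": [7, 8, 9]})
>>> # Arguments follow the documented signature; stratify=None (no stratification) is an assumption
>>> train, test, n_samples = split_df(df, test_size=0.2, seed=42, stratify=None, target_type="int")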
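The roles of `test_size`, `seed`, `shuffle`, and `target_type` can be illustrated with a pandas-only sketch. It is not the spotriver implementation: `shuffle_split` is a hypothetical helper, it does not stratify, it assumes the target column is already named `y`, and it takes `n_samples` to mean the total number of rows, which is an assumption about the returned tuple.

```python
import math

import pandas as pd


def shuffle_split(df, test_size, seed, shuffle=True, target_type=None):
    """Hypothetical sketch of a seeded train/test split (no stratification)."""
    data = df.copy()
    if target_type is not None:
        # Cast the target column, assumed here to be named "y".
        data["y"] = data["y"].astype(target_type)
    if shuffle:
        # A fixed random_state makes the permutation reproducible.
        data = data.sample(frac=1.0, random_state=seed)
    n_samples = len(data)                      # assumption: total row count
    n_test = math.ceil(test_size * n_samples)  # rows reserved for the test set
    test = data.iloc[:n_test]
    train = data.iloc[n_test:]
    return train, test, n_samples


data = pd.DataFrame({"x1": range(10), "x2": range(10, 20), "y": range(20, 30)})
train, test, n_samples = shuffle_split(data, test_size=0.2, seed=42, target_type="float")
print(len(train), len(test), n_samples)
```

Calling the helper twice with the same seed yields the same partition, which is the practical reason for making `seed` a required argument.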