opm
Office of Policy and Management dataset
The original database is available from Connecticut’s OPM:
https://portal.ct.gov/OPM/IGPP/Publications/Real-Estate-Sales-Listing
The data contains 985,862 observations of up to 14 variables.
fetch_opm(*, data_home=None, download_if_missing=True, return_X_y=False, include_numeric=True, include_categorical=False)
Fetch the OPM dataset from the Connecticut Open Data portal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_home | str or pathlib.Path | Specify another download and cache folder for the datasets. By default, all spotriver data is stored in ‘~/spotriver_data’ subfolders. | None |
download_if_missing | bool | If False, raise an IOError if the data is not locally available rather than trying to download it from the source site. | True |
return_X_y | bool | If True, return the data as a tuple; see Returns. | False |
include_numeric | bool | If True, include numeric columns in the output. Numeric columns include ‘List Year’, ‘Assessed Value’, ‘Sale Amount’, ‘Sales Ratio’, ‘lat’, ‘lon’, and ‘timestamp_rec’. | True |
include_categorical | bool | If True, include categorical columns in the output. Categorical columns include ‘Town’, ‘Address’, ‘Property Type’, ‘Residential Type’, ‘Non Use Code’, ‘Assessor Remarks’, and ‘OPM remarks’. Columns with fewer than 200 unique values will be treated as categorical. | False |
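The rule that columns with fewer than 200 unique values are treated as categorical can be illustrated in plain Python. This is a sketch of the stated rule only, not spotriver's implementation: the helper `split_by_cardinality` and the toy columns are hypothetical, and how the library combines the rule with its fixed column lists is not shown on this page.

```python
def split_by_cardinality(columns, threshold=200):
    """Partition column names by their number of distinct values.

    `columns` maps a column name to a list of its values. Columns with
    fewer than `threshold` distinct values are reported as categorical.
    """
    categorical, other = [], []
    for name, values in columns.items():
        if len(set(values)) < threshold:
            categorical.append(name)
        else:
            other.append(name)
    return categorical, other


# Toy data, made up for illustration (not taken from the OPM dataset).
toy = {
    "Town": ["Ansonia", "Avon", "Ansonia", "Berlin"] * 100,  # 3 distinct values
    "Sale Amount": [100_000 + i for i in range(400)],         # 400 distinct values
}
categorical, other = split_by_cardinality(toy)
print(categorical)  # ['Town']
print(other)        # ['Sale Amount']
```

The comparison is strict, so a column with exactly 200 distinct values falls outside the categorical group in this sketch.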
Returns:
Type | Description |
---|---|
Union[Tuple[DataFrame, Series], DataFrame, Bunch] | Bunch, Tuple[pd.DataFrame, pd.Series], or pd.DataFrame, depending on the arguments. |
A Bunch carries the following attributes: the feature matrix, a pd.DataFrame of shape (n_samples, n_features); target, a pd.Series of shape (n_samples,) holding the target vector; and DESCR, a str with a short description of the dataset.
If return_X_y is True, a tuple of a pd.DataFrame and a pd.Series is returned.
If only numeric or only categorical columns are included in the output, a pd.DataFrame is returned instead of a Bunch.
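Because the function can hand back three different shapes, calling code may want to check which one it received. The following is a minimal sketch under stated assumptions: `result_kind` is a hypothetical helper, it identifies a Bunch by the documented `DESCR` attribute, and the stand-in objects replace real pandas and Bunch values so the snippet runs without the dataset.

```python
from types import SimpleNamespace


def result_kind(result):
    """Name which of the three documented return shapes `result` has."""
    if isinstance(result, tuple):
        return "tuple"      # features and target returned as a pair
    if hasattr(result, "DESCR"):
        return "bunch"      # container exposing target and DESCR
    return "dataframe"      # plain feature table


# Stand-in objects so the sketch runs without downloading anything.
fake_bunch = SimpleNamespace(target=[1.0, 2.0], DESCR="toy description")
fake_frame = [[0.5, 1.5], [2.5, 3.5]]  # placeholder for a pd.DataFrame

print(result_kind(([[0.5]], [1.0])))  # tuple
print(result_kind(fake_bunch))        # bunch
print(result_kind(fake_frame))        # dataframe
```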
Examples:
>>> from spotriver.data import fetch_opm
>>> # Fetch the OPM dataset (numeric columns only by default) and return a pandas DataFrame
>>> opm_df = fetch_opm()
>>> # Fetch only the categorical columns
>>> opm_data = fetch_opm(include_numeric=False, include_categorical=True, return_X_y=False)
>>> # Fetch numeric and categorical columns and return a (pd.DataFrame, pd.Series) tuple
>>> X, y = fetch_opm(include_categorical=True, return_X_y=True)
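The `download_if_missing=False` behaviour described under Parameters suggests a cache-only loading pattern. The sketch below is hypothetical: `load_from_cache` is not part of spotriver, and `fake_fetch` is a stand-in that mimics only the documented IOError so the snippet runs offline.

```python
def load_from_cache(fetch, data_home=None):
    """Call `fetch` without allowing a download; return None on a cache miss."""
    try:
        return fetch(data_home=data_home, download_if_missing=False)
    except IOError:
        # Documented behaviour when the data is not available locally.
        return None


# Stand-in for fetch_opm (assumption): it always reports a cache miss.
def fake_fetch(*, data_home=None, download_if_missing=True):
    if not download_if_missing:
        raise IOError("data not found in cache")
    return "downloaded"


print(load_from_cache(fake_fetch))  # None
```

With spotriver installed, passing `fetch_opm` in place of `fake_fetch` would apply the same pattern to the real loader, since both keyword arguments appear in its signature.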
Source code in spotriver/data/opm.py