preprocessing._binner
QuantileBinner class for binning data into quantile-based bins.
This module contains the QuantileBinner class, which computes quantile-based bin edges with numpy.percentile and assigns values to bins quickly with numpy.searchsorted.
Classes
| Name | Description |
|---|---|
| QuantileBinner | Bin data into quantile-based bins using numpy.percentile. |
QuantileBinner
preprocessing._binner.QuantileBinner(
n_bins,
method='linear',
subsample=200000,
dtype=np.float64,
random_state=789654,
)
Bin data into quantile-based bins using numpy.percentile.
This class is similar to scikit-learn's KBinsDiscretizer but is optimized for speed, using numpy.searchsorted for bin assignment. Bin intervals follow the convention bins[i-1] <= x < bins[i]. Values outside the fitted range are clipped to the first or last bin.
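The binning scheme described above can be sketched in plain NumPy. This is a minimal illustration, not the spotforecast2 implementation: it assumes the edges are the evenly spaced percentiles 0, 100/n_bins, ..., 100, that duplicate edges are dropped with np.unique, and that assignment calls np.searchsorted with side="right" on the internal edges. The helper names quantile_edges and assign_bins are hypothetical. The method keyword of np.percentile requires NumPy 1.22 or later.

```python
import numpy as np


def quantile_edges(x: np.ndarray, n_bins: int, method: str = "linear") -> np.ndarray:
    """Hypothetical helper: quantile-based bin edges with duplicates removed."""
    # n_bins + 1 evenly spaced percentiles from 0 to 100 give n_bins intervals.
    percentiles = np.linspace(0, 100, n_bins + 1)
    edges = np.percentile(x, percentiles, method=method)
    # Repeated values can produce identical edges; keep only distinct ones.
    return np.unique(edges)


def assign_bins(x: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Hypothetical helper: bin indices following edges[i-1] <= x < edges[i]."""
    # Only the internal edges are searched. With side="right", a value equal
    # to an edge goes to the bin on its right, and out-of-range values land
    # in bin 0 or bin len(edges) - 2 without an explicit clip.
    internal = edges[1:-1]
    return np.searchsorted(internal, x, side="right").astype(np.float64)


X = np.arange(1, 11)
edges = quantile_edges(X, n_bins=3)
print(edges)                                          # [ 1.  4.  7. 10.]
print(assign_bins(np.array([1.5, 5.5, 9.5]), edges))  # [0. 1. 2.]
print(assign_bins(np.array([0, 100]), edges))         # [0. 2.]
```

Searching only the internal edges is what makes the clipping free: anything below the second edge maps to bin 0, and anything at or above the second-to-last edge maps to the last bin.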
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| n_bins | int | The number of quantile-based bins to create. Must be >= 2. | required |
| method | str | The method used to compute quantiles, passed to numpy.percentile. Default is ‘linear’. Valid values: “inverse_cdf”, “averaged_inverse_cdf”, “closest_observation”, “interpolated_inverse_cdf”, “hazen”, “weibull”, “linear”, “median_unbiased”, “normal_unbiased”. | 'linear' |
| subsample | int | Maximum number of samples used to compute quantiles. If the dataset has more samples, a random subset of this size is used. Default 200000. | 200000 |
| dtype | type | Data type for bin indices. Default is numpy.float64. | np.float64 |
| random_state | int | Random seed used to draw the subset. Default 789654. | 789654 |
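A sketch of how subsample and random_state could interact, followed by a quick look at how method changes a quantile estimate. The sampling strategy here (a seeded numpy.random.Generator drawing without replacement) is an assumption; the source only states that a random subset is used. subsample_for_quantiles is a hypothetical name.

```python
import numpy as np


def subsample_for_quantiles(x, subsample=200_000, random_state=789654):
    """Hypothetical helper: cap the number of samples used for quantiles."""
    x = np.asarray(x)
    if x.size <= subsample:
        # Small inputs are used as-is.
        return x
    # Assumption: a seeded Generator drawing without replacement, so the
    # same random_state always selects the same subset.
    rng = np.random.default_rng(random_state)
    return rng.choice(x, size=subsample, replace=False)


x = np.random.default_rng(0).normal(size=100_000)
a = subsample_for_quantiles(x, subsample=1_000)
b = subsample_for_quantiles(x, subsample=1_000)
print(a.size, np.array_equal(a, b))  # 1000 True

# The same 25th percentile under three of the supported methods.
data = np.array([1.0, 2.0, 3.0, 4.0])
for method in ("linear", "weibull", "median_unbiased"):
    print(method, np.percentile(data, 25, method=method))
```

With linear the 25th percentile of [1, 2, 3, 4] is 1.75, while weibull gives 1.25, so the chosen method can shift bin edges noticeably on small samples.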
Attributes
| Name | Type | Description |
|---|---|---|
| n_bins | int | Number of bins to create. |
| method | str | Quantile computation method. |
| subsample | int | Maximum samples for quantile computation. |
| dtype | type | Data type for bin indices. |
| random_state | int | Random seed. |
| n_bins_ | int | Actual number of bins after fitting (may differ from n_bins if duplicate edges are found). |
| bin_edges_ | np.ndarray | Edges of the bins learned during fitting. |
| internal_edges_ | np.ndarray | Internal edges for optimized bin assignment. |
| intervals_ | dict | Mapping from bin index to (lower, upper) interval bounds. |
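The fitted attributes are closely related. The sketch below assumes, based on the examples (n_bins_ == 3 together with four bin_edges_), that n_bins_ equals len(bin_edges_) - 1, that internal_edges_ drops the two outermost edges, and that intervals_ pairs consecutive edges. The source does not spell these relationships out, so treat them as assumptions.

```python
import numpy as np

# Edges that linear quantiles give for X = 1..10 with n_bins=3.
bin_edges = np.array([1.0, 4.0, 7.0, 10.0])

# Assumption: n_bins_ counts the intervals between consecutive edges.
n_bins = len(bin_edges) - 1
# Assumption: internal edges are all edges except the outermost two.
internal_edges = bin_edges[1:-1]
# Assumption: intervals_ maps bin index -> (lower, upper).
intervals = {i: (float(bin_edges[i]), float(bin_edges[i + 1])) for i in range(n_bins)}

print(n_bins)          # 3
print(internal_edges)  # [4. 7.]
print(intervals)       # {0: (1.0, 4.0), 1: (4.0, 7.0), 2: (7.0, 10.0)}
```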
Examples
>>> import numpy as np
>>> from spotforecast2.preprocessing import QuantileBinner
>>>
>>> # Basic usage: create 3 quantile bins
>>> X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> binner = QuantileBinner(n_bins=3)
>>> _ = binner.fit(X)
>>> result = binner.transform(np.array([1.5, 5.5, 9.5]))
>>> print(result)
[0. 1. 2.]
>>>
>>> # Check bin intervals
>>> print(binner.n_bins_)
3
>>> assert len(binner.intervals_) == 3
>>>
>>> # Use fit_transform for one-step operation
>>> X2 = np.array([10, 20, 30, 40, 50])
>>> binner2 = QuantileBinner(n_bins=2)
>>> bins = binner2.fit_transform(X2)
>>> print(bins)
[0. 0. 1. 1. 1.]
Methods
| Name | Description |
|---|---|
| fit | Learn bin edges based on quantiles from training data. |
| fit_transform | Fit to data, then transform it. |
| get_params | Get parameters of the quantile binner. |
| set_params | Set parameters of the QuantileBinner. |
| transform | Assign new data to learned bins. |
fit
preprocessing._binner.QuantileBinner.fit(X, y=None)
Learn bin edges based on quantiles from training data.
Computes quantile-based bin edges using numpy.percentile. If the dataset contains more samples than subsample, a random subset is used. Duplicate edges (which can occur with repeated values) are removed automatically.
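The effect of duplicate removal on n_bins_ can be reproduced with NumPy alone. This sketch assumes that fit requests n_bins + 1 evenly spaced percentiles and deduplicates them with np.unique; under those assumptions, the repeated-values example below would end up with two bins.

```python
import numpy as np

X_repeated = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
n_bins = 5

# Requested edges: n_bins + 1 quantiles from 0% to 100%.
raw_edges = np.percentile(X_repeated, np.linspace(0, 100, n_bins + 1))
# Collapsing duplicate edges shrinks the number of usable bins.
edges = np.unique(raw_edges)
n_bins_fitted = len(edges) - 1

print(raw_edges)      # [1. 1. 2. 2. 3. 3.]
print(edges)          # [1. 2. 3.]
print(n_bins_fitted)  # 2
```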
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | np.ndarray | Training data (1D numpy array) for computing quantiles. | required |
| y | object | Ignored. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | object | Self, returned for method chaining. |
Raises
| Name | Type | Description |
|---|---|---|
| | ValueError | If input data X is empty. |
Examples
>>> import numpy as np
>>> from spotforecast2.preprocessing import QuantileBinner
>>>
>>> # Fit with basic data
>>> X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> binner = QuantileBinner(n_bins=3)
>>> _ = binner.fit(X)
>>> print(binner.n_bins_)
3
>>> print(len(binner.bin_edges_))
4
>>>
>>> # Fit with repeated values (may reduce number of bins)
>>> X_repeated = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
>>> binner2 = QuantileBinner(n_bins=5)
>>> _ = binner2.fit(X_repeated)
>>> # n_bins_ may be less than 5 due to duplicates
>>> assert binner2.n_bins_ <= 5
fit_transform
preprocessing._binner.QuantileBinner.fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | Input samples. For QuantileBinner, this is a 1D numpy array, as in fit and transform. | required |
| y | array-like of shape (n_samples,) or (n_samples, n_outputs) | Target values (None for unsupervised transformations). | None |
| **fit_params | dict | Additional fit parameters. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| X_new | ndarray | Transformed array of shape (n_samples, n_features_new). |
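For a transformer like this, fit_transform(X) normally gives the same result as fit(X) followed by transform(X) on the same data. The class below is a hypothetical, stripped-down stand-in (SketchQuantileBinner, with no subsampling) written only to show that relationship; it is not the spotforecast2 class.

```python
import numpy as np


class SketchQuantileBinner:
    """Hypothetical minimal stand-in, not the spotforecast2 implementation."""

    def __init__(self, n_bins, method="linear", dtype=np.float64):
        self.n_bins = n_bins
        self.method = method
        self.dtype = dtype

    def fit(self, X, y=None):
        X = np.asarray(X)
        if X.size == 0:
            raise ValueError("Input data X is empty.")
        percentiles = np.linspace(0, 100, self.n_bins + 1)
        edges = np.percentile(X, percentiles, method=self.method)
        self.bin_edges_ = np.unique(edges)
        self.n_bins_ = len(self.bin_edges_) - 1
        return self

    def transform(self, X, y=None):
        # bins[i-1] <= x < bins[i]; out-of-range values fall into the end bins.
        idx = np.searchsorted(self.bin_edges_[1:-1], np.asarray(X), side="right")
        return idx.astype(self.dtype)

    def fit_transform(self, X, y=None, **fit_params):
        # Same result as calling fit(X) and then transform(X) on the same data.
        return self.fit(X, y, **fit_params).transform(X)


X2 = np.array([10, 20, 30, 40, 50])
print(SketchQuantileBinner(n_bins=2).fit_transform(X2))  # [0. 0. 1. 1. 1.]
```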
get_params
preprocessing._binner.QuantileBinner.get_params(deep=True)
Get parameters of the quantile binner.
Returns
| Name | Type | Description |
|---|---|---|
| | dict[str, Any] | Dictionary containing the n_bins, method, subsample, dtype, and random_state parameters. |
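Because get_params returns the constructor arguments by name, the dictionary can rebuild an unfitted copy of the estimator, the same idea scikit-learn's clone relies on. ParamsSketch below is a hypothetical class illustrating the pattern; accepting and ignoring deep is an assumption about how QuantileBinner treats that argument.

```python
import numpy as np


class ParamsSketch:
    """Hypothetical class illustrating get_params; not spotforecast2 code."""

    def __init__(self, n_bins, method="linear", subsample=200_000,
                 dtype=np.float64, random_state=789654):
        self.n_bins = n_bins
        self.method = method
        self.subsample = subsample
        self.dtype = dtype
        self.random_state = random_state

    def get_params(self, deep=True):
        # Assumption: deep is accepted for compatibility and ignored, since
        # none of the parameters is itself an estimator.
        return {
            "n_bins": self.n_bins,
            "method": self.method,
            "subsample": self.subsample,
            "dtype": self.dtype,
            "random_state": self.random_state,
        }


original = ParamsSketch(n_bins=5, method="median_unbiased", subsample=1000)
# The keys match the constructor, so the dict rebuilds an unfitted copy.
copy = type(original)(**original.get_params())
print(copy.get_params() == original.get_params())  # True
```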
Examples
>>> import numpy as np
>>> from spotforecast2.preprocessing import QuantileBinner
>>>
>>> binner = QuantileBinner(n_bins=5, method='median_unbiased', subsample=1000)
>>> params = binner.get_params()
>>> print(params['n_bins'])
5
>>> print(params['method'])
median_unbiased
>>> print(params['subsample'])
1000
set_params
preprocessing._binner.QuantileBinner.set_params(**params)
Set parameters of the QuantileBinner.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| **params | Any | Parameter names and values to set as keyword arguments. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| self | 'QuantileBinner' | Returns the updated QuantileBinner instance. |
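Since set_params returns the instance, calls can be chained, and in an interactive session or doctest the returned object is echoed unless it is assigned. SetParamsSketch is a hypothetical class showing the pattern; rejecting unknown names with ValueError is an assumption, not documented behavior of QuantileBinner.

```python
class SetParamsSketch:
    """Hypothetical class illustrating set_params; not spotforecast2 code."""

    # Assumption: only constructor arguments may be changed.
    _param_names = ("n_bins", "method")

    def __init__(self, n_bins, method="linear"):
        self.n_bins = n_bins
        self.method = method

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        # Returning self is what makes chaining possible.
        return self


binner = SetParamsSketch(n_bins=3).set_params(n_bins=5, method="weibull")
print(binner.n_bins, binner.method)  # 5 weibull
```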
Examples
>>> import numpy as np
>>> from spotforecast2.preprocessing import QuantileBinner
>>>
>>> binner = QuantileBinner(n_bins=3)
>>> print(binner.n_bins)
3
>>> _ = binner.set_params(n_bins=5, method='weibull')
>>> print(binner.n_bins)
5
>>> print(binner.method)
weibull
transform
preprocessing._binner.QuantileBinner.transform(X, y=None)
Assign new data to learned bins.
Uses numpy.searchsorted for efficient bin assignment. Values are assigned to bins following the convention: bins[i-1] <= x < bins[i]. Values outside the fitted range are clipped to the first or last bin.
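The boundary convention and the clipping come down to one argument of np.searchsorted. This sketch assumes transform searches the internal edges (here, the edges from the class example) with side="right"; the source states the convention but not the exact call.

```python
import numpy as np

# Edges for X = 1..10 with n_bins=3 under linear quantiles.
bin_edges = np.array([1.0, 4.0, 7.0, 10.0])
internal_edges = bin_edges[1:-1]

x = np.array([-5.0, 1.0, 3.999, 4.0, 6.999, 7.0, 10.0, 50.0])
# side="right" puts a value equal to an edge into the bin to its right,
# matching bins[i-1] <= x < bins[i].
print(np.searchsorted(internal_edges, x, side="right"))  # [0 0 0 1 1 2 2 2]
# side="left" would instead put 4.0 and 7.0 into the lower bins.
print(np.searchsorted(internal_edges, x, side="left"))   # [0 0 0 0 1 1 2 2]
```

The value 10.0, the upper edge of the last bin, lands in bin 2 even though the half-open interval [7, 10) excludes it; this is the clipping to the last bin described above.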
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | np.ndarray | Data to assign to bins (1D numpy array). | required |
| y | object | Ignored. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | np.ndarray | Bin indices as a numpy array with the dtype specified at initialization. |
Raises
| Name | Type | Description |
|---|---|---|
| | NotFittedError | If fit() has not been called yet. |
Examples
>>> import numpy as np
>>> from spotforecast2.preprocessing import QuantileBinner
>>>
>>> # Fit and transform
>>> X_train = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> binner = QuantileBinner(n_bins=3)
>>> _ = binner.fit(X_train)
>>>
>>> X_test = np.array([1.5, 5.5, 9.5])
>>> result = binner.transform(X_test)
>>> print(result)
[0. 1. 2.]
>>>
>>> # Values outside range are clipped
>>> X_extreme = np.array([0, 100])
>>> result_extreme = binner.transform(X_extreme)
>>> print(result_extreme) # Both clipped to valid bin indices
[0. 2.]