manager.features.select_top_poly_features(poly_features, y, max_poly_features=10, random_state=123, n_jobs=-1, mi_sample_size=4000)
Rank polynomial interaction columns by mutual information and keep the top K.
Polynomial expansion (create_interaction_features with degree >= 2) can emit hundreds or thousands of poly_* columns. This helper caps that set: it scores each candidate column by its mutual information with the target and returns the names of the max_poly_features highest-scoring columns. Mutual information is estimated with mutual_info_regression, seeded by random_state so the selection is reproducible.
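The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the helper's actual implementation: rank_by_mutual_information is a hypothetical name, the frame is synthetic, and the real method may preprocess or align its inputs differently.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def rank_by_mutual_information(poly_features, y, max_poly_features=10, random_state=123):
    """Hypothetical helper: names of the top-K columns by mutual information with y."""
    scores = mutual_info_regression(
        poly_features.to_numpy(), np.asarray(y), random_state=random_state
    )
    # Pair each score with its column name and sort from most to least informative.
    ranked = pd.Series(scores, index=poly_features.columns).sort_values(ascending=False)
    return ranked.head(max_poly_features).index.tolist()


# Synthetic stand-in for poly_* columns: only poly_a and poly_b drive the target.
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.normal(size=(1000, 6)),
    columns=["poly_a", "poly_b", "poly_c", "poly_d", "poly_e", "poly_f"],
)
target = 2.0 * frame["poly_a"] + 2.0 * frame["poly_b"] + rng.normal(scale=0.1, size=1000)

top = rank_by_mutual_information(frame, target, max_poly_features=2)
```

Because random_state is fixed, calling the sketch twice on the same inputs returns the same list of names.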
The k-nearest-neighbour estimator behind mutual_info_regression is the dominant cost of the whole exogenous-feature pipeline on realistic inputs (thousands of candidate columns over years of hourly data). Two knobs keep it fast: the scoring runs in parallel across candidate columns (n_jobs), and long series are scored on a reproducible row subsample (mi_sample_size) instead of every observation.
n_jobs: Number of parallel jobs forwarded to mutual_info_regression, which scores candidate columns independently. -1 (the default) uses all cores; None runs single-threaded. Parallelism does not change the scores, so the selected columns are identical for every n_jobs value.
mi_sample_size: Maximum number of rows used for the mutual-information estimate. When the joined frame is longer, a uniform random subsample of this size (drawn without replacement, seeded by random_state) is scored instead, which gives a large speed-up on multi-year hourly series. The subsampled estimate can rank borderline columns differently from a full-data estimate, so the kept set may differ; pass None to score every row (the pre-15.8 behaviour). Must be a positive integer or None. Defaults to 4000.
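The row cap might look roughly like the sketch below. It is an assumption-laden illustration rather than the library's code: subsample_rows is a hypothetical helper, and the use of numpy's default_rng and the sorting of the sampled positions are choices made here, not confirmed details of the implementation.

```python
import numpy as np
import pandas as pd


def subsample_rows(frame, y, mi_sample_size=4000, random_state=123):
    """Hypothetical helper: cap the rows scored, sampling uniformly without replacement."""
    if mi_sample_size is not None and (
        not isinstance(mi_sample_size, int) or mi_sample_size <= 0
    ):
        raise ValueError("mi_sample_size must be a positive integer or None")
    if mi_sample_size is None or len(frame) <= mi_sample_size:
        return frame, y  # sampling disabled or series already short: score every row
    rng = np.random.default_rng(random_state)
    # Draw distinct row positions, then sort them to keep chronological order.
    rows = np.sort(rng.choice(len(frame), size=mi_sample_size, replace=False))
    return frame.iloc[rows], y.iloc[rows]


# About three years of hourly rows (synthetic values).
n_rows = 26_280
frame = pd.DataFrame({"poly_a": np.arange(n_rows, dtype=float)})
target = pd.Series(np.arange(n_rows, dtype=float))

sub_frame, sub_target = subsample_rows(frame, target)
```

Here the mutual-information estimate would see 4000 of the 26,280 rows, and the same rows on every call with the same random_state. Passing mi_sample_size=None returns the inputs unchanged.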