Datasets and Data Loaders

Ready-to-use datasets for PyTorch-based optimization workflows.

The data subpackage provides PyTorch Dataset wrappers and data loader utilities. These are primarily used for hyperparameter tuning of neural networks with spotoptim.


DiabetesDataset

DiabetesDataset wraps the sklearn diabetes regression dataset as a PyTorch Dataset. It provides 442 samples with 10 features each.

from spotoptim.data import DiabetesDataset

ds = DiabetesDataset()
print(f"Samples : {ds.n_samples}")
print(f"Features: {ds.n_features}")

x, y = ds[0]
print(f"First sample shape: {x.shape}")
print(f"First target shape: {y.shape}")

Samples : 442
Features: 10
First sample shape: torch.Size([10])
First target shape: torch.Size([1])
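To make the Dataset protocol concrete, here is a minimal, hypothetical sketch of what such a wrapper can look like; the class name SimpleDiabetesDataset and its internals are illustrative and may differ from the actual spotoptim implementation:

```python
import torch
from torch.utils.data import Dataset
from sklearn.datasets import load_diabetes


class SimpleDiabetesDataset(Dataset):
    """Illustrative wrapper around sklearn's diabetes data as float32 tensors."""

    def __init__(self):
        X, y = load_diabetes(return_X_y=True)
        self.X = torch.tensor(X, dtype=torch.float32)               # shape (442, 10)
        self.y = torch.tensor(y, dtype=torch.float32).unsqueeze(1)  # shape (442, 1)
        self.n_samples, self.n_features = self.X.shape

    def __len__(self):
        # Number of samples; required by the Dataset protocol.
        return self.n_samples

    def __getitem__(self, idx):
        # Return one (features, target) pair as tensors.
        return self.X[idx], self.y[idx]


ds = SimpleDiabetesDataset()
x, y = ds[0]
```

Implementing __len__ and __getitem__ is all a map-style Dataset needs; any such class can then be passed to torch.utils.data.DataLoader for batching and shuffling.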

Data Loaders

get_diabetes_dataloaders() creates train and test DataLoader objects with a configurable train/test split, batch size, and optional feature scaling. It also returns the fitted feature scaler (or None when scaling is disabled):

from spotoptim.data import get_diabetes_dataloaders

train_loader, test_loader, scaler = get_diabetes_dataloaders(
    test_size=0.2,
    batch_size=32,
    scale_features=True,
    random_state=0,
)

X_batch, y_batch = next(iter(train_loader))
print(f"Train batches : {len(train_loader)}")
print(f"Test batches  : {len(test_loader)}")
print(f"Batch X shape : {X_batch.shape}")
print(f"Batch y shape : {y_batch.shape}")
print(f"Scaler fitted : {scaler is not None}")

Train batches : 12
Test batches  : 3
Batch X shape : torch.Size([32, 10])
Batch y shape : torch.Size([32, 1])
Scaler fitted : True
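The loaders plug directly into an ordinary PyTorch training loop. The following self-contained sketch rebuilds equivalent loaders from sklearn (same split, scaling, and batching as described above, without depending on spotoptim) and runs a small linear model over them; the model, learning rate, and epoch count are illustrative only:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Recreate the split and scaling described above (illustrative, not the spotoptim API).
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler().fit(X_train)  # fit on the training split only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)


def to_loader(X, y, batch_size=32, shuffle=False):
    ds = TensorDataset(
        torch.tensor(X, dtype=torch.float32),
        torch.tensor(y, dtype=torch.float32).unsqueeze(1),
    )
    return DataLoader(ds, batch_size=batch_size, shuffle=shuffle)


train_loader = to_loader(X_train, y_train, shuffle=True)
test_loader = to_loader(X_test, y_test)

# Minimal training loop: iterate over batches, compute MSE, update weights.
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(30):
    for xb, yb in train_loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

# Evaluate mean squared error on the held-out split.
with torch.no_grad():
    test_mse = sum(
        loss_fn(model(xb), yb).item() * len(xb) for xb, yb in test_loader
    ) / len(test_loader.dataset)
```

The same loop works unchanged with the loaders returned by get_diabetes_dataloaders(); only the loader construction differs. The saved scaler lets you apply the identical feature transform to new data at prediction time.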

See Also