
generic

GenericData

Bases: GenericFileDataset

A class for handling generic data.

This class inherits from base.GenericFileDataset and streams the rows of a data file as (features, target) pairs.

Parameters:

    filename (str): The name of the file containing the data. Required.
    target (str): The name of the target column. Required.
    n_features (int): The number of features in the dataset. Required.
    n_samples (int): The number of samples in the dataset. Required.
    converters (Dict[str, callable]): A dictionary mapping column names to functions that convert the raw column values. Required.
    parse_dates (Dict[str, str]): A dictionary mapping column names to the datetime format strings used to parse them, as in the example below. Required.
    directory (str): The directory where the file is located. Required.
    task (str): The type of task. Defaults to base.REG (regression).
    fraction (float): The fraction of the data to use. Defaults to 1.0 (all data).

Returns:

    (Generator): An iterator over the rows of the file, yielding one (x, y) pair per row.
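
To make the return value concrete, here is a minimal, stdlib-only sketch of the kind of iterator described above: one (features, target) pair per CSV row, with converters and date formats applied per column. The helper `iter_rows_sketch` and the two inline rows are hypothetical and only mimic the behaviour visible in the example on this page; the class itself delegates the actual reading to stream.iter_csv.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


def iter_rows_sketch(
    buffer: io.TextIOBase,
    target: str,
    converters: Optional[Dict[str, Callable[[str], Any]]] = None,
    parse_dates: Optional[Dict[str, str]] = None,
) -> Iterator[Tuple[Dict[str, Any], Any]]:
    """Hypothetical stand-in that yields one (features, target) pair per row."""
    converters = converters or {}
    parse_dates = parse_dates or {}
    for row in csv.DictReader(buffer):
        # Apply the per-column converter functions first.
        for name, convert in converters.items():
            row[name] = convert(row[name])
        # Then parse date columns with their strptime format strings.
        for name, fmt in parse_dates.items():
            row[name] = datetime.strptime(row[name], fmt)
        # Split the target column off from the remaining features.
        y = row.pop(target)
        yield row, y


# Made-up rows for illustration only; the real file contents are not shown here.
sample = io.StringIO(
    "Time,Consumption\n"
    "2016-12-31 23:00:00+00:00,10951.217\n"
    "2017-01-01 00:00:00+00:00,10000.5\n"
)
pairs = list(
    iter_rows_sketch(
        sample,
        target="Consumption",
        converters={"Consumption": float},
        parse_dates={"Time": "%Y-%m-%d %H:%M:%S%z"},
    )
)
x, y = pairs[0]
print(x, y)
```

Because the target column is popped from each row, `x` holds only the remaining columns, which matches the shape of the output printed in the example on this page.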

Examples:

>>> from spotriver.data.generic import GenericData
>>> import importlib.resources as pkg_resources
>>> import spotriver.data as data
>>> inp_file = pkg_resources.files(data)
>>> csv_path = str(inp_file.resolve())
>>> dataset = GenericData(filename="UnivariateData.csv",
...                       directory=csv_path,
...                       target="Consumption",
...                       n_features=1,
...                       n_samples=51_706,
...                       converters={"Consumption": float},
...                       parse_dates={"Time": "%Y-%m-%d %H:%M:%S%z"})
>>> for x, y in dataset:
...     print(x, y)
...     break
{'Time': datetime.datetime(2016, 12, 31, 23, 0, tzinfo=datetime.timezone.utc)} 10951.217
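
The format string passed for the Time column ends in `%z`, which is why the printed datetime carries a `tzinfo`. The short sketch below shows what that directive does with `datetime.strptime`. The two raw timestamp strings are hypothetical, since the actual cell values in the CSV are not shown on this page.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S%z"  # the format string used in the example

# Hypothetical raw cell values, chosen only to illustrate the %z directive.
utc_stamp = datetime.strptime("2016-12-31 23:00:00+0000", FMT)
cet_stamp = datetime.strptime("2017-01-01 00:00:00+0100", FMT)

# %z yields timezone-aware datetimes that keep their UTC offset.
print(utc_stamp.utcoffset())
print(cet_stamp.utcoffset())

# Both strings denote the same instant, so the aware datetimes compare equal.
print(utc_stamp == cet_stamp)
```

Without `%z` in the format, an offset in the input would make `strptime` raise a `ValueError`, and dropping the offset from both the input and the format would produce naive datetimes instead.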
Source code in spotriver/data/generic.py
class GenericData(base.GenericFileDataset):
    """A class for handling generic data.

    This class inherits from base.GenericFileDataset and streams the rows of a data file as (features, target) pairs.

    Args:
        filename (str): The name of the file containing the data.
        target (str): The name of the target column.
        n_features (int): The number of features in the dataset.
        n_samples (int): The number of samples in the dataset.
        converters (Dict[str, callable]): A dictionary of functions for converting column data.
        parse_dates (Dict[str, str]): A dictionary mapping column names to the datetime format strings used to parse them.
        directory (str): The directory where the file is located.
        task (str): The type of task. Default is base.REG for regression.
        fraction (float): The fraction of the data to use. Default is 1.0 for all data.

    Returns:
        (Generator): An iterator over the data in the file.

    Examples:
        >>> from spotriver.data.generic import GenericData
        >>> import importlib.resources as pkg_resources
        >>> import spotriver.data as data
        >>> inp_file = pkg_resources.files(data)
        >>> csv_path = str(inp_file.resolve())
        >>> dataset = GenericData(filename="UnivariateData.csv",
        ...                       directory=csv_path,
        ...                       target="Consumption",
        ...                       n_features=1,
        ...                       n_samples=51_706,
        ...                       converters={"Consumption": float},
        ...                       parse_dates={"Time": "%Y-%m-%d %H:%M:%S%z"})
        >>> for x, y in dataset:
        ...     print(x, y)
        ...     break
        {'Time': datetime.datetime(2016, 12, 31, 23, 0, tzinfo=datetime.timezone.utc)} 10951.217

    """

    def __init__(
        self,
        filename: str,
        target: str,
        n_features: int,
        n_samples: int,
        converters: Dict[str, callable],
        parse_dates: Dict[str, str],
        directory: str,
        task: str = base.REG,
        fraction: float = 1.0,
    ):
        super().__init__(
            filename=filename,
            n_features=n_features,
            n_samples=n_samples,
            task=task,
            target=target,
            converters=converters,
            parse_dates=parse_dates,
            directory=directory,
        )
        self.fraction = fraction

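    def __iter__(self) -> Union[Dict[str, float], float]:
        # Stream the file row by row; each item is an (x, y) pair, where x is a
        # dict of features and y is the value of the target column.
        return stream.iter_csv(
            self.path,
            target=self.target,
            converters=self.converters,
            parse_dates=self.parse_dates,
            fraction=self.fraction,
            seed=123,  # fixed seed, so sampling with fraction < 1.0 is repeatable
        )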
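
The `fraction` and `seed` arguments forwarded in `__iter__` suggest seeded row subsampling. How the library implements it is not shown on this page, so the following is only a sketch of one common approach, under the assumption that each row is kept independently with probability `fraction`. The helper `sample_rows_sketch` is hypothetical.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_rows_sketch(
    rows: Iterable[T], fraction: float = 1.0, seed: int = 123
) -> Iterator[T]:
    """Hypothetical helper: keep each row with probability `fraction`."""
    # A private generator, so the global random state is left untouched.
    rng = random.Random(seed)
    for row in rows:
        # With fraction >= 1.0 every row passes; otherwise rows are dropped at random.
        if fraction >= 1.0 or rng.random() < fraction:
            yield row


rows = list(range(1000))
first = list(sample_rows_sketch(rows, fraction=0.1))
second = list(sample_rows_sketch(rows, fraction=0.1))

print(len(first))       # about a tenth of the rows; the exact count depends on the seed
print(first == second)  # same seed, same subset
print(list(sample_rows_sketch(rows)) == rows)
```

With a fixed seed, two passes over the same file select the same rows, which keeps experiments on a subsample comparable. The trade-off is that the subset never varies unless the seed is exposed as a parameter.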