data

Datasets.

This module contains a collection of datasets for multiple tasks, such as classification and regression. The datasets are popular ones, wrapped so that they can be iterated over as a stream. All datasets have a fixed size.

AirlinePassengers

Bases: FileDataset

Monthly number of international airline passengers [1].

The stream contains 144 items and a single feature, the month. The goal is to predict the number of passengers each month by capturing the trend and the seasonality of the data.

Returns:

Type Description
Generator

An iterator over the data in the file.

Note: The code can be used as a template for creating new datasets based on CSV files.

Examples:

>>> from spotriver.data.airline_passengers import AirlinePassengers
>>> dataset = AirlinePassengers()
>>> for x, y in dataset.take(5):
...     print(x, y)
{'month': datetime.datetime(1949, 1, 1, 0, 0)} 112
{'month': datetime.datetime(1949, 2, 1, 0, 0)} 118
{'month': datetime.datetime(1949, 3, 1, 0, 0)} 132
{'month': datetime.datetime(1949, 4, 1, 0, 0)} 129
{'month': datetime.datetime(1949, 5, 1, 0, 0)} 121

References

International airline passengers: monthly totals in thousands. Jan 49 – Dec 60
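The description above says the only feature is the month and that a model has to capture trend and seasonality from it. As a rough illustration, the sketch below derives one trend feature and two seasonal features from an item shaped like the ones in the example output. The helper `month_features` and its feature names are hypothetical and not part of spotriver; the cyclic sine/cosine encoding is just one common choice.

```python
import datetime as dt
import math


def month_features(x):
    """Hypothetical helper: map {'month': datetime} to trend and seasonal features."""
    month = x["month"]
    # Trend: months elapsed since January 1949, the first item in the stream.
    elapsed = (month.year - 1949) * 12 + (month.month - 1)
    # Seasonality: place the month of the year on a circle so that
    # December and January end up next to each other.
    angle = 2 * math.pi * (month.month - 1) / 12
    return {
        "elapsed": elapsed,
        "sin_month": math.sin(angle),
        "cos_month": math.cos(angle),
    }


features = month_features({"month": dt.datetime(1949, 5, 1)})
print(features["elapsed"])  # 4
```

A regression model fed with these features can fit a slope on `elapsed` and a yearly cycle on the two seasonal columns, which the raw datetime alone does not expose.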

Source code in spotriver/data/airline_passengers.py
class AirlinePassengers(base.FileDataset):
    """Monthly number of international airline passengers [1].

    The stream contains 144 items and a single feature, the month. The goal is to predict the
    number of passengers each month by capturing the trend and the seasonality of the data.

    Returns:
        (Generator): An iterator over the data in the file.

    Note: The code can be used as a template for creating new datasets based on CSV files.

    Examples:
        >>> from spotriver.data.airline_passengers import AirlinePassengers
        >>> dataset = AirlinePassengers()
        >>> for x, y in dataset.take(5):
        ...     print(x, y)
        {'month': datetime.datetime(1949, 1, 1, 0, 0)} 112
        {'month': datetime.datetime(1949, 2, 1, 0, 0)} 118
        {'month': datetime.datetime(1949, 3, 1, 0, 0)} 132
        {'month': datetime.datetime(1949, 4, 1, 0, 0)} 129
        {'month': datetime.datetime(1949, 5, 1, 0, 0)} 121

    References:
        International airline passengers: monthly totals in thousands. Jan 49 – Dec 60
    """

    def __init__(self):
        """Constructor method.

        Returns:
            (NoneType): None

        """
        super().__init__(
            filename="airline-passengers.csv",
            task=base.REG,
            n_features=1,
            n_samples=144,
        )

    def __iter__(self):
        """Iterate over the data.

        Returns:
            (Generator): An iterator over the data in the file.
        """
        return stream.iter_csv(
            self.path,
            target="passengers",
            converters={"passengers": int},
            parse_dates={"month": "%Y-%m"},
        )
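The note in the docstring suggests using this class as a template for other CSV-based datasets. The pattern is: record the file name and metadata in the constructor, then have `__iter__` yield `(x, y)` pairs with converters and date parsing applied. The sketch below mimics that pattern with the standard library only, so it runs without river or spotriver installed. `CsvFileDataset` is a hypothetical stand-in, not the real `base.FileDataset`. The CSV layout (a `month,passengers` header with `YYYY-MM` dates) is an assumption inferred from the `target` and `parse_dates` arguments above; the values are the first three from the example output.

```python
import csv
import datetime as dt
import itertools
import os
import tempfile


class CsvFileDataset:
    """Hypothetical stand-in for a CSV-backed dataset (not the spotriver base class)."""

    def __init__(self, path, target, converters, date_formats, n_samples):
        self.path = path
        self.target = target
        self.converters = converters      # column name -> callable
        self.date_formats = date_formats  # column name -> strptime format
        self.n_samples = n_samples

    def __iter__(self):
        with open(self.path, newline="") as f:
            for row in csv.DictReader(f):
                for name, convert in self.converters.items():
                    row[name] = convert(row[name])
                for name, fmt in self.date_formats.items():
                    row[name] = dt.datetime.strptime(row[name], fmt)
                # Split the target off; the remaining columns are the features.
                y = row.pop(self.target)
                yield row, y

    def take(self, k):
        """Return an iterator over the first k items."""
        return itertools.islice(self, k)


# Assumed file layout, written to a temporary file for the demonstration.
sample = "month,passengers\n1949-01,112\n1949-02,118\n1949-03,132\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(sample)

dataset = CsvFileDataset(
    tmp.name,
    target="passengers",
    converters={"passengers": int},
    date_formats={"month": "%Y-%m"},
    n_samples=3,
)
first_two = list(dataset.take(2))
os.remove(tmp.name)

for x, y in first_two:
    print(x, y)
```

For a new dataset, only the file name, the target column, and the per-column converters and date formats should need to change; the iteration logic stays the same.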

__init__()

Constructor method.

Returns:

Type Description
NoneType

None

Source code in spotriver/data/airline_passengers.py
def __init__(self):
    """Constructor method.

    Returns:
        (NoneType): None

    """
    super().__init__(
        filename="airline-passengers.csv",
        task=base.REG,
        n_features=1,
        n_samples=144,
    )

__iter__()

Iterate over the data.

Returns:

Type Description
Generator

An iterator over the data in the file.

Source code in spotriver/data/airline_passengers.py
def __iter__(self):
    """Iterate over the data.

    Returns:
        (Generator): An iterator over the data in the file.
    """
    return stream.iter_csv(
        self.path,
        target="passengers",
        converters={"passengers": int},
        parse_dates={"month": "%Y-%m"},
    )