Datasets

Dataset classes let you automatically download a dataset and load it as a PyTorch dataset.

All implemented datasets have class-disjoint train and test splits (no class appears in both), which makes them suitable for benchmarking image retrieval and one-shot/few-shot classification.

BaseDataset

All dataset classes extend this class and therefore inherit its __init__ parameters.

datasets.base_dataset.BaseDataset(
    root, 
    split="train+test", 
    transform=None, 
    target_transform=None, 
    download=False
)

Parameters:

  • root: The path where the dataset files are saved.
  • split: A string that determines which split of the dataset is loaded: "train", "test", or "train+test" (the default).
  • transform: A torchvision.transforms object which will be used on the input images.
  • target_transform: A torchvision.transforms object which will be used on the labels.
  • download: Whether to download the dataset. If this is False and the dataset is not already on disk, a ValueError is raised.
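
As an illustration of the transform and target_transform parameters, the sketch below applies both per sample when an item is fetched, which is the usual torchvision-style convention. ToyDataset and its in-memory samples are hypothetical stand-ins; this page does not confirm that BaseDataset applies them in exactly this way.

```python
class ToyDataset:
    """Hypothetical stand-in illustrating per-sample transforms; not the library's class."""

    def __init__(self, samples, labels, transform=None, target_transform=None):
        self.samples = samples
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample, label = self.samples[idx], self.labels[idx]
        # transform acts on the input, target_transform on the label.
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            label = self.target_transform(label)
        return sample, label


toy = ToyDataset(
    samples=[[0, 128, 255], [64, 64, 64]],
    labels=[101, 102],
    transform=lambda pixels: [p / 255 for p in pixels],  # scale to [0, 1]
    target_transform=lambda label: label - 101,          # remap labels to start at 0
)
first_sample, first_label = toy[0]
```

With real image data, transform would typically be a torchvision.transforms pipeline rather than a lambda.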

Required Implementations:

    @abstractmethod
    def download_and_remove(self):
        raise NotImplementedError

    @abstractmethod
    def generate_split(self):
        raise NotImplementedError
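
To show how a subclass might fill in these two hooks, here is a minimal sketch built on a hypothetical stand-in base class. SketchBaseDataset, TinyDataset, the labels.txt file, and the order in which the hooks are called are all assumptions made for illustration; only the hook names and the ValueError behaviour come from this page.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class SketchBaseDataset(ABC):
    """Hypothetical stand-in for BaseDataset; the control flow here is an assumption."""

    def __init__(self, root, split="train+test", download=False):
        self.root = root
        self.split = split
        if download:
            os.makedirs(root, exist_ok=True)
            self.download_and_remove()
        elif not os.path.isdir(root):
            # Mirrors the documented behaviour: nothing on disk and download=False.
            raise ValueError(f"Dataset not found at {root!r}; pass download=True.")
        self.generate_split()

    @abstractmethod
    def download_and_remove(self):
        """Fetch the raw files into self.root and delete temporary archives."""

    @abstractmethod
    def generate_split(self):
        """Populate self.labels for the requested split."""


class TinyDataset(SketchBaseDataset):
    def download_and_remove(self):
        # A real subclass would download and unpack an archive here.
        with open(os.path.join(self.root, "labels.txt"), "w") as f:
            f.write("1\n1\n2\n2\n")

    def generate_split(self):
        with open(os.path.join(self.root, "labels.txt")) as f:
            labels = [int(line) for line in f]
        wanted = {"train": {1}, "test": {2}, "train+test": {1, 2}}[self.split]
        self.labels = [label for label in labels if label in wanted]


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "tiny")
    try:
        TinyDataset(root, split="train", download=False)
        missing_raised = False
    except ValueError:
        missing_raised = True
    train = TinyDataset(root, split="train", download=True)
    test = TinyDataset(root, split="test", download=False)
    train_labels, test_labels = train.labels, test.labels
```

The first construction fails because nothing is on disk yet; once one instance has downloaded the files, later instances can reuse them with download=False.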

CUB-200-2011

datasets.CUB(*args, **kwargs)

Defined splits:

  • train - Consists of 5864 examples, taken from classes 1 to 100.
  • test - Consists of 5924 examples, taken from classes 101 to 200.
  • train+test - Consists of 11788 examples, taken from all classes.
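
To make the class-based split concrete, the sketch below partitions sample indices by label using the class ranges listed above. The helper split_by_class and the toy labels are hypothetical; only the class ranges and example counts come from this page.

```python
TRAIN_CLASSES = range(1, 101)    # classes 1 to 100
TEST_CLASSES = range(101, 201)   # classes 101 to 200


def split_by_class(labels, train_classes, test_classes):
    """Return (train_indices, test_indices) for a class-disjoint split."""
    train_set, test_set = set(train_classes), set(test_classes)
    if train_set & test_set:
        raise ValueError("train and test classes must not overlap")
    train_idx = [i for i, y in enumerate(labels) if y in train_set]
    test_idx = [i for i, y in enumerate(labels) if y in test_set]
    return train_idx, test_idx


toy_labels = [1, 100, 101, 200, 57, 150]
train_idx, test_idx = split_by_class(toy_labels, TRAIN_CLASSES, TEST_CLASSES)

# The documented split sizes add up to the full dataset.
assert 5864 + 5924 == 11788
```

Because no class appears on both sides, a model evaluated on the test split only sees classes it was never trained on.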

Loading different dataset splits

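# Import path assumes the pytorch-metric-learning package
from pytorch_metric_learning.datasets import CUB

train_dataset = CUB(root="data",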
    split="train", 
    transform=None, 
    target_transform=None, 
    download=True
)
# No need to download again once the dataset is already on disk
test_dataset = CUB(root="data", 
    split="test", 
    transform=None, 
    target_transform=None, 
    download=False
)
train_and_test_dataset = CUB(root="data", 
    split="train+test", 
    transform=None, 
    target_transform=None, 
    download=False
) 

Cars196

datasets.Cars196(*args, **kwargs)

Defined splits:

  • train - Consists of 8054 examples, taken from classes 1 to 98.
  • test - Consists of 8131 examples, taken from classes 99 to 196.
  • train+test - Consists of 16185 examples, taken from all classes.

Loading different dataset splits

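# Import path assumes the pytorch-metric-learning package
from pytorch_metric_learning.datasets import Cars196

train_dataset = Cars196(root="data",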
    split="train", 
    transform=None, 
    target_transform=None, 
    download=True
)
# No need to download again once the dataset is already on disk
test_dataset = Cars196(root="data", 
    split="test", 
    transform=None, 
    target_transform=None, 
    download=False
)
train_and_test_dataset = Cars196(root="data", 
    split="train+test", 
    transform=None, 
    target_transform=None, 
    download=False
) 

INaturalist2018

datasets.INaturalist2018(*args, **kwargs)

Defined splits:

  • train - Consists of 325 846 examples.
  • test - Consists of 136 093 examples.
  • train+test - Consists of 461 939 examples.

Loading different dataset splits

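# Import path assumes the pytorch-metric-learning package
from pytorch_metric_learning.datasets import INaturalist2018

# The download takes a while because the dataset is very large
train_dataset = INaturalist2018(root="data",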
    split="train", 
    transform=None, 
    target_transform=None, 
    download=True
)
# No need to download again once the dataset is already on disk
test_dataset = INaturalist2018(root="data", 
    split="test", 
    transform=None, 
    target_transform=None, 
    download=False
)
train_and_test_dataset = INaturalist2018(root="data", 
    split="train+test", 
    transform=None, 
    target_transform=None, 
    download=False
) 

StanfordOnlineProducts

datasets.StanfordOnlineProducts(*args, **kwargs)

Defined splits:

  • train - Consists of 59551 examples.
  • test - Consists of 60502 examples.
  • train+test - Consists of 120053 examples.

Loading different dataset splits

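# Import path assumes the pytorch-metric-learning package
from pytorch_metric_learning.datasets import StanfordOnlineProducts

# The download takes a while because the dataset is very large
train_dataset = StanfordOnlineProducts(root="data",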
    split="train", 
    transform=None, 
    target_transform=None, 
    download=True
)
# No need to download again once the dataset is already on disk
test_dataset = StanfordOnlineProducts(root="data", 
    split="test", 
    transform=None, 
    target_transform=None, 
    download=False
)
train_and_test_dataset = StanfordOnlineProducts(root="data", 
    split="train+test", 
    transform=None, 
    target_transform=None, 
    download=False
)