# Samplers

Samplers are subclasses of `torch.utils.data.Sampler` that you pass to a PyTorch `DataLoader`. A sampler determines how batches are formed. This is also where any offline pair or triplet miners should live.

## MPerClassSampler

At every iteration, this will return `m` samples per class, assuming that the batch size is a multiple of `m`. For example, if your dataloader's batch size is 100, and `m` = 5, then 20 classes with 5 samples each will be returned. Note that if `batch_size` is not specified, then most batches will have `m` samples per class, but it's not guaranteed for every batch.
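The batch arithmetic described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the function name `m_per_class_batch` and the toy `labels` list are assumptions made for the example.

```python
import random
from collections import defaultdict


def m_per_class_batch(labels, m, batch_size, rng):
    """Return dataset indices for one batch: batch_size // m classes, m samples each."""
    if batch_size % m != 0:
        raise ValueError("batch_size must be a multiple of m")

    # Group dataset indices by their label.
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    num_classes = batch_size // m
    if num_classes > len(indices_by_label):
        raise ValueError("not enough unique labels to fill the batch")

    batch = []
    for label in rng.sample(sorted(indices_by_label), num_classes):
        pool = indices_by_label[label]
        if len(pool) >= m:
            batch.extend(rng.sample(pool, m))
        else:
            # Fewer than m samples in this class: draw with replacement,
            # which is why duplicates can appear in the batch.
            batch.extend(rng.choices(pool, k=m))
    return batch


# Toy dataset (assumed): 30 classes with 10 samples each.
labels = [i // 10 for i in range(300)]
batch = m_per_class_batch(labels, m=5, batch_size=100, rng=random.Random(0))
```

With a batch size of 100 and `m` = 5, the sketch picks 20 classes and 5 indices from each, matching the example in the text.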

```
samplers.MPerClassSampler(labels, m, batch_size=None, length_before_new_iter=100000)
```

**Parameters**:

- **labels**: The list of labels for your dataset, i.e. `labels[x]` should be the label of the xth element in your dataset.
- **m**: The number of samples per class to fetch at every iteration. If a class has fewer than `m` samples, then there will be duplicates in the returned batch.
- **batch_size**: Optional. If specified, then every batch is guaranteed to have `m` samples per class. There are a few restrictions on this value:
    - `batch_size` must be a multiple of `m`
    - `length_before_new_iter >= batch_size` must be true
    - `m * (number of unique labels) >= batch_size` must be true
- **length_before_new_iter**: How many iterations will pass before a new iterable is created.
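Before constructing the sampler, it can help to check the `batch_size` restrictions yourself. The helper below is a hypothetical pre-flight check written for this page (the name `check_m_per_class_args` is not part of the library); it simply restates the three restrictions listed above.

```python
def check_m_per_class_args(labels, m, batch_size, length_before_new_iter):
    """Return a list of violated restrictions (empty if the arguments are consistent)."""
    problems = []
    if batch_size is None:
        # The restrictions only apply when batch_size is given.
        return problems
    num_unique = len(set(labels))
    if batch_size % m != 0:
        problems.append("batch_size must be a multiple of m")
    if length_before_new_iter < batch_size:
        problems.append("length_before_new_iter must be >= batch_size")
    if m * num_unique < batch_size:
        problems.append("m * (number of unique labels) must be >= batch_size")
    return problems


# Toy labels (assumed): 10 classes with 8 samples each.
toy_labels = [i % 10 for i in range(80)]
ok = check_m_per_class_args(toy_labels, m=4, batch_size=32, length_before_new_iter=1000)
too_big = check_m_per_class_args(toy_labels, m=4, batch_size=64, length_before_new_iter=1000)
```

Here `ok` is empty, while `too_big` reports that 4 samples from each of 10 classes cannot fill a batch of 64.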

## HierarchicalSampler

Implementation of the sampler used in Deep Metric Learning to Rank.

It will do the following per batch:

- Randomly select X super classes.
- For each super class, randomly select Y samples from Z classes, such that Y * Z equals the batch size divided by X.

(X, Y, and the batch size are controllable parameters. See below for details.)
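The per-batch procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the function name `hierarchical_batch` is hypothetical, every class is assumed to have at least Y samples, and every super class is assumed to have at least Z classes.

```python
import random
from collections import defaultdict


def hierarchical_batch(labels, batch_size, samples_per_class,
                       super_classes_per_batch, rng,
                       inner_label=0, outer_label=1):
    """Build one batch of dataset indices from X super classes, Z classes each, Y samples each."""
    # Map super class -> class -> dataset indices.
    tree = defaultdict(lambda: defaultdict(list))
    for idx, row in enumerate(labels):
        tree[row[outer_label]][row[inner_label]].append(idx)

    per_super = batch_size // super_classes_per_batch    # Y * Z
    classes_per_super = per_super // samples_per_class   # Z

    batch = []
    # Randomly select X super classes.
    for super_class in rng.sample(sorted(tree), super_classes_per_batch):
        classes = tree[super_class]
        # Within each super class, select Z classes and Y samples from each.
        for cls in rng.sample(sorted(classes), classes_per_super):
            batch.extend(rng.sample(classes[cls], samples_per_class))
    return batch


# Toy labels (assumed): 4 super classes x 5 classes x 6 samples,
# one [class, super class] row per element.
labels = [[s * 5 + c, s] for s in range(4) for c in range(5) for _ in range(6)]
batch = hierarchical_batch(labels, batch_size=16, samples_per_class=2,
                           super_classes_per_batch=2, rng=random.Random(0))
```

With X = 2, Y = 2, and a batch size of 16, each super class contributes Y * Z = 8 elements, so Z = 4 classes are drawn from each.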

This is a BatchSampler, so you should pass it into your dataloader as the `batch_sampler` parameter.

```
samplers.HierarchicalSampler(
    labels,
    batch_size,
    samples_per_class,
    batches_per_super_tuple=4,
    super_classes_per_batch=2,
    inner_label=0,
    outer_label=1,
)
```

**Parameters**:

- **labels**: 2D array, where rows correspond to elements, and columns correspond to the hierarchical labels.
- **batch_size**: Because this is a BatchSampler, the batch size must be specified. `batch_size` must be a multiple of `super_classes_per_batch` and `samples_per_class`.
- **samples_per_class**: Number of samples per class per batch. Corresponds to Y in the above explanation. You can also set this to `"all"` to use all elements of a class, but this is suitable only for few-shot datasets.
- **batches_per_super_tuple**: Number of batches to create for each tuple of super classes. This affects the length of the iterator returned by the sampler.
- **super_classes_per_batch**: The number of super classes per batch. Corresponds to X in the above explanation.
- **inner_label**: Column index of `labels` corresponding to classes.
- **outer_label**: Column index of `labels` corresponding to super classes.
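As a rough illustration of the expected `labels` layout, the sketch below builds one `[class, super class]` row per element, so that column 0 matches the default `inner_label` and column 1 matches the default `outer_label`. The helper names and the toy class-to-super-class mapping are assumptions for the example, and the batch size check assumes `samples_per_class` is an integer rather than `"all"`.

```python
def build_hierarchical_labels(class_ids, super_class_of):
    """Return one [class, super class] row per dataset element."""
    return [[c, super_class_of[c]] for c in class_ids]


def batch_size_is_valid(batch_size, samples_per_class, super_classes_per_batch):
    """Check that batch_size is a multiple of both other parameters."""
    return (batch_size % super_classes_per_batch == 0
            and batch_size % samples_per_class == 0)


# Toy data (assumed): classes 0 and 1 belong to super class 10,
# classes 2 and 3 belong to super class 11.
class_ids = [0, 0, 1, 1, 2, 2, 3, 3]
super_class_of = {0: 10, 1: 10, 2: 11, 3: 11}
hier_labels = build_hierarchical_labels(class_ids, super_class_of)

inner_column = [row[0] for row in hier_labels]  # classes (inner_label=0)
outer_column = [row[1] for row in hier_labels]  # super classes (outer_label=1)
```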

## TuplesToWeightsSampler

This is a simple offline miner. It does the following:

- Take a random subset of the dataset, if you provide `subset_size`.
- Use a specified miner to mine tuples from the subset dataset.
- Compute weights based on how often each element appears in the mined tuples.
- Randomly sample, using the weights as probabilities.
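The last two steps above (weights from tuple counts, then weighted sampling) can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names, the hand-written `mined_triplets`, and the choice to normalize counts so they sum to 1 are all assumptions, and the sketch assumes the miner returned at least one tuple.

```python
import random
from collections import Counter


def weights_from_tuples(tuples, num_elements):
    """Weight each element by how often it appears in the mined tuples."""
    counts = Counter(idx for tup in tuples for idx in tup)
    total = sum(counts.values())
    # Elements that never appear in a tuple get weight 0.
    return [counts[i] / total for i in range(num_elements)]


def sample_indices(weights, num_samples, rng):
    """Draw indices with replacement, using the weights as probabilities."""
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# Hypothetical miner output: (anchor, positive, negative) index triplets
# over a subset of 6 elements.
mined_triplets = [(0, 1, 4), (0, 2, 4), (3, 1, 4)]
weights = weights_from_tuples(mined_triplets, num_elements=6)
drawn = sample_indices(weights, num_samples=10, rng=random.Random(0))
```

Element 4 appears in every triplet, so it receives the largest weight, while element 5 never appears and is never drawn.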

```
samplers.TuplesToWeightsSampler(
    model,
    miner,
    dataset,
    subset_size=None,
    **tester_kwargs,
)
```

**Parameters**:

- **model**: This model will be used to compute embeddings.
- **miner**: This miner will find hard tuples from the computed embeddings.
- **dataset**: The dataset you want to sample from.
- **subset_size**: Optional. If `None`, then the entire dataset will be mined, and the iterable will have length `len(dataset)`. Most likely, though, you will run out of memory if you do this. To avoid that, set `subset_size` to a number of embeddings that can be passed to the miner without running out of memory. Then a random subset of `dataset` of size `subset_size` will be used for mining. The iterable will also have length `subset_size`.
- **tester_kwargs**: Any other keyword options will be passed to BaseTester, which is used internally to compute embeddings. This allows you to set things like `dataloader_num_workers`, if you want to.

## FixedSetOfTriplets

When initialized, this class creates a fixed set of triplets. This is useful for determining the performance of algorithms in cases where the only ground truth data is a set of triplets.

```
samplers.FixedSetOfTriplets(labels, num_triplets)
```

**Parameters**:

- **labels**: The list of labels for your dataset, i.e. `labels[x]` should be the label of the xth element in your dataset.
- **num_triplets**: The number of triplets to create.
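To make the idea of a fixed triplet set concrete, here is one way such a set could be generated from labels. It is a sketch under assumptions, not the class's actual algorithm: the function name `make_fixed_triplets` is hypothetical, and it assumes at least two distinct labels, at least one of which has two or more elements.

```python
import random
from collections import defaultdict


def make_fixed_triplets(labels, num_triplets, rng):
    """Return a list of (anchor, positive, negative) index triplets."""
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    # Only classes with at least two elements can supply an anchor and a positive.
    anchor_classes = sorted(
        label for label, idxs in indices_by_label.items() if len(idxs) >= 2
    )

    triplets = []
    while len(triplets) < num_triplets:
        label = rng.choice(anchor_classes)
        anchor, positive = rng.sample(indices_by_label[label], 2)
        # The negative is any element with a different label.
        negative = rng.choice([i for i in range(len(labels)) if labels[i] != label])
        triplets.append((anchor, positive, negative))
    return triplets


toy_labels = [0, 0, 0, 1, 1, 2, 2, 2]
triplets = make_fixed_triplets(toy_labels, num_triplets=20, rng=random.Random(0))
```

Because the triplets are generated once and stored, the same set is reused across epochs, which is what makes it usable as a fixed ground truth.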