

Reducers specify how to go from many loss values to a single loss value. For example, the ContrastiveLoss computes a loss for every positive and negative pair in a batch. A reducer will take all these per-pair losses, and reduce them to a single value. Here's where reducers fit in this library's flow of filters and computations:

Your Data --> Sampler --> Miner --> Loss --> Reducer --> Final loss value

Reducers are passed into loss functions like this:

from pytorch_metric_learning import losses, reducers
reducer = reducers.SomeReducer()
loss_func = losses.SomeLoss(reducer=reducer)
loss = loss_func(embeddings, labels) # in your training for-loop

Internally, the loss function creates a dictionary that contains the losses and other information. The reducer takes this dictionary, performs the reduction, and returns a single value on which .backward() can be called. Most reducers are written such that they can be passed into any loss function.


AvgNonZeroReducer computes the average loss, using only the losses that are greater than 0. For example, if the losses are [0, 2, 0, 3], then this reducer will return 2.5.


This class is equivalent to using ThresholdReducer(low=0). See ThresholdReducer.
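The [0, 2, 0, 3] example above can be reproduced in a few lines of plain Python. This is only an illustrative sketch of the arithmetic, not the library's implementation: the helper name `avg_non_zero` is hypothetical, and returning 0 when no loss is positive is an assumption.

```python
def avg_non_zero(losses):
    """Average only the losses strictly greater than 0 (hypothetical helper)."""
    kept = [x for x in losses if x > 0]
    # Assumption: with nothing left to average, return 0 instead of dividing by zero.
    return sum(kept) / len(kept) if kept else 0.0


print(avg_non_zero([0, 2, 0, 3]))  # 2.5
```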


BaseReducer is the class that all reducers extend.



  • collect_stats: If True, will collect various statistics that may be useful to analyze during experiments. If False, these computations will be skipped.


ClassWeightedReducer multiplies each loss by a class weight, and then takes the average.

reducers.ClassWeightedReducer(weights, **kwargs)


  • weights: A tensor of weights, where weights[i] is the weight for the ith class.
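The weighting described above can be sketched in plain Python for the simplest case of one loss per batch element. The helper name `class_weighted_mean` and the sample values are hypothetical, and looking up each weight by the element's own label is an assumption made for illustration.

```python
def class_weighted_mean(losses, labels, weights):
    """Scale each loss by the weight of its class, then average (hypothetical helper)."""
    weighted = [weights[label] * loss for loss, label in zip(losses, labels)]
    return sum(weighted) / len(weighted)


losses = [1.0, 2.0, 3.0]  # one loss per batch element
labels = [0, 1, 1]        # class index of each element
weights = [0.5, 2.0]      # weights[i] is the weight for class i

print(class_weighted_mean(losses, labels, weights))  # 3.5
```

The weighted losses here are 0.5, 4.0, and 6.0, so the average is 10.5 / 3 = 3.5.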


DivisorReducer divides each loss by a custom value specified inside the loss function. This is useful if you want to hardcode a reduction behavior in your loss function (i.e. by using DivisorReducer) while still having the option to use other reducers.
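As a rough illustration of the arithmetic, the sketch below sums a list of losses and divides the total by a supplied divisor, which gives the same result as dividing each loss by the divisor and summing. The helper name `divisor_reduce` is hypothetical, and both the summing step and the zero-divisor fallback are assumptions rather than confirmed library behavior.

```python
def divisor_reduce(losses, divisor):
    """Sum the losses and divide by a divisor chosen by the loss function (hypothetical helper)."""
    total = sum(losses)
    # Assumption: a zero divisor yields 0 rather than raising ZeroDivisionError.
    return total / divisor if divisor != 0 else 0.0


pos_losses = [1.0, 2.0, 3.0]
print(divisor_reduce(pos_losses, divisor=2))  # 3.0
```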


To use this reducer, the loss function must include divisor in its loss dictionary. For example, the ProxyAnchorLoss uses DivisorReducer by default, and returns the following dictionary:

loss_dict = {
    "pos_loss": {
        "losses": pos_term.squeeze(0),
        "indices": loss_indices,
        "reduction_type": "element",
        "divisor": len(with_pos_proxies),
    },
    "neg_loss": {
        "losses": neg_term.squeeze(0),
        "indices": loss_indices,
        "reduction_type": "element",
        "divisor": self.num_classes,
    },
}


DoNothingReducer returns its input. In other words, no reduction is performed. The output will be the loss dictionary that is passed into it.



MeanReducer returns the average of the losses.



MultipleReducers wraps multiple reducers. Each reducer is applied to a different sub-loss, as specified in the host loss function. The reducer outputs are then summed to obtain the final loss.

reducers.MultipleReducers(reducers, default_reducer=None, **kwargs)


  • reducers: A dictionary mapping from strings to reducers. The strings must match sub-loss names of the host loss function.
  • default_reducer: This reducer will be used for any sub-losses that are not included in the keys of reducers. If None, then MeanReducer() will be the default.

Example usage:

The ContrastiveLoss has two sub-losses: pos_loss for the positive pairs, and neg_loss for the negative pairs. In this example, a ThresholdReducer is used for the pos_loss and a MeanReducer is used for the neg_loss.

from pytorch_metric_learning.losses import ContrastiveLoss
from pytorch_metric_learning.reducers import MultipleReducers, ThresholdReducer, MeanReducer
reducer_dict = {"pos_loss": ThresholdReducer(0.1), "neg_loss": MeanReducer()}
reducer = MultipleReducers(reducer_dict)
loss_func = ContrastiveLoss(reducer=reducer)
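To make the "reduce each sub-loss, then sum" behavior concrete, here is a plain-Python sketch that mirrors the example above with ordinary lists. All names (`mean`, `threshold_low`, `multiple_reduce`) and the sample values are hypothetical stand-ins, not the library's code.

```python
def mean(losses):
    # Stand-in for a mean reduction; returns 0 for an empty list (assumption).
    return sum(losses) / len(losses) if losses else 0.0


def threshold_low(low):
    # Build a reducer that averages only the losses strictly greater than `low`.
    def reduce(losses):
        return mean([x for x in losses if x > low])
    return reduce


def multiple_reduce(sub_losses, reducers, default_reducer=mean):
    """Apply one reducer per named sub-loss, then sum the reduced values."""
    total = 0.0
    for name, losses in sub_losses.items():
        # Sub-losses without a dedicated reducer fall back to the default.
        reducer = reducers.get(name, default_reducer)
        total += reducer(losses)
    return total


sub_losses = {"pos_loss": [0.05, 0.3, 0.5], "neg_loss": [0.2, 0.4]}
reducers = {"pos_loss": threshold_low(0.1), "neg_loss": mean}

# About 0.7: roughly 0.4 from pos_loss (0.05 is filtered out) plus 0.3 from neg_loss.
total = multiple_reduce(sub_losses, reducers)
print(total)
```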


PerAnchorReducer converts unreduced pairs to unreduced elements. For example, NTXentLoss returns losses per positive pair. If you use PerAnchorReducer with NTXentLoss, the per-pair losses are first converted to per-element losses and then passed to the inner reducer. This makes NTXentLoss equivalent to the SupConLoss described in Supervised Contrastive Learning. Note that this reducer currently works only with pair-based losses.

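import torch

def aggregation_func(x, num_per_row):
    zero_denom = num_per_row == 0
    # Mean loss per row: row sum divided by the number of losses in that row.
    x = torch.sum(x, dim=1) / num_per_row
    # Rows with no losses were divided by zero, so set them to 0.
    x[zero_denom] = 0
    return x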



  • reducer: The reducer that will be fed per-element losses. The default is MeanReducer.
  • aggregation_func: A function that takes in (x, num_per_row) and returns a loss per row of x. The default is the aggregation_func defined in the code snippet above. It returns the mean per row.
    • x is an NxN array of pairwise losses, where N is the batch size.
    • num_per_row is a size N array which indicates how many non-zero losses there are per-row of x.


ThresholdReducer computes the average loss, using only the losses that fall within a specified range.

reducers.ThresholdReducer(low=None, high=None, **kwargs)

At least one of low or high must be specified.


  • low: Losses less than this value will be ignored.
  • high: Losses greater than this value will be ignored.


  • ThresholdReducer(low=6): the filter is losses > 6

    • If the losses are [3, 7, 1, 13, 5], then this reducer will return (7+13)/2 = 10.
  • ThresholdReducer(high=6): the filter is losses < 6

    • If the losses are [3, 7, 1, 13, 5], then this reducer will return (1+3+5)/3 = 3.
  • ThresholdReducer(low=6, high=12): the filter is 6 < losses < 12

    • If the losses are [3, 7, 1, 13, 5], then this reducer will return (7)/1 = 7.
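The three cases above can be checked with a small plain-Python sketch of the filtering rule. The helper name `threshold_mean` is hypothetical, the strict inequalities follow the filters listed above, and returning 0 when nothing passes the filter is an assumption.

```python
def threshold_mean(losses, low=None, high=None):
    """Average the losses that satisfy low < loss < high (hypothetical helper)."""
    if low is None and high is None:
        raise ValueError("At least one of low or high must be specified.")
    kept = [
        x for x in losses
        if (low is None or x > low) and (high is None or x < high)
    ]
    # Assumption: if no loss passes the filter, return 0 instead of dividing by zero.
    return sum(kept) / len(kept) if kept else 0.0


losses = [3, 7, 1, 13, 5]
print(threshold_mean(losses, low=6))           # 10.0
print(threshold_mean(losses, high=6))          # 3.0
print(threshold_mean(losses, low=6, high=12))  # 7.0
```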