Introduction
The method is a discriminative training approach for applications in which the number of categories is very large, the number of samples per category is small, and only a subset of the categories is known at training time, e.g., face recognition and face verification.
The Framework
Given a family of functions G_W(X) parameterized by W, we seek a value of the parameter W such that the scalar energy function:
E_W(X_1, X_2) = \| G_W(X_1) - G_W(X_2) \|
is small if X_1 and X_2 belong to the same category, and large if they belong to different categories.
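A minimal sketch of the energy computation, assuming the energy is the norm of the difference between the two mapped inputs. The linear map `g_w`, the toy weights, and the choice of the Euclidean norm are illustrative assumptions, not taken from the notes; in practice G_W would be a trained network.

```python
import math

def g_w(x, w):
    """Hypothetical stand-in for G_W: a linear map with one output per row of w."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def energy(x1, x2, w):
    """Norm of G_W(x1) - G_W(x2); the Euclidean norm is an assumption here."""
    g1, g2 = g_w(x1, w), g_w(x2, w)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

# Toy parameters (assumed): keep the first two coordinates, drop the third.
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

x_a = [0.0, 0.0, 5.0]
x_b = [3.0, 4.0, -1.0]

e_same = energy(x_a, x_a, w)  # identical inputs map to the same point
e_diff = energy(x_a, x_b, w)  # mapped points differ by (3, 4)
```

Because both inputs pass through the same function with the same parameters W, the energy of an input paired with itself is always zero, whatever W is.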
For G_W(X), we can choose an architecture that extracts robust representations, such as a convolutional network.
Contrastive Loss Function
The loss function needs a contrastive term to ensure not only that the energy for a pair of inputs from the same category is low, but also that the energy for a pair from different categories is large.
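The total loss function over a data set of P labeled pairs is given by:
L(W) = \sum_{i=1}^{P} L(W, (y, X_1, X_2)^i)
The instance loss function is composed of a term for the similar (y = 1) case (L_+) and a term for the dissimilar (y = 0) case (L_-):
L(W, (y, X_1, X_2)^i) = y L_+(E_W(X_1, X_2)^i) + (1 - y) L_-(E_W(X_1, X_2)^i)
where E_W(X_1, X_2)^i denotes the energy of the i-th pair.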
The loss functions for the similar and dissimilar cases are given by:
What is m?
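One common reading, assumed here rather than taken from the notes: m is a positive margin. In the widely used margin-based form of the contrastive loss, similar pairs are penalized by the squared energy, while dissimilar pairs are penalized only when their energy falls below m. The sketch below uses that form; the specific expressions for the two terms, the function names, and the sample values are all assumptions for illustration.

```python
def loss_similar(e):
    """Assumed similar-pair term: grows with the energy, pulling the pair together."""
    return 0.5 * e ** 2

def loss_dissimilar(e, m):
    """Assumed dissimilar-pair term: nonzero only while the energy is below margin m."""
    return 0.5 * max(0.0, m - e) ** 2

def instance_loss(y, e, m):
    """y = 1 selects the similar term, y = 0 the dissimilar term."""
    return y * loss_similar(e) + (1 - y) * loss_dissimilar(e, m)

def total_loss(pairs, m):
    """Sum of instance losses over (label, energy) pairs."""
    return sum(instance_loss(y, e, m) for y, e in pairs)

# Hypothetical (label, energy) pairs and margin.
pairs = [(1, 0.5), (0, 0.5), (0, 3.0)]
margin = 2.0

per_pair = [instance_loss(y, e, margin) for y, e in pairs]
total = total_loss(pairs, margin)
```

Under these assumptions, the third pair contributes nothing: it is dissimilar and its energy already exceeds the margin, so only dissimilar pairs that are still too close affect training.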