Modified Triplet Loss

Beginning

We'll look at how to calculate the full triplet loss as well as the matrix of similarity scores it is built from.

Background

This is the original triplet loss function:

\[ \mathcal{L_\mathrm{Original}} = \max{(\mathrm{s}(A,N) -\mathrm{s}(A,P) +\alpha, 0)} \]

It can be improved by replacing the single negative term with the mean negative and the closest negative, creating a new full loss function. The inputs are the Anchor \(\mathrm{A}\), Positive \(\mathrm{P}\), and Negative \(\mathrm{N}\).

\begin{align} \mathcal{L_\mathrm{1}} &= \max{(mean\_neg -\mathrm{s}(A,P) +\alpha, 0)}\\ \mathcal{L_\mathrm{2}} &= \max{(closest\_neg -\mathrm{s}(A,P) +\alpha, 0)}\\ \mathcal{L_\mathrm{Full}} &= \mathcal{L_\mathrm{1}} + \mathcal{L_\mathrm{2}}\\ \end{align}
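
To make these definitions concrete before the batched implementation below, here's a minimal sketch in plain Python using made-up similarity values (the scores and the margin are purely illustrative, not taken from real data).

alpha = 0.25                   # margin (illustrative value)
sim_ap = 0.3                   # made-up s(A, P)
sim_an = [0.2, 0.1, -0.4]      # made-up s(A, N) values for three negatives

mean_neg = sum(sim_an) / len(sim_an)                 # -0.0333...
closest_neg = max(s for s in sim_an if s <= sim_ap)  # 0.2

loss_1 = max(mean_neg - sim_ap + alpha, 0)      # 0
loss_2 = max(closest_neg - sim_ap + alpha, 0)   # 0.15
loss_full = loss_1 + loss_2                     # 0.15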

Imports

# from pypi
import numpy

Middle

Similarity Scores

The first step is to calculate the matrix of similarity scores using cosine similarity so that you can look up \(\mathrm{s}(A,P)\), \(\mathrm{s}(A,N)\) as needed for the loss formulas.

Two Vectors

First, this is how to calculate the similarity score, using cosine similarity, for 2 vectors.

\[ \mathrm{s}(v_1,v_2) = \mathrm{cosine \ similarity}(v_1,v_2) = \frac{v_1 \cdot v_2}{||v_1||~||v_2||} \]

Similarity score

def cosine_similarity(v1: numpy.ndarray, v2: numpy.ndarray) -> float:
    """Calculates the cosine similarity between two vectors

    Args:
     v1: first vector
     v2: vector to compare to v1

    Returns:
     the cosine similarity between v1 and v2
    """
    numerator = numpy.dot(v1, v2)
    denominator = numpy.sqrt(numpy.dot(v1, v1)) * numpy.sqrt(numpy.dot(v2, v2))
    return numerator / denominator
  • Similar vectors
    v1 = numpy.array([1, 2, 3], dtype=float)
    v2 = numpy.array([1, 2, 3.5])
    
    print(f"cosine similarity : {cosine_similarity(v1, v2):0.4f}")
    
    cosine similarity : 0.9974
    
  • Identical Vectors
    v2 = v1
    print(f"cosine similarity : {cosine_similarity(v1, v2):0.4f}")
    
    cosine similarity : 1.0000
    
  • Opposite Vectors
    v2 = -v1
    print(f"cosine similarity : {cosine_similarity(v1, v2):0.4f}")
    
    cosine similarity : -1.0000
    
  • Dissimilar Vectors
    v2 = numpy.array([0,-42,1])
    print(f"cosine similarity : {cosine_similarity(v1, v2):0.4f}")
    
    cosine similarity : -0.5153
    

Two Batches of Vectors

Now let's look at how to calculate the similarity scores, using cosine similarity, for 2 batches of vectors. These are individual vectors, just like in the example above, but stacked as the rows of a matrix, with one row per input. In the data below the batch size (row count) is 4 and the embedding size (column count) is 3.

The data is set up so that \(v_{1\_1}\) and \(v_{2\_1}\) represent a duplicate pair, but neither is a duplicate of any other row in either batch. This means \(v_{1\_1}\) should be more similar to \(v_{2\_1}\) than it is to, say, \(v_{2\_2}\).

We'll use two different methods for calculating the matrix of similarities from 2 batches of vectors.

The Input data.

v1_1 = numpy.array([1, 2, 3])
v1_2 = numpy.array([9, 8, 7])
v1_3 = numpy.array([-1, -4, -2])
v1_4 = numpy.array([1, -7, 2])
v1 = numpy.vstack([v1_1, v1_2, v1_3, v1_4])
print("v1 :")
print(v1, "\n")
v2_1 = v1_1 + numpy.random.normal(0, 2, 3)  # add some noise to create approximate duplicate
v2_2 = v1_2 + numpy.random.normal(0, 2, 3)
v2_3 = v1_3 + numpy.random.normal(0, 2, 3)
v2_4 = v1_4 + numpy.random.normal(0, 2, 3)
v2 = numpy.vstack([v2_1, v2_2, v2_3, v2_4])
print("v2 :")
print(v2, "\n")
v1 :
[[ 1  2  3]
 [ 9  8  7]
 [-1 -4 -2]
 [ 1 -7  2]] 

v2 :
[[ 1.34263076  1.18510671  1.04373534]
 [ 8.96692933  6.50763316  7.03243982]
 [-3.4497247  -6.08808183 -4.54327564]
 [-0.77144774 -9.08449817  4.4633513 ]] 

For this to work the batch sizes must match.

assert len(v1) == len(v2)

Now let's look at the similarity scores.

  • Option 1 : nested loops and the cosine similarity function
    batch_size, _ = v1.shape
    scores_1 = numpy.zeros([batch_size, batch_size])
    
    # compare every row of v1 to every row of v2
    for row in range(batch_size):
        for column in range(batch_size):
            scores_1[row, column] = cosine_similarity(v1[row], v2[column])
    
    print("Option 1 : Loop")
    print(scores_1)
    
    Option 1 : Loop
    [[ 0.88245143  0.87735873 -0.93717609 -0.14613242]
     [ 0.99999485  0.99567656 -0.95998199 -0.34214656]
     [-0.86016573 -0.81584759  0.96484391  0.60584372]
     [-0.31943701 -0.23354642  0.49063636  0.96181686]]
    
  • Option 2 : Vector Normalization and the Dot Product
    def norm(x: numpy.ndarray) -> numpy.ndarray:
        """Normalize x"""
        return x / numpy.sqrt(numpy.sum(x * x, axis=1, keepdims=True))
    
    scores_2 = numpy.dot(norm(v1), norm(v2).T)
    
    print("Option 2 : Vector Norm & dot product")
    print(scores_2)
    
    Option 2 : Vector Norm & dot product
    [[ 0.88245143  0.87735873 -0.93717609 -0.14613242]
     [ 0.99999485  0.99567656 -0.95998199 -0.34214656]
     [-0.86016573 -0.81584759  0.96484391  0.60584372]
     [-0.31943701 -0.23354642  0.49063636  0.96181686]] 
    
    

Check

Let's make sure we get the same answer in both cases.

assert numpy.allclose(scores_1, scores_2)
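
As an optional third check, scikit-learn's pairwise cosine_similarity should produce the same matrix. This assumes scikit-learn is installed; it isn't part of the imports above.

# aliased so it doesn't shadow our cosine_similarity function
from sklearn.metrics.pairwise import cosine_similarity as sklearn_cosine_similarity

assert numpy.allclose(scores_1, sklearn_cosine_similarity(v1, v2))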

Hard Negative Mining

Now we'll calculate the mean negative \(mean\_neg\) and the closest negative \(closest\_neg\) used in calculating \(\mathcal{L_\mathrm{1}}\) and \(\mathcal{L_\mathrm{2}}\).

\begin{align} \mathcal{L_\mathrm{1}} &= \max{(mean\_neg -\mathrm{s}(A,P) +\alpha, 0)}\\ \mathcal{L_\mathrm{2}} &= \max{(closest\_neg -\mathrm{s}(A,P) +\alpha, 0)}\\ \end{align}

We'll do this using the matrix of similarity scores for a batch size of 4. The diagonal of the matrix contains all the \(\mathrm{s}(A,P)\) values, similarities from duplicate question pairs (aka Positives). This is an important attribute for the calculations to follow.

Mean Negative

mean_neg is the average of the off-diagonal values, the \(\mathrm{s}(A,N)\) values, in each row.

Closest Negative

closest_neg is the largest off-diagonal value, \(\mathrm{s}(A,N)\), that is less than or equal to the diagonal \(\mathrm{s}(A,P)\) in each row.

We'll start with some hand-made similarity scores.

similarity_scores = numpy.array(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)

Positives

All the s(A,P) values are similarities from duplicate question pairs (aka Positives). These are along the diagonal.

sim_ap = numpy.diag(similarity_scores)
print("s(A, P) :\n")
print(numpy.diag(sim_ap))
s(A, P) :

[[ 0.9  0.   0.   0. ]
 [ 0.   0.5  0.   0. ]
 [ 0.   0.  -0.4  0. ]
 [ 0.   0.   0.   0.5]]

Negatives

All the s(A,N) values are similarities of the non-duplicate question pairs (aka Negatives). These are the off-diagonal cells.

sim_an = similarity_scores - numpy.diag(sim_ap)
print("s(A, N) :\n")
print(sim_an)
s(A, N) :

[[ 0.  -0.8  0.3 -0.5]
 [-0.4  0.   0.1 -0.1]
 [ 0.3  0.1  0.  -0.8]
 [-0.5 -0.2 -0.7  0. ]]

Mean negative

This is the average of the s(A,N) values for each row.

batch_size = similarity_scores.shape[0]
mean_neg = numpy.sum(sim_an, axis=1, keepdims=True) / (batch_size - 1)
print("mean_neg :\n")
print(mean_neg)
mean_neg :

[[-0.33333333]
 [-0.13333333]
 [-0.13333333]
 [-0.46666667]]

Closest negative

This is the largest s(A,N) value that is less than or equal to s(A,P) in each row.

mask_1 = numpy.identity(batch_size) == 1            # mask to exclude the diagonal
mask_2 = sim_an > sim_ap.reshape(batch_size, 1)  # mask to exclude sim_an > sim_ap
mask = mask_1 | mask_2
sim_an_masked = numpy.copy(sim_an)         # create a copy to preserve sim_an
sim_an_masked[mask] = -2                   # -2 is below the smallest possible cosine similarity (-1)

closest_neg = numpy.max(sim_an_masked, axis=1, keepdims=True)
print("Closest Negative :\n")
print(closest_neg)
Closest Negative :

[[ 0.3]
 [ 0.1]
 [-0.8]
 [-0.2]]

The Loss Functions

The last step is to calculate the loss functions.

\begin{align} \mathcal{L_\mathrm{1}} &= \max{(mean\_neg -\mathrm{s}(A,P) +\alpha, 0)}\\ \mathcal{L_\mathrm{2}} &= \max{(closest\_neg -\mathrm{s}(A,P) +\alpha, 0)}\\ \mathcal{L_\mathrm{Full}} &= \mathcal{L_\mathrm{1}} + \mathcal{L_\mathrm{2}}\\ \end{align}

The Alpha margin.

alpha = 0.25

Modified triplet loss

loss_1 = numpy.maximum(mean_neg - sim_ap.reshape(batch_size, 1) + alpha, 0)
loss_2 = numpy.maximum(closest_neg - sim_ap.reshape(batch_size, 1) + alpha, 0)
loss_full = loss_1 + loss_2

Cost

cost = numpy.sum(loss_full)
print("Loss Full :\n")
print(loss_full)
print(f"\ncost : {cost:.3f}")
Loss Full :

[[0.        ]
 [0.        ]
 [0.51666667]
 [0.        ]]

cost : 0.517
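
As a recap, the steps above can be bundled into one function. This is only a sketch that re-applies the same operations to two batches of vectors (the name full_triplet_loss is ours, not from a library):

def full_triplet_loss(v1: numpy.ndarray, v2: numpy.ndarray,
                      alpha: float = 0.25) -> float:
    """Calculates the full (modified) triplet loss for two batches of vectors

    Rows of v1 and v2 with the same index are assumed to be duplicates (Positives).
    """
    def normalize(x: numpy.ndarray) -> numpy.ndarray:
        """Normalize each row of x to unit length"""
        return x / numpy.sqrt(numpy.sum(x * x, axis=1, keepdims=True))

    # cosine similarity matrix: positives on the diagonal, negatives elsewhere
    scores = numpy.dot(normalize(v1), normalize(v2).T)
    batch_size = len(scores)
    sim_ap = numpy.diag(scores)
    sim_an = scores - numpy.diag(sim_ap)

    # hard negative mining
    mean_neg = numpy.sum(sim_an, axis=1, keepdims=True) / (batch_size - 1)
    mask = (numpy.identity(batch_size) == 1) | (sim_an > sim_ap.reshape(batch_size, 1))
    closest_neg = numpy.max(numpy.where(mask, -2.0, sim_an), axis=1, keepdims=True)

    # the two partial losses and the full loss
    loss_1 = numpy.maximum(mean_neg - sim_ap.reshape(batch_size, 1) + alpha, 0)
    loss_2 = numpy.maximum(closest_neg - sim_ap.reshape(batch_size, 1) + alpha, 0)
    return float(numpy.sum(loss_1 + loss_2))

Called as full_triplet_loss(v1, v2), it applies the whole pipeline to the two batches from earlier in one step.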