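Triplet loss is a loss function for learning embeddings with deep neural networks. Its goal is to make samples of the same class lie close together in the embedding space and samples of different classes lie far apart. It is commonly used in tasks that depend on computing similarity, such as face recognition and image retrieval.
Suppose a neural network has already mapped three samples to positions in the embedding space: A (the anchor), P (a positive sample, from the same class as the anchor), and N (a negative sample, from a different class). The triplet loss then takes the form: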
L = max(d(A, P) - d(A, N) + margin, 0)
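Here d(A, P) and d(A, N) are the distances in the embedding space between the anchor and the positive sample and between the anchor and the negative sample. The margin is a preset threshold that controls the required gap between positives and negatives: we want the anchor-to-negative distance to exceed the anchor-to-positive distance by at least margin.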
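As an example, take three samples: an anchor A, a positive P, and a negative N. A neural network maps each of them into a three-dimensional space, giving the embedding vectors:
A = [1, 1, 1]
P = [1.1, 1.1, 1.1]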
N = [2, 2, 2]
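To make the example concrete, the sketch below evaluates the loss formula on these three vectors in plain Python. It assumes Euclidean distance for d and a margin of 0.3; both are illustrative choices, not values fixed by the text, and triplet_loss_single is a hypothetical helper name.

```python
import math


def triplet_loss_single(a, p, n, margin):
    """L = max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d."""
    d_ap = math.dist(a, p)  # anchor-to-positive distance
    d_an = math.dist(a, n)  # anchor-to-negative distance
    return max(d_ap - d_an + margin, 0.0)


A = [1.0, 1.0, 1.0]
P = [1.1, 1.1, 1.1]
N = [2.0, 2.0, 2.0]

MARGIN = 0.3  # assumed value, for illustration only

# d(A, P) is about 0.173 and d(A, N) is about 1.732, so the negative is already
# more than MARGIN farther away than the positive and the loss is zero.
loss_easy = triplet_loss_single(A, P, N, MARGIN)

# Swapping the roles of P and N violates the margin and gives a positive loss.
loss_violated = triplet_loss_single(A, N, P, MARGIN)

print(loss_easy, loss_violated)
```

A zero loss means this triplet contributes no gradient; only triplets that violate the margin push the embeddings apart.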
The triplet loss function is implemented as follows:
import torch
import torch.nn.functional as F

# cosine_dist and euclidean_dist are pairwise-distance helpers that are not
# shown in this section; the two mining functions are listed further below.


def triplet_loss(embedding, targets, margin, norm_feat, hard_mining):
    r"""Modified from Tong Xiao's open-reid (https://github.com/Cysu/open-reid).
    Related Triplet Loss theory can be found in paper 'In Defense of the Triplet
    Loss for Person Re-Identification'."""

    if norm_feat:
        dist_mat = cosine_dist(embedding, embedding)
    else:
        dist_mat = euclidean_dist(embedding, embedding)

    # For distributed training, gather all features from different process.
    # if comm.get_world_size() > 1:
    #     all_embedding = torch.cat(GatherLayer.apply(embedding), dim=0)
    #     all_targets = concat_all_gather(targets)
    # else:
    #     all_embedding = embedding
    #     all_targets = targets

    # Number of rows of the distance matrix dist_mat, i.e. the number of samples
    N = dist_mat.size(0)
    # Two N x N masks: is_pos marks pairs with the same class label (positive
    # pairs), is_neg marks pairs with different class labels (negative pairs)
    is_pos = targets.view(N, 1).expand(N, N).eq(targets.view(N, 1).expand(N, N).t()).float()
    is_neg = targets.view(N, 1).expand(N, N).ne(targets.view(N, 1).expand(N, N).t()).float()

    if hard_mining:
        dist_ap, dist_an = hard_example_mining(dist_mat, is_pos, is_neg)
    else:
        dist_ap, dist_an = weighted_example_mining(dist_mat, is_pos, is_neg)

    # Target of +1 for every anchor: dist_an should be ranked above dist_ap
    y = dist_an.new().resize_as_(dist_an).fill_(1)

    if margin > 0:
        loss = F.margin_ranking_loss(dist_an, dist_ap, y, margin=margin)
    else:
        loss = F.soft_margin_loss(dist_an - dist_ap, y)
        # fmt: off
        if loss == float('Inf'): loss = F.margin_ranking_loss(dist_an, dist_ap, y, margin=0.3)
        # fmt: on

    return loss
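cosine_dist(embedding, embedding) computes the cosine distance between every vector in embedding and every vector in embedding, including each vector with itself.
Suppose embedding is a tensor of shape (3, 2) with the following contents: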
[[a1, a2],
[b1, b2],
[c1, c2]]
Here [a1, a2], [b1, b2], and [c1, c2] are the three vectors in embedding.
Calling cosine_dist(embedding, embedding) then computes:
[[cosine_dist([a1, a2], [a1, a2]), cosine_dist([a1, a2], [b1, b2]), cosine_dist([a1, a2], [c1, c2])],
[cosine_dist([b1, b2], [a1, a2]), cosine_dist([b1, b2], [b1, b2]), cosine_dist([b1, b2], [c1, c2])],
[cosine_dist([c1, c2], [a1, a2]), cosine_dist([c1, c2], [b1, b2]), cosine_dist([c1, c2], [c1, c2])]]
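The result is a (3, 3) matrix holding the cosine distance between every pair of vectors in embedding.
When the condition if norm_feat: is true, that is, when we want the embeddings to be normalized, this is how the distances between all vectors in embedding are computed; otherwise euclidean_dist is used.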
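As a minimal sketch of the pairwise computation described above, the plain-Python function below builds the same kind of (3, 3) matrix. It assumes that cosine distance means 1 minus cosine similarity; the actual cosine_dist helper is not shown in this section and may use a different scaling. The name pairwise_cosine_dist and the sample values are hypothetical.

```python
import math


def pairwise_cosine_dist(xs, ys):
    """Return a len(xs) x len(ys) matrix of (1 - cosine similarity)."""
    def normalize(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v]

    xs_n = [normalize(v) for v in xs]
    ys_n = [normalize(v) for v in ys]
    # Entry [i][j] compares row i of xs with row j of ys
    return [[1.0 - sum(a * b for a, b in zip(x, y)) for y in ys_n] for x in xs_n]


embedding = [
    [1.0, 0.0],  # plays the role of [a1, a2]
    [0.0, 2.0],  # plays the role of [b1, b2]
    [3.0, 3.0],  # plays the role of [c1, c2]
]

dist_mat = pairwise_cosine_dist(embedding, embedding)
for row in dist_mat:
    print([round(d, 3) for d in row])
```

The diagonal is numerically zero because every vector has distance zero to itself, and the matrix is symmetric because the distance does not depend on argument order.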
N = dist_mat.size(0)
is_pos = targets.view(N, 1).expand(N, N).eq(targets.view(N, 1).expand(N, N).t()).float()
is_neg = targets.view(N, 1).expand(N, N).ne(targets.view(N, 1).expand(N, N).t()).float()
Suppose targets = [1, 2, 1, 2], so N = 4. Then targets.view(N, 1).expand(N, N) gives:
1 1 1 1
2 2 2 2
1 1 1 1
2 2 2 2
Calling targets.view(N, 1).expand(N, N).t() gives its transpose:
1 2 1 2
1 2 1 2
1 2 1 2
1 2 1 2
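Comparing the two matrices element-wise with eq gives is_pos, where 1 marks a pair with the same label (including each sample paired with itself):
1 0 1 0
0 1 0 1
1 0 1 0
0 1 0 1
Comparing them with ne gives is_neg, where 1 marks a pair with different labels:
0 1 0 1
1 0 1 0
0 1 0 1
1 0 1 0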
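The mask construction can be reproduced without PyTorch. The sketch below is a plain-Python equivalent of the eq/ne comparison for the labels [1, 2, 1, 2], which are the labels implied by the matrices above; build_masks is a hypothetical helper name, not part of the original code.

```python
def build_masks(targets):
    """Return (is_pos, is_neg) as N x N lists of 0.0/1.0 floats."""
    n = len(targets)
    # is_pos[i][j] is 1.0 when samples i and j share a label (including i == j)
    is_pos = [[float(targets[i] == targets[j]) for j in range(n)] for i in range(n)]
    # is_neg[i][j] is 1.0 when samples i and j have different labels
    is_neg = [[float(targets[i] != targets[j]) for j in range(n)] for i in range(n)]
    return is_pos, is_neg


is_pos, is_neg = build_masks([1, 2, 1, 2])

print("is_pos:")
for row in is_pos:
    print(row)
print("is_neg:")
for row in is_neg:
    print(row)
```

The two masks are complementary: every entry is 1 in exactly one of them.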
if hard_mining:
    dist_ap, dist_an = hard_example_mining(dist_mat, is_pos, is_neg)
else:
    dist_ap, dist_an = weighted_example_mining(dist_mat, is_pos, is_neg)
# For each anchor, find the hardest positive (the farthest sample with the same
# class label) and the hardest negative (the nearest sample with a different
# class label).
def hard_example_mining(dist_mat, is_pos, is_neg):
    """For each anchor, find the hardest positive and negative sample.
    Args:
      dist_mat: pair wise distance between samples, shape [N, M]
      is_pos: positive index with shape [N, M]
      is_neg: negative index with shape [N, M]
    Returns:
      dist_ap: pytorch Variable, distance(anchor, positive); shape [N]
      dist_an: pytorch Variable, distance(anchor, negative); shape [N]
      p_inds: pytorch LongTensor, with shape [N];
        indices of selected hard positive samples; 0 <= p_inds[i] <= N - 1
      n_inds: pytorch LongTensor, with shape [N];
        indices of selected hard negative samples; 0 <= n_inds[i] <= N - 1
      (p_inds and n_inds are described here but not returned by this version.)
    NOTE: Only consider the case in which all labels have same num of samples,
      thus we can cope with all anchors in parallel.
    """
    assert len(dist_mat.size()) == 2

    # `dist_ap` means distance(anchor, positive)
    # both `dist_ap` and `relative_p_inds` with shape [N]
    # Multiply the distance matrix element-wise by the positive mask, then take
    # the maximum of each row (one row per anchor).
    dist_ap, _ = torch.max(dist_mat * is_pos, dim=1)
    # `dist_an` means distance(anchor, negative)
    # both `dist_an` and `relative_n_inds` with shape [N]
    # Multiply the distance matrix element-wise by the negative mask, then add
    # the positive mask times a large number (1e9) so that same-class entries,
    # which the mask zeroed out, can never be the minimum. The row-wise minimum
    # is then the nearest sample with a different class.
    dist_an, _ = torch.min(dist_mat * is_neg + is_pos * 1e9, dim=1)

    return dist_ap, dist_an
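To see what the two reductions select, here is a plain-Python sketch of the same masking trick on a small hand-made distance matrix. The matrix values and the function name hard_example_mining_lists are made up for illustration, and the labels are assumed to be [1, 2, 1, 2].

```python
def hard_example_mining_lists(dist_mat, is_pos, is_neg):
    """Per row: farthest same-label distance and nearest different-label distance."""
    dist_ap, dist_an = [], []
    for d_row, p_row, n_row in zip(dist_mat, is_pos, is_neg):
        # Entries outside the positive mask become 0, so max picks the farthest positive
        dist_ap.append(max(d * p for d, p in zip(d_row, p_row)))
        # Entries inside the positive mask become 1e9, so min picks the nearest negative
        dist_an.append(min(d * n + p * 1e9 for d, p, n in zip(d_row, p_row, n_row)))
    return dist_ap, dist_an


targets = [1, 2, 1, 2]
is_pos = [[float(a == b) for b in targets] for a in targets]
is_neg = [[float(a != b) for b in targets] for a in targets]

# Symmetric pairwise distances with a zero diagonal (illustrative values)
dist_mat = [
    [0.0, 0.9, 0.4, 0.7],
    [0.9, 0.0, 0.5, 0.3],
    [0.4, 0.5, 0.0, 0.8],
    [0.7, 0.3, 0.8, 0.0],
]

dist_ap, dist_an = hard_example_mining_lists(dist_mat, is_pos, is_neg)
print(dist_ap)  # hardest (farthest) positive per anchor
print(dist_an)  # hardest (nearest) negative per anchor
```

For anchor 0 the only other positive is sample 2 at distance 0.4, and the nearer of its two negatives is sample 3 at distance 0.7.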
def weighted_example_mining(dist_mat, is_pos, is_neg):
    """For each anchor, find the weighted positive and negative sample.
    Args:
      dist_mat: pytorch Variable, pair wise distance between samples, shape [N, N]
      is_pos: positive index with shape [N, N]
      is_neg: negative index with shape [N, N]
    Returns:
      dist_ap: pytorch Variable, distance(anchor, positive); shape [N]
      dist_an: pytorch Variable, distance(anchor, negative); shape [N]
    """
    assert len(dist_mat.size()) == 2

    # For each anchor, keep only the distances to its positives / negatives
    dist_ap = dist_mat * is_pos
    dist_an = dist_mat * is_neg

    # Compute softmax weights for positives and negatives separately. The
    # negative distances are negated first, so nearer negatives weigh more.
    # softmax_weights is a helper that is not shown in this section.
    weights_ap = softmax_weights(dist_ap, is_pos)
    weights_an = softmax_weights(-dist_an, is_neg)

    # Weighted distance: multiply each distance by its weight and sum per row
    dist_ap = torch.sum(dist_ap * weights_ap, dim=1)
    dist_an = torch.sum(dist_an * weights_an, dim=1)

    return dist_ap, dist_an
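The softmax_weights helper is not shown in this section. The sketch below assumes it is a masked softmax over each row (entries outside the mask get weight zero) and re-creates the weighting in plain Python. The names masked_softmax and weighted_example_mining_lists are hypothetical, and the real helper may differ in details such as numerical stabilization.

```python
import math


def masked_softmax(row, mask):
    """Softmax over the entries of row where mask is 1; other entries get weight 0."""
    max_v = max(v * m for v, m in zip(row, mask))  # shift by a max for stability
    exps = [math.exp(v - max_v) * m for v, m in zip(row, mask)]
    z = sum(exps) + 1e-6  # small constant avoids division by zero
    return [e / z for e in exps]


def weighted_example_mining_lists(dist_mat, is_pos, is_neg):
    dist_ap, dist_an = [], []
    for d_row, p_row, n_row in zip(dist_mat, is_pos, is_neg):
        ap = [d * p for d, p in zip(d_row, p_row)]
        an = [d * n for d, n in zip(d_row, n_row)]
        w_ap = masked_softmax(ap, p_row)                # farther positives weigh more
        w_an = masked_softmax([-d for d in an], n_row)  # nearer negatives weigh more
        dist_ap.append(sum(d * w for d, w in zip(ap, w_ap)))
        dist_an.append(sum(d * w for d, w in zip(an, w_an)))
    return dist_ap, dist_an


targets = [1, 2, 1, 2]  # assumed labels, for illustration
is_pos = [[float(a == b) for b in targets] for a in targets]
is_neg = [[float(a != b) for b in targets] for a in targets]
dist_mat = [
    [0.0, 0.9, 0.4, 0.7],
    [0.9, 0.0, 0.5, 0.3],
    [0.4, 0.5, 0.0, 0.8],
    [0.7, 0.3, 0.8, 0.0],
]

dist_ap, dist_an = weighted_example_mining_lists(dist_mat, is_pos, is_neg)
print(dist_ap)
print(dist_an)
```

Compared with hard mining, every positive and negative contributes to the result, so the weighted negative distance for an anchor lies between its nearest and farthest negative rather than exactly at the nearest one.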
y = dist_an.new().resize_as_(dist_an).fill_(1)
if margin > 0:
    loss = F.margin_ranking_loss(dist_an, dist_ap, y, margin=margin)
else:
    loss = F.soft_margin_loss(dist_an - dist_ap, y)
    # fmt: off
    if loss == float('Inf'): loss = F.margin_ranking_loss(dist_an, dist_ap, y, margin=0.3)
    # fmt: on
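For reference, the two loss branches can be written out by hand. With y = 1, margin ranking loss averages max(0, -(dist_an - dist_ap) + margin), which is the triplet loss formula from the beginning, and soft margin loss averages log(1 + exp(-(dist_an - dist_ap))), a smooth version that needs no margin. The sketch below reproduces both formulas in plain Python on illustrative distances. The functions are hand-written stand-ins that borrow the PyTorch names, and the default mean reduction is assumed.

```python
import math


def margin_ranking_loss(x1, x2, y, margin):
    """Mean of max(0, -y * (x1 - x2) + margin)."""
    terms = [max(0.0, -t * (a - b) + margin) for a, b, t in zip(x1, x2, y)]
    return sum(terms) / len(terms)


def soft_margin_loss(x, y):
    """Mean of log(1 + exp(-y * x))."""
    terms = [math.log1p(math.exp(-t * v)) for v, t in zip(x, y)]
    return sum(terms) / len(terms)


# Illustrative per-anchor distances
dist_ap = [0.4, 0.3, 0.4, 0.3]
dist_an = [0.7, 0.5, 0.5, 0.7]
y = [1.0] * len(dist_an)  # +1 means the first argument should be the larger one

hinge = margin_ranking_loss(dist_an, dist_ap, y, margin=0.3)
soft = soft_margin_loss([an - ap for an, ap in zip(dist_an, dist_ap)], y)
print(hinge, soft)
```

The hinge version is zero for anchors that already satisfy the margin, while the soft version stays positive for every anchor and only shrinks as the gap dist_an - dist_ap grows.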