Natural Language Processing - yangxudong/yangxudong.github.io GitHub Wiki
FAQ
Purpose of L2 normalization for triplet network
Triplet-based distance learning for face recognition seems very effective. I'm curious about one particular aspect of the paper. As part of finding an embedding for a face, the authors normalize the hidden units using L2 normalization, which constrains the representation to be on a hypersphere. Why is that helpful or needed?
For L2-normalized vectors, the squared Euclidean distance is a simple affine function of the cosine similarity: ‖a − b‖² = 2 − 2·cos(a, b) (ref: wikipedia),
so the advantage of normalization is essentially the advantage of cosine similarity over raw Euclidean distance. As mentioned in Andy Jones's answer, without normalization the network could satisfy any margin trivially, since scaling the margin by a factor would just scale the embedding correspondingly.
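This identity, and the scaling problem it prevents, can be checked numerically. The sketch below uses NumPy; the vector dimension and random seed are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)

# Project both embeddings onto the unit hypersphere.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

# Identity: ||a_n - b_n||^2 = 2 - 2 * cos(a, b)
sq_dist = np.sum((a_n - b_n) ** 2)
cos_sim = np.dot(a_n, b_n)
assert np.isclose(sq_dist, 2.0 - 2.0 * cos_sim)

# Without normalization, distances scale with the embedding norm: scaling
# every embedding by c multiplies every squared distance by c**2, so the
# network could meet any fixed margin simply by inflating its outputs.
assert np.isclose(np.sum((3 * a - 3 * b) ** 2), 9 * np.sum((a - b) ** 2))
```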
Another nice property is that with such normalization the squared Euclidean distance is guaranteed to lie in the range [0, 4] (the maximum is attained only for antipodal points), which spares us much effort in choosing a proper margin parameter α.
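The [0, 4] bound can be verified empirically, and it constrains sensible margin values: since d(a,p)² − d(a,n)² is confined to [−4, 4], any α ≥ 4 would keep the hinge active for every triplet. The sketch below uses NumPy; this `triplet_loss` is an illustrative hinge formulation, not the authors' exact implementation (the FaceNet paper uses α = 0.2):

```python
import numpy as np

def triplet_loss(anchor, pos, neg, alpha=0.2):
    """Hinge triplet loss on squared distances of already-normalized embeddings."""
    d_ap = np.sum((anchor - pos) ** 2, axis=-1)
    d_an = np.sum((anchor - neg) ** 2, axis=-1)
    return np.maximum(d_ap - d_an + alpha, 0.0).mean()

rng = np.random.default_rng(1)
emb = rng.normal(size=(200, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit hypersphere

# Squared distance between any two unit vectors lies in [0, 4].
d2 = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=-1)
assert d2.min() >= 0.0 and d2.max() <= 4.0

loss = triplet_loss(emb[:10], emb[10:20], emb[20:30])
assert loss >= 0.0
```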