Natural Language Processing - yangxudong/yangxudong.github.io GitHub Wiki

FAQ

Purpose of L2 normalization for triplet network

Triplet-based distance learning for face recognition seems very effective. I'm curious about one particular aspect of the paper. As part of finding an embedding for a face, the authors normalize the hidden units using L2 normalization, which constrains the representation to be on a hypersphere. Why is that helpful or needed?

For unit vectors, the squared Euclidean distance is an affine function of cosine similarity: ‖a − b‖² = 2 − 2·cos(a, b) (ref: wikipedia),

so the advantage of using normalization is essentially the advantage of cosine similarity over raw Euclidean distance. As mentioned in Andy Jones's answer, without normalization the network could satisfy any margin simply by scaling up the embeddings, so the margin would impose no real constraint.

Another nice property is that with such normalization the squared Euclidean distance is guaranteed to lie in the range [0, 4], which saves much effort in choosing a proper margin parameter 𝛼.
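The two properties above are easy to verify numerically. Below is a minimal sketch (not from the paper, just an illustration with NumPy and random vectors) checking that after L2 normalization the squared Euclidean distance equals 2 − 2·cos(a, b) and stays within [0, 4]:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(v):
    # Project an embedding onto the unit hypersphere.
    return v / np.linalg.norm(v)

# Two hypothetical 128-d embeddings, as in FaceNet.
a = l2_normalize(rng.normal(size=128))
b = l2_normalize(rng.normal(size=128))

sq_dist = np.sum((a - b) ** 2)
cos_sim = np.dot(a, b)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.isclose(sq_dist, 2.0 - 2.0 * cos_sim)
# ...and the squared distance is therefore bounded in [0, 4].
assert 0.0 <= sq_dist <= 4.0
```

Because the distance is bounded by 4, a margin 𝛼 chosen anywhere in (0, 4) is meaningful, whereas with unnormalized embeddings no fixed margin constrains the network.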

Sentence Embedding

Papers

Universal Sentence Encoder