[Text Mining][Week 5-2] Embeddings

Embeddings

  1. Static embeddings

word2vec (skip-gram): one and the same embedding for a word, regardless of context

  2. Contextualized word embeddings

ELMo, BERT (language models): different embeddings for the same word depending on the sentence context
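
To make the contrast concrete, here is a minimal sketch, assuming the HuggingFace `transformers` and `torch` packages and the `bert-base-uncased` checkpoint: the model's input embedding table plays the role of a static lookup (one row per word type), while the final hidden states change with the surrounding sentence.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

for sent in ["The bears ate the honey",
             "The Chicago Bears didn't make the playoffs"]:
    inputs = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    i = tokens.index("bears")   # uncased vocab folds "Bears" into "bears";
                                # assumes "bears" survives as one WordPiece
    static_vec = model.embeddings.word_embeddings.weight[inputs["input_ids"][0, i]]
    contextual_vec = hidden[0, i]
    print(sent)
    print("  static    :", static_vec[:4].tolist())     # identical across sentences
    print("  contextual:", contextual_vec[:4].tolist()) # differs with context
```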

Contextualized embeddings (question to ask)

  • Models for learning static embeddings learn a single representation for a word type.

Types and tokens

  • Type: bears

  • Tokens:

    • The bears ate the honey

    • We spotted the bears from the highway

    • Yosemite has brown bears

  • The Chicago Bears didn't make the playoffs
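
A quick worked example of the distinction, counting with a rough whitespace split (lowercasing folds "Bears" into the same type):

```python
# A *type* is the vocabulary entry "bears"; a *token* is each occurrence in text.
sentences = [
    "The bears ate the honey",
    "We spotted the bears from the highway",
    "Yosemite has brown bears",
    "The Chicago Bears didn't make the playoffs",
]
tokens = [w.lower() for s in sentences for w in s.split()]
print(len(tokens), "tokens,", len(set(tokens)), "types")
print("tokens of the type 'bears':", tokens.count("bears"))  # 4
```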

Contextualized word representations

  • Big idea: transform the representation of a token in a sentence (e.g., from a static word embedding) to be sensitive to its local context in the sentence, and trainable to be optimized for a specific NLP task.
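
The "trainable" half of that idea amounts to stacking a task-specific head on top of the contextual token vectors and optimizing everything end to end; a minimal PyTorch sketch, where all shapes and the random encoder output are hypothetical stand-ins:

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 5              # assumed encoder width and tag-set size
contextual = torch.randn(1, 10, hidden_size)  # stand-in for an encoder's token vectors
head = nn.Linear(hidden_size, num_labels)     # task head, trained jointly with the encoder
logits = head(contextual)                     # (1, 10, num_labels): per-token task scores
print(logits.shape)
```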

BERT

  • [CLS]: classification token: a single embedding that compresses the whole input

  • [SEP]: marks the boundary between sentences

  • Certain words are split into subwords (WordPiece): the model recognizes both the base form and inflected forms such as the past tense
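
A minimal sketch of those three points, assuming HuggingFace `transformers` and the `bert-base-uncased` checkpoint: the tokenizer prepends [CLS], separates and terminates the sentence pair with [SEP], and WordPiece splits a rare word into subword pieces (the exact split depends on the vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Sentence pair: [CLS] opens the input, [SEP] separates and closes the pair.
encoded = tokenizer("The bears ate the honey", "We spotted the bears")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# -> ['[CLS]', 'the', 'bears', ..., '[SEP]', 'we', ..., '[SEP]']

# WordPiece: '##' marks a piece that continues the previous one, so the stem
# is shared between base and inflected forms.
print(tokenizer.tokenize("embeddings"))  # e.g. ['em', '##bed', '##ding', '##s']
```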