[Text Mining][Week 5, Part 2] Embeddings
Embeddings
- Static embeddings
  - word2vec (skip-gram): one embedding per word type, identical in every sentence (see the sketch below)
- Contextualized word embeddings
  - ELMo, BERT (language models): the embedding differs depending on the sentence context
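A minimal sketch of the static case, assuming the gensim library (my choice; the notes don't name a toolkit): skip-gram word2vec learns exactly one vector per word type, so every occurrence of a word shares it.

```python
# Static-embedding sketch (assumes gensim is installed).
from gensim.models import Word2Vec

# Toy corpus: pre-tokenized sentences.
sentences = [
    ["the", "bears", "ate", "the", "honey"],
    ["we", "spotted", "the", "bears", "from", "the", "highway"],
]

# sg=1 selects the skip-gram objective; vector_size is the embedding width.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# A static model stores one vector per word *type*: every token of
# "bears", in every sentence, maps to this same 50-dim vector.
print(model.wv["bears"].shape)  # (50,)
```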
Contextualized embeddings (question to ask)
- Models for learning static embeddings learn a single representation for a word type.
Types and tokens
- Type: bears
- Tokens (counted in the sketch after this list):
  - The bears ate the honey
  - We spotted the bears from the highway
  - Yosemite has brown bears
  - The Chicago Bears didn't make the playoffs
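A tiny plain-Python illustration of the distinction, using the sentences above: one type, four tokens.

```python
# One word type ("bears"), four tokens of it.
sentences = [
    "The bears ate the honey",
    "We spotted the bears from the highway",
    "Yosemite has brown bears",
    "The Chicago Bears didn't make the playoffs",
]

tokens = [word.lower() for s in sentences for word in s.split()]
types = set(tokens)

print(tokens.count("bears"))                      # 4 tokens of the type "bears"
print(len(tokens), "tokens,", len(types), "types")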
Contextualized word representations
- Big idea: transform the representation of a token (e.g., a static word embedding) so that it is sensitive to the token's local context in the sentence, and trainable so it can be optimized for a specific NLP task. A sketch follows.
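A sketch of that idea, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (my choices, not specified in the notes), and assuming "bears" is a single WordPiece in that vocabulary: the same word type gets a different vector in each sentence.

```python
# Contextual-embedding sketch (assumes the transformers library).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_vector(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]                     # vector at the word's position

v1 = token_vector("the bears ate the honey", "bears")
v2 = token_vector("the chicago bears didn't make the playoffs", "bears")

# Cosine similarity < 1: the two "bears" tokens get different embeddings.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```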
BERT
- [CLS]: classification token; a single embedding that compresses the whole input
- [SEP]: marks the boundary between sentences
- WordPiece splits certain words into subword pieces, so the model can handle both a base form and its inflections (e.g., a past tense form); see the sketch below
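A quick look at these pieces, again assuming transformers and bert-base-uncased; the exact subword splits depend on the vocabulary.

```python
# [CLS], [SEP], and WordPiece in action (assumes transformers).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A sentence pair: [CLS] opens the input, [SEP] closes each sentence.
enc = tokenizer("The bears ate the honey", "Yosemite has brown bears")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'the', 'bears', 'ate', 'the', 'honey', '[SEP]',
#  ..., 'has', 'brown', 'bears', '[SEP]']   (subword splits may vary)

# WordPiece splits rarer words into pieces, so related word forms share
# material instead of being unrelated whole-word types.
print(tokenizer.tokenize("embeddings"))  # ['em', '##bed', '##ding', '##s']
```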