Sentiment Analysis - newlife-js/Wiki GitHub Wiki

by Prof. Cha Meeyoung, KAIST

Sentiment Analysis

Sentiment = Feelings(Attitude, Emotion, Opinion)
Typically expressed as a binary opposition (for/against, like/dislike, good/bad)
-> sentiment content, positive/negative valence (the degree of positivity or negativity)

■ Use cases

  • Consumer information: product review
  • Marketing: consumer attitudes, trends
  • Politics: predict votes and views
  • Social: find like-minded individuals or communities

Types

  • Aspect-based SA
  • Multimodal SA
  • Contextual SA
  • Sentiment Reasoning
  • Domain Adaptation
  • Multilingual SA
  • Sarcasm Analysis
  • Sentiment-aware NLG
  • Bias in SA Systems

Topic modeling

Discovering the hidden topics in a collection of documents

Term-document matrix

A matrix that records how often each term occurs in each document
Terms with a high occurrence count are selected as topics
A word with several meanings (a polysemous word) is treated as having one meaning, even when it is used in different senses
-> a method for extracting latent meaning is needed
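As a minimal sketch of the idea above, the following builds a term-document matrix for a toy corpus. The corpus is hypothetical and was chosen so that "bank" appears in two different senses; rows are documents and columns are terms.

```python
from collections import Counter

# Toy corpus (hypothetical); "bank" is used in two different senses.
docs = [
    "the bank approved the loan",
    "we sat on the river bank",
    "the loan had a low interest rate",
]

# Vocabulary: every distinct term, in sorted order.
vocab = sorted({term for doc in docs for term in doc.split()})

# One row per document, one column per term; each cell holds the
# number of times the term occurs in that document.
counts = [Counter(doc.split()) for doc in docs]
matrix = [[c[term] for term in vocab] for c in counts]

# The financial and the river sense of "bank" land in the same column,
# so raw counts cannot tell them apart.
bank = vocab.index("bank")
bank_column = [row[bank] for row in matrix]
print(bank_column)  # [1, 1, 0]
```

Both senses of "bank" are counted in a single column, which is the limitation that motivates latent-meaning methods.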

Latent Semantic Analysis(LSA)

Applies a decomposition (SVD) to the Document x Term (TF-IDF) matrix

U: document-topic matrix
V^T: topic-term matrix

Drawbacks: the embedding is hard to interpret (what does a negative value mean?), a large document set is required, and the normal-distribution assumption does not hold
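The decomposition can be sketched with NumPy as follows. The count matrix, the plain `tf * log(N / df)` weighting, and the choice of `k = 2` topics are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Hypothetical document x term count matrix (4 documents, 5 terms).
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

# Simple TF-IDF weighting: term count times log(N / document frequency).
n_docs = X.shape[0]
doc_freq = (X > 0).sum(axis=0)
tfidf = X * np.log(n_docs / doc_freq)

# SVD: tfidf = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)

# Keep the k largest singular values to get the latent topics.
k = 2
doc_topic = U[:, :k]    # document-topic matrix
topic_term = Vt[:k, :]  # topic-term matrix

# Rank-k approximation of the original TF-IDF matrix.
approx = doc_topic @ np.diag(s[:k]) @ topic_term

print(np.round(doc_topic, 2))
```

The printed document-topic matrix contains negative entries, which illustrates why the resulting embedding is hard to interpret.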

Probabilistic LSA(pLSA)

Based on the probability of a term occurring rather than on its occurrence count
(D: document, Z: topic, W: word)
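For reference, pLSA is usually written in this notation as one of two equivalent factorizations of the joint probability of a document and a word. This is the standard textbook form and may differ in presentation from the figures originally shown on this page.

```latex
\begin{align}
P(D, W) &= P(D) \sum_{Z} P(Z \mid D)\, P(W \mid Z) \\
        &= \sum_{Z} P(Z)\, P(D \mid Z)\, P(W \mid Z)
\end{align}
```

The two lines are related by Bayes' rule, since $P(D)\,P(Z \mid D) = P(Z)\,P(D \mid Z)$.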

Latent Dirichlet Allocation(LDA)

Dirichlet distribution: a probability distribution over k-dimensional vectors (k = number of topics) whose elements are all positive and sum to 1

Assumptions
  • Documents are composed of a mixture of topics
  • Topics generate words according to a probability distribution
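The two assumptions above correspond to a generative process, sketched here with the standard library only. The vocabulary, the number of topics, and the concentration value 0.5 are hypothetical. Dirichlet samples are drawn by normalizing independent Gamma draws.

```python
import random

random.seed(0)

def sample_dirichlet(alpha):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma draws."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

vocab = ["goal", "team", "vote", "party", "match", "election"]
n_topics = 2

# Each topic is a probability distribution over the vocabulary.
topic_word = [sample_dirichlet([0.5] * len(vocab)) for _ in range(n_topics)]

def generate_document(n_words, alpha=(0.5, 0.5)):
    # Assumption 1: a document is a mixture of topics (theta).
    theta = sample_dirichlet(alpha)
    words = []
    for _ in range(n_words):
        # Pick a topic for this word position from the document's mixture.
        z = random.choices(range(n_topics), weights=theta)[0]
        # Assumption 2: the chosen topic generates the word.
        words.append(random.choices(vocab, weights=topic_word[z])[0])
    return theta, words

theta, words = generate_document(8)
print(theta, words)
```

Fitting LDA means inverting this process: inferring the topic mixtures and topic-word distributions from observed documents.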

Evaluation of Topic modeling

  • Log Likelihood
  • Perplexity
  • Topic Coherence: whether the high-probability words within the same topic are similar to one another
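Two of these metrics can be sketched in a few lines. Perplexity is computed from a held-out log-likelihood as `exp(-log_likelihood / n_tokens)`. The coherence function is a UMass-style co-occurrence score, chosen here as one common option rather than as the metric used in the lecture. The helper names and the toy documents are hypothetical.

```python
import math

def perplexity(log_likelihood, n_tokens):
    """exp of the negative per-token log-likelihood; lower is better."""
    return math.exp(-log_likelihood / n_tokens)

def umass_coherence(top_words, docs):
    """UMass-style coherence; higher means the words co-occur more often.

    top_words: a topic's words, ordered from most to least probable.
    docs: list of sets of words. Every word in top_words is assumed to
    appear in at least one document (otherwise the ratio is undefined).
    """
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = sum(1 for d in docs if top_words[l] in d)
            d_ml = sum(1 for d in docs
                       if top_words[m] in d and top_words[l] in d)
            score += math.log((d_ml + 1) / d_l)
    return score

# A model assigning probability 0.25 to each of 4 tokens has perplexity 4.
ppl = perplexity(4 * math.log(0.25), 4)

docs = [
    {"goal", "team", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "election", "party"},
]
coherent = umass_coherence(["goal", "team"], docs)
mixed = umass_coherence(["goal", "vote"], docs)
print(ppl, coherent, mixed)
```

Words that appear together in documents ("goal", "team") score higher than words that never co-occur ("goal", "vote").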

Deep Learning Model

ProdLDA

A method that models topics using the same idea as a VAE
It encodes a document (bag of words) into a topic (embedding), decodes the topic back into a document,
and is trained to produce topics that minimize the difference between the input and the output
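The encode/decode loop described above can be sketched as a single forward pass in NumPy. This is a simplified illustration, not the ProdLDA implementation: the weights are random rather than trained, the layer sizes are hypothetical, and a standard normal prior stands in for the Laplace approximation to the Dirichlet prior that ProdLDA uses.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden, n_topics = 6, 8, 2

# Random weights stand in for trained parameters (assumption).
W_h = rng.normal(scale=0.1, size=(vocab_size, hidden))
W_mu = rng.normal(scale=0.1, size=(hidden, n_topics))
W_logvar = rng.normal(scale=0.1, size=(hidden, n_topics))
beta = rng.normal(scale=0.1, size=(n_topics, vocab_size))  # topic-word weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(bow):
    # Encoder: bag of words -> parameters of the latent distribution.
    h = np.logaddexp(0.0, bow @ W_h)  # softplus activation
    mu, logvar = h @ W_mu, h @ W_logvar
    # Reparameterization trick, then softmax to get topic proportions.
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(n_topics)
    theta = softmax(z)
    # Decoder: mix topic-word weights first, then normalize over words.
    recon = softmax(theta @ beta)
    # Loss: reconstruction error plus KL divergence to the N(0, I) prior.
    recon_loss = -(bow * np.log(recon)).sum()
    kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum()
    return theta, recon, recon_loss + kl

bow = np.array([3, 2, 0, 0, 1, 0], dtype=float)  # hypothetical word counts
theta, recon, loss = forward(bow)
print(theta, loss)
```

Training would minimize `loss` over many documents with gradient descent, which requires an autodiff framework and is omitted here.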