w2_wordcloud - steelbear/HMG_Softeer_DE GitHub Wiki

Word cloud

[Figure: word clouds built from positive and negative tweets]

A word cloud is a visualization that arranges words in a cluster, like a cloud. The size and color of each word convey its relative magnitude or share. The figure above is a word cloud built by collecting positive and negative tweets and mapping word frequency to font size. It shows at a glance which words appear most often in the positive tweets and which in the negative ones.
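To make the frequency-to-font-size mapping concrete, here is a minimal sketch using only the standard library. The helper name `font_sizes`, the linear scaling, and the 10 to 60 point range are assumptions for illustration; the wordcloud library applies its own scaling and layout logic.

```python
from collections import Counter


def font_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its frequency.

    Hypothetical helper: the most frequent word gets max_size.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


sizes = font_sizes("good good good bad bad fine")
# Print words from largest to smallest font size.
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.1f}")
```

The more often a word occurs, the larger it is drawn, which is the effect the figure relies on.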

wordcloud

A Python library that makes it easy to draw word clouds.

pip install wordcloud
  • Github Repository
  • Official Documentation
    • Covers basic usage, but does not go into much detail.

Code

import matplotlib.pyplot as plt
from wordcloud import WordCloud


# Sample input (illustrative placeholder); replace with your own text.
text = "Python makes drawing word clouds easy, and word clouds make word frequency easy to see."

# The WordCloud constructor accepts options such as image size and stopwords.
wc = WordCloud()

# generate() takes a str. WordCloud tokenizes the text and removes stopwords itself,
# so the text needs no preprocessing. The resulting word frequencies are
# stored on the object (as wc.words_).
plt.imshow(wc.generate(text))

wc.to_file('wordcloud.png')  # save the image to a file

# process_text() preprocesses the text (stopword removal, plural-to-singular
# normalization, etc.) and returns the word counts.
word_freq = wc.process_text(text)

# generate_from_frequencies() accepts a dict or collections.Counter
# and draws the word cloud from those frequencies.
plt.imshow(wc.generate_from_frequencies(word_freq))
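When the built-in preprocessing does not fit the data, one option is to count words yourself and pass the result to `generate_from_frequencies`. The sketch below assumes whitespace tokenization; the helper names `tweet_frequencies` and `save_cloud`, the sample tweets, the stopword set, and the output file name are all made up for illustration. `width`, `height`, `background_color`, and `max_words` are constructor options of `WordCloud`.

```python
from collections import Counter


def tweet_frequencies(tweets, stopwords=frozenset()):
    """Count words across tweets, skipping stopwords (hypothetical helper)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(w for w in tweet.lower().split() if w not in stopwords)
    return counts


def save_cloud(freqs, path):
    """Render a word-to-count mapping as a word cloud and save it as an image."""
    # Imported here so the counting helper works without wordcloud installed.
    from wordcloud import WordCloud

    wc = WordCloud(width=800, height=400, background_color="white", max_words=100)
    wc.generate_from_frequencies(freqs)
    wc.to_file(path)


# Made-up sample data standing in for a set of positive tweets.
positive = ["love this phone", "love the camera", "great battery"]
freqs = tweet_frequencies(positive, stopwords={"this", "the"})
print(freqs.most_common(3))

# save_cloud(freqs, "positive.png")  # uncomment to write the image
```

Running the same two steps on the negative tweets gives a second image to compare side by side.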