TEXT ANALYSIS - fingeredman/teanaps GitHub Wiki
TEANAPS
API Documentation
Text Analysis
3. teanaps.text_analysis
3.1. teanaps.text_analysis.TfidfCalculator
Python Code (in Jupyter Notebook) :
from teanaps.text_analysis import TfidfCalculator
tfidf = TfidfCalculator()
-
teanaps.text_analysis.TfidfCalculator.calculation_tfidf(document_list)
Calculates the TF-IDF value of each word in the given documents.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
-
Returns
- None
-
Examples
Python Code (in Jupyter Notebook) :
#tokenized_sentence_list = ['๋นํธ์ฝ์ธ ๊ฐ๋ฅ์ฑ ๊ฒฐํจ ์ ์ ๊ท์ ์ ๋น',
#                           '์๋ ๋ฒํ ๋นํธ์ฝ์ธ ๋ง์ ํ๋ฒ๋๋ ๊ต์ ๋ง์',
#                           '๊ฐ์ ํํ ์ธํฐ๋ท ์์ค ๋นํธ์ฝ์ธ ์บ์ ์ ๋ ์ฃผ์ ๊ฒ',
#                           ...,
#                           '์์ฐ ํฌ๋ถ ํต์ฐ ๋์ ์์ฐ ๋ถ๋์ฐ ์ ํ ์ ์๋น ์น์ธ',
#                           'ํ๊ตญ ๋ถ์ ๋ถ๋์ฐ ์นจ์ฒด ๋ถ๋์ฐ ์',
#                           '๊ธํฌํ ๋ถ๋์ฐ ํฌ์์ ์ธ๋ ฅ ๊ณผ์ ๊ธฐ ๊ฐ์ค']
tfidf.calculation_tfidf(tokenized_sentence_list)
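For readers unfamiliar with the measure, the following is a minimal sketch of a textbook TF-IDF computation over whitespace-tokenized documents. It is not the TEANAPS implementation: the function name `tfidf_by_document`, the sample documents, and the unsmoothed `log(N / df)` weighting are assumptions made for illustration, and the library may normalize or smooth its values differently.

```python
import math
from collections import Counter


def tfidf_by_document(document_list):
    """Return one {word: tf-idf} dict per whitespace-tokenized document."""
    tokenized = [doc.split() for doc in document_list]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        result.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return result


# Hypothetical English stand-ins for morpheme-tokenized documents.
docs = ["bitcoin price rise", "bitcoin regulation", "loan rate rise"]
scores = tfidf_by_document(docs)
print(scores[0])
```

A word that occurs in a single document ("price") receives a higher weight than one shared by several documents ("bitcoin").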
-
-
teanaps.text_analysis.TfidfCalculator.get_tf_matrix()
Returns a DataFrame containing the TF (Term Frequency) value of each word per document.
-
Parameters
- None
-
Returns
- Pandas DataFrame (dataframe) : DataFrame containing the TF value of each word per document.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_tf_matrix()
print(type(result))
Output (in Jupyter Notebook) :
pandas.core.frame.DataFrame
-
-
teanaps.text_analysis.TfidfCalculator.get_tf_vector(sentence)
Returns a morpheme-tokenized sentence as a vector (list) of TF (Term Frequency) values.
-
Parameters
- sentence (str) : A sentence written in Korean or English. Up to 128 characters.
-
Returns
- result (list) : Vector (list) of TF (Term Frequency) values.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
tokenized_sentence = "๋นํธ์ฝ์ธ ๊ฐ๋ฅ์ฑ ๊ฒฐํจ ์ ์ ๊ท์ ์ ๋น"
result = tfidf.get_tf_vector(tokenized_sentence)
print(result)
Output (in Jupyter Notebook) :
[0, 0, 1, 0, 0, 0, 0, 0, 0, ...]
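As a rough illustration of what such a vector represents, the sketch below counts how often each vocabulary word occurs in a tokenized sentence. The helper `tf_vector` and the English vocabulary are hypothetical; the order and contents of the real vocabulary are determined by the documents passed to `calculation_tfidf`.

```python
def tf_vector(tokenized_sentence, word_list):
    """Count occurrences of each vocabulary word in a whitespace-tokenized sentence."""
    tokens = tokenized_sentence.split()
    return [tokens.count(word) for word in word_list]


# Hypothetical vocabulary standing in for the word list learned from the corpus.
vocabulary = ["bank", "bitcoin", "loan", "rate"]
vector = tf_vector("bitcoin rate bitcoin", vocabulary)
print(vector)  # [0, 2, 0, 1]
```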
-
-
teanaps.text_analysis.TfidfCalculator.get_tfidf_matrix()
Returns a DataFrame containing the TF-IDF value of each word per document.
-
Parameters
- None
-
Returns
- Pandas DataFrame (dataframe) : DataFrame containing the TF-IDF value of each word per document.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_tfidf_matrix()
print(type(result))
Output (in Jupyter Notebook) :
pandas.core.frame.DataFrame
-
-
teanaps.text_analysis.TfidfCalculator.get_tfidf_vector(sentence)
Returns a morpheme-tokenized sentence as a vector (list) of TF-IDF values.
-
Parameters
- sentence (str) : A sentence written in Korean or English. Up to 128 characters.
-
Returns
- result (list) : Vector (list) of TF-IDF values.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
tokenized_sentence = "๋นํธ์ฝ์ธ ๊ฐ๋ฅ์ฑ ๊ฒฐํจ ์ ์ ๊ท์ ์ ๋น"
result = tfidf.get_tfidf_vector(tokenized_sentence)
print(result)
Output (in Jupyter Notebook) :
[0., 0., 0.45665731, 0., 0., ...]
-
-
teanaps.text_analysis.TfidfCalculator.get_tf_dict()
Returns a dictionary containing the TF value of each word across all documents.
-
Parameters
- None
-
Returns
- result (dict) : Dictionary containing the TF value of each word.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_tf_dict()
print(result)
Output (in Jupyter Notebook) :
{'가격': 3, '가능': 1, '가능성': 1, ..., '효과': 2, '흐름': 1, '흡수': 1}
-
-
teanaps.text_analysis.TfidfCalculator.get_tf_list()
Returns a list containing the TF value of each word across all documents.
-
Parameters
- None
-
Returns
- result (list) : List containing the TF value of each word.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_tf_list()
print(result)
Output (in Jupyter Notebook) :
[['๊ธ๋ฆฌ', 40], ['๋ถ๋์ฐ', 34], ['๊ธ์ต', 34], ..., ['์ถ์ํ', 1], ['์์คํ ', 1], ['ํก์', 1] ]
-
-
teanaps.text_analysis.TfidfCalculator.get_tfidf_dict()
Returns a dictionary containing the TF-IDF value of each word across all documents.
-
Parameters
- None
-
Returns
- result (dict) : Dictionary containing the TF-IDF value of each word.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_tfidf_dict()
print(result)
Output (in Jupyter Notebook) :
{'가격': 1.1424359882788366, '가능': 0.509179564909753, '가능성': 0.45665731260262726, ..., '효과': 1.0165526804723384, '흐름': 0.473637588408657, '흡수': 0.5177879851919405}
-
-
teanaps.text_analysis.TfidfCalculator.get_tfidf_list()
Returns a list containing the TF-IDF value of each word across all documents.
-
Parameters
- None
-
Returns
- result (list) : List containing the TF-IDF value of each word.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_tfidf_list()
print(result)
Output (in Jupyter Notebook) :
[['๊ธ๋ฆฌ', 9.231975802297294], ['๊ธ์ต', 7.963616858955622], ['๋ถ๋์ฐ', 7.727053435662074], ..., ['๋ฐ์ดํฐ', 0.3291698807669874], ['๊ฑฐ๋์', 0.3291698807669874], ['ํฌ๋ช ', 0.3291698807669874] ]
-
-
teanaps.text_analysis.TfidfCalculator.get_word_list()
Returns the list of words contained in all documents.
-
Parameters
- None
-
Returns
- result (list) : List of the words contained in all documents.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_word_list()
print(result)
Output (in Jupyter Notebook) :
['๊ธ๋ฆฌ', '๊ธ์ต', '๋ถ๋์ฐ', ..., '๋ฐ์ดํฐ', '๊ฑฐ๋์', 'ํฌ๋ช ' ]
-
-
teanaps.text_analysis.TfidfCalculator.draw_tfidf(max_words=100)
Returns a graph of the TF and TF-IDF values of the words across all documents.
-
Parameters
- max_words (int) : Number of words whose TF and TF-IDF values are plotted (top-ranked by TF-IDF).
-
Returns
- plotly graph (graph object) : TF and TF-IDF graph.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
tfidf.draw_tfidf(100)
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.TfidfCalculator.get_wordcloud(weight_dict, mask_path=None)
Returns a word cloud that visualizes the TF or TF-IDF values of words.
-
Parameters
- weight_dict (dict) : Dictionary storing TF or TF-IDF values. See teanaps.text_analysis.TfidfCalculator.get_tf_dict() and teanaps.text_analysis.TfidfCalculator.get_tfidf_dict().
- mask_path (str) : Path to a background image file used to change the shape and colors of the word cloud. PNG (*.png) and JPEG (*.jpeg) image files are supported.
-
Returns
- figure (matplotlib.pyplot.plt) : Word cloud.
-
Examples
Python Code (in Jupyter Notebook) :
#tfidf.calculation_tfidf(tokenized_sentence_list)
result = tfidf.get_tf_dict()
#result = tfidf.get_tfidf_dict()
tfidf.get_wordcloud(result)
Output (in Jupyter Notebook) :
-
3.2. teanaps.text_analysis.DocumentClustering
Python Code (in Jupyter Notebook) :
from teanaps.text_analysis import DocumentClustering
dc = DocumentClustering()
-
teanaps.text_analysis.DocumentClustering.clustering(alg, document_list, num_cluters=3, max_iterations=300, eps=0.5, min_samples=5)
Clusters the documents and returns the result.
-
Parameters
- alg (str) : Clustering algorithm. One of {"kmeans", "dbscan", "hdbscan"}.
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- num_cluters (int) : Number of clusters to create.
- max_iterations (int) : Number of times the clustering is repeated.
- eps (float) : Hyperparameter of the DBSCAN algorithm.
- min_samples (int) : Minimum number of data points contained in a cluster.
-
Returns
- result (dict) : Dictionary containing the inertia value of the clusters and the cluster label of each document.
-
Examples
Python Code (in Jupyter Notebook) :
result = dc.clustering("kmeans", tokenized_sentence_list, num_cluters=3, max_iterations=300)
print(result)
Output (in Jupyter Notebook) :
{'inertia': 64.11752014008104, 'predict_list': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)}
Python Code (in Jupyter Notebook) :
result = dc.clustering("dbscan", tokenized_sentence_list, eps=0.5, min_samples=5)
print(result)
Output (in Jupyter Notebook) :
{'inertia': 64.11752014008104, 'predict_list': array([-1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -1, -1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)}
Python Code (in Jupyter Notebook) :
result = dc.clustering("hdbscan", tokenized_sentence_list, min_samples=5)
print(result)
Output (in Jupyter Notebook) :
{'inertia': 64.11752014008104, 'predict_list': array([-1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -1, -1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)}
-
-
teanaps.text_analysis.DocumentClustering.kmeans_inertia_transition(document_list, max_cluters, max_iterations)
Clusters the documents with the K-MEANS algorithm while varying the number of clusters and returns the inertia value obtained for each number of clusters.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- max_cluters (int) : Maximum number of clusters for which the inertia value is calculated.
- max_iterations (int) : Number of times the clustering is repeated.
-
Returns
- result (list) : List of inertia values, one for each number of clusters.
-
Examples
Python Code (in Jupyter Notebook) :
result = dc.kmeans_inertia_transition(tokenized_sentence_list, 10, 300)
print(result)
Output (in Jupyter Notebook) :
[85.29314909321171, 73.22892942068657, 64.11752014008104, 60.672117244161946, 57.24561408281322, 55.125181445741525, 53.74440369290694, 52.262356865901175, 50.26148838373041, 48.480517037436805]
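A list like this is typically read with the elbow heuristic: choose the cluster count after which the inertia stops dropping sharply. The sketch below is one simple way to do that, picking the point with the largest second difference. The helper `elbow_cluster_count` is hypothetical and not part of TEANAPS, and it assumes the first list entry corresponds to one cluster, which is consistent with the third value matching the inertia reported for `num_cluters=3` above.

```python
def elbow_cluster_count(inertia_list):
    """Pick the cluster count where the inertia drop slows down the most."""
    # drops[i] is the inertia reduction when going from i+1 to i+2 clusters.
    drops = [a - b for a, b in zip(inertia_list, inertia_list[1:])]
    # bends[i] compares the drop into k = i + 2 with the drop out of it.
    bends = [before - after for before, after in zip(drops, drops[1:])]
    return bends.index(max(bends)) + 2


# Inertia values copied from the example output above (assumed k = 1..10).
inertia_list = [85.29314909321171, 73.22892942068657, 64.11752014008104,
                60.672117244161946, 57.24561408281322, 55.125181445741525,
                53.74440369290694, 52.262356865901175, 50.26148838373041,
                48.480517037436805]
best_k = elbow_cluster_count(inertia_list)
print(best_k)  # 3
```

The heuristic is only a starting point; inspecting the plotted curve is usually still worthwhile.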
-
-
teanaps.text_analysis.DocumentClustering.get_kmeans_inertia_transition_graph(inertia_list)
Returns a graph of the inertia values.
-
Parameters
- inertia_list (list) : List of inertia values.
-
Returns
- plotly graph (graph object) : Graph of the inertia value for each number of clusters.
-
Examples
Python Code (in Jupyter Notebook) :
inertia_list = dc.kmeans_inertia_transition(tokenized_sentence_list, 10, 300)
dc.get_kmeans_inertia_transition_graph(inertia_list)
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.DocumentClustering.get_tfidf_tsne(document_list, predict_list, df_article)
Returns a DataFrame containing the label and cluster of each document, together with the 2-D coordinates obtained by embedding the documents with TF-IDF and reducing the dimensionality.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- predict_list (list) : List of the cluster labels produced by clustering.
- df_article (DataFrame) : DataFrame containing the documents and their labels.
-
Returns
- Pandas DataFrame (dataframe) : DataFrame containing the label and cluster of each document, together with the 2-D coordinates obtained by embedding the documents with TF-IDF and reducing the dimensionality.
-
Examples
Python Code (in Jupyter Notebook) :
import pandas as pd
clustering_result = dc.clustering("kmeans", tokenized_sentence_list, num_cluters=3, max_iterations=300)
predict_list = clustering_result["predict_list"]
df_article = pd.DataFrame(tokenized_sentence_list, columns = ["label", "source", "datetime", "title", "content"])
df_result = dc.get_tfidf_tsne(tokenized_sentence_list, predict_list, df_article)
print(type(df_result))
Output (in Jupyter Notebook) :
pandas.core.frame.DataFrame
-
-
teanaps.text_analysis.DocumentClustering.get_cluster_graph(df_tfidf_tsne, label_type)
Returns a graph that shows the clustering result in two dimensions.
-
Parameters
- df_tfidf_tsne (DataFrame) : DataFrame containing the label and cluster of each document, together with the 2-D coordinates obtained by embedding the documents with TF-IDF and reducing the dimensionality.
- label_type (str) : Type of label shown on the graph. One of {"predict", "label"}.
-
Returns
- plotly graph (graph object) : Graph of the clustering result.
-
Examples
Python Code (in Jupyter Notebook) :
dc.get_cluster_graph(df_result, "predict")
Output (in Jupyter Notebook) :
Python Code (in Jupyter Notebook) :
dc.get_cluster_graph(df_result, "label")
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.DocumentClustering.get_silhouette_score2(document_list, df_result)
Calculates the silhouette score of the clustering result and returns it.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- df_result (DataFrame) : DataFrame containing the label and cluster of each document, together with the 2-D coordinates obtained by embedding the documents with TF-IDF and reducing the dimensionality.
-
Returns
- result (float) : Silhouette score of the clustering result.
-
Examples
Python Code (in Jupyter Notebook) :
result = dc.get_silhouette_score2(tokenized_sentence_list, df_result)
print(result)
Output (in Jupyter Notebook) :
0.1772473694643886
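To make the score easier to interpret, here is a minimal sketch of the textbook silhouette definition: for each point, `a` is the mean distance to its own cluster, `b` is the mean distance to the nearest other cluster, and the point's score is `(b - a) / max(a, b)`. The function `silhouette_score` below, the Euclidean distance, and the toy points are assumptions for illustration; TEANAPS presumably computes the score on document vectors with a library routine, which this sketch does not reproduce.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; requires at least two distinct labels."""
    n = len(points)
    scores = []
    for i in range(n):
        sums, counts = {}, {}
        # Accumulate distances from point i to every other point, per cluster.
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        means = {c: sums[c] / counts[c] for c in sums}
        if labels[i] not in means:
            scores.append(0.0)  # a singleton cluster scores 0 by convention
            continue
        a = means[labels[i]]
        b = min(m for c, m in means.items() if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


# Two well-separated toy clusters (hypothetical data).
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(round(good, 2), bad < 0)
```

Scores near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest misassigned points.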
-
-
teanaps.text_analysis.DocumentClustering.get_silhouette_graph2(document_list, df_result)
Returns a silhouette graph of the clustering result.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- df_result (DataFrame) : DataFrame containing the label and cluster of each document, together with the 2-D coordinates obtained by embedding the documents with TF-IDF and reducing the dimensionality.
- num_cluters (int) : Number of clusters to create.
-
Returns
- plotly graph (graph object) : Silhouette graph of the clustering result.
-
Examples
Python Code (in Jupyter Notebook) :
dc.get_silhouette_graph2(tokenized_sentence_list, df_result)
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.DocumentClustering.get_pair_wize_matrix(document_list)
Returns a graph that shows the similarity between documents as a matrix.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
-
Returns
- plotly graph (graph object) : Graph that shows the similarity between documents as a matrix.
-
Examples
Python Code (in Jupyter Notebook) :
dc.get_pair_wize_matrix(tokenized_sentence_list)
Output (in Jupyter Notebook) :
-
3.3. teanaps.text_analysis.TopicClustering
Python Code (in Jupyter Notebook) :
from teanaps.text_analysis import TopicClustering
tc = TopicClustering()
-
teanaps.text_analysis.TopicClustering.topic_modeling(modeling_type, document_list, topic_count, keyword_count)
Clusters the documents into N topics and returns the keywords of each topic cluster.
-
Parameters
- modeling_type (str) : Type of topic clustering algorithm. One of {"lsa", "lda", "hdp"}.
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- topic_count (int) : Number of topic clusters.
- keyword_count (int) : Number of keywords per topic cluster.
-
Returns
- result (list) : List containing the keywords of each cluster.
-
Examples
Python Code (in Jupyter Notebook) :
result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
print(result)
Output (in Jupyter Notebook) :
[(0, [('금리', 0.6542361201136048), ('대출', 0.4330323607960353), ('금융', 0.3083228589169829), ('은행', 0.22088983702295698), ('코픽스', 0.173373240489713)]), (1, [('비트코인', 0.6987330564487386), ('금리', -0.25924223777122957), ('화폐', 0.218391247175097), ('금융', 0.20393479642923928), ('암호', 0.18284477353567058)]), (2, [('부동산', -0.6584326085475736), ('금융', -0.40842310832729234), ('비트코인', 0.36212229767170806), ('금리', 0.19995317435138174), ('신탁', -0.18356626669622753)])]
-
-
teanaps.text_analysis.TopicClustering.get_model_validation_result()
Calculates the Perplexity and Coherence values of the topic clustering result and returns them.
-
Parameters
- None
-
Returns
- result (tuple) : Tuple containing the Perplexity and Coherence values of the topic clusters.
-
Examples
Python Code (in Jupyter Notebook) :
#result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
perplexity, coherence = tc.get_model_validation_result()
print(perplexity)
print(coherence)
Output (in Jupyter Notebook) :
-6.633342221630287
0.5127122691849578
-
-
teanaps.text_analysis.TopicClustering.get_model()
Returns the model produced by topic clustering.
-
Parameters
- None
-
Returns
- result (model) : Topic clustering model.
-
Examples
Python Code (in Jupyter Notebook) :
#result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
model = tc.get_model()
-
-
teanaps.text_analysis.TopicClustering.display_model_result(model)
Visualizes the topic clustering result.
-
Parameters
- model (model) : Topic clustering model. See teanaps.text_analysis.TopicClustering.get_model().
-
Returns
- result (IPython.core.display.HTML) : Visualization of the topic clusters.
-
Examples
Python Code (in Jupyter Notebook) :
#result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
model = tc.get_model()
tc.display_model_result(model)
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.TopicClustering.get_topics_sentences(document_list)
Finds the documents that belong to each topic of the topic clustering result and returns them.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
-
Returns
- Pandas DataFrame (dataframe) : DataFrame containing the documents that belong to each topic.
-
Examples
Python Code (in Jupyter Notebook) :
#result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
result = tc.get_topics_sentences(tokenized_sentence_list)
print(type(result))
Output (in Jupyter Notebook) :
pandas.core.frame.DataFrame
-
-
teanaps.text_analysis.TopicClustering.get_model_validation_graph(modeling_type, document_list, max_topic_count)
Calculates the Perplexity and Coherence values of the topic clustering result for each number of topics and shows them as a line graph.
-
Parameters
- modeling_type (str) : Type of topic clustering algorithm. One of {"lsa", "lda", "hdp"}.
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- max_topic_count (int) : Maximum number of topic clusters.
-
Returns
- plotly graph (graph object) : Line graph of the Perplexity and Coherence values for each number of topic clusters.
-
Examples
Python Code (in Jupyter Notebook) :
tc.get_model_validation_graph("lda", tokenized_sentence_list, 10)
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.TopicClustering.sequence_lda_topic_modeling(document_list, time_slice, topic_count)
Calculates how each of the N topic clusters in the documents changes over time and returns the result.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- time_slice (list) : The units that divide the whole document set into time periods.
- topic_count (int) : Number of topic clusters.
-
Returns
- result (list) : List containing the keywords of each period and cluster.
-
Examples
Python Code (in Jupyter Notebook) :
result = tc.sequence_lda_topic_modeling(tokenized_sentence_list, [100, 100, ..., 100], 5)
print(result)
Output (in Jupyter Notebook) :
[(0, [[('주식', 0.021218330732594246), ('종목', 0.018796542321031225), ('시장', 0.01679681367262934), ...], [('주식', 0.02193776754354376), ('종목', 0.01936867384889522), ('시장', 0.016617304727897478), ..., ], ..., ], ..., ]
-
-
teanaps.text_analysis.TopicClustering.get_sequence_topic_graph()
Calculates how each of the N topic clusters in the documents changes over time and shows the result as a graph.
-
Parameters
- None
-
Returns
- plotly graph (graph object) : Line graph of the change of each cluster over time.
-
Examples
Python Code (in Jupyter Notebook) :
tc.get_sequence_topic_graph()
Output (in Jupyter Notebook) :
-
3.4. teanaps.text_analysis.CoWordCalculator
Python Code (in Jupyter Notebook) :
from teanaps.text_analysis import CoWordCalculator
co = CoWordCalculator()
-
teanaps.text_analysis.CoWordCalculator.calculation_co_matrix(document_list, node_list=[])
Calculates the co-occurrence frequency of the words contained in the documents.
-
Parameters
- document_list (list) : A list of documents, each expressed as words split into morphemes.
- node_list (list) : List of words for which the co-occurrence frequency is calculated.
-
Returns
- None
-
Examples
Python Code (in Jupyter Notebook) :
node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
-
-
teanaps.text_analysis.CoWordCalculator.get_edge_list()
Returns the co-occurring word pairs in the documents and their frequencies.
-
Parameters
- None
-
Returns
- result (list) : List of tuples structured as ((word, word), co-occurrence frequency).
-
Examples
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
result = co.get_edge_list()
print(result)
Output (in Jupyter Notebook) :
[(('금리', '금리'), 905), (('금융', '금융'), 791), (('대출', '대출'), 580), (('비트코인', '비트코인'), 565), (('부동산', '부동산'), 555), ..., (('대출', '신탁'), 1), (('금리', '자산'), 1), (('자산', '금리'), 1), (('신탁', '투자'), 1), (('투자', '신탁'), 1)]
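The idea behind such an edge list can be sketched with the standard library alone. The snippet below counts, for each unordered pair of node words, the number of documents in which both appear. This is a simplification under stated assumptions: the helper `co_occurrence_edges` and the English sample data are hypothetical, and the TEANAPS output above also contains self-pairs and both orderings of each pair, so its counting rule evidently differs.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(document_list, node_list):
    """Count in how many documents each unordered pair of node words co-occurs."""
    nodes = set(node_list)
    counts = Counter()
    for doc in document_list:
        # Keep only node words, once per document, in a stable order.
        words = sorted(set(doc.split()) & nodes)
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts.most_common()


# Hypothetical English stand-ins for morpheme-tokenized documents.
docs = ["rate loan bank", "rate loan", "bitcoin rate"]
edges = co_occurrence_edges(docs, ["rate", "loan", "bank", "bitcoin"])
print(edges[0])  # (('loan', 'rate'), 2)
```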
-
-
teanaps.text_analysis.CoWordCalculator.get_node_list()
Returns all words included in the co-occurrence frequency calculation.
-
Parameters
- None
-
Returns
- result (list) : List of all words included in the co-occurrence frequency calculation.
-
Examples
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
result = co.get_node_list()
print(result)
Output (in Jupyter Notebook) :
['금리', '금융', '대출', '비트코인', '부동산', '은행', '코픽스', '자산', '시장', '신탁', '그림자', '투자', '거래', '정부', '상품', '신용', '리스크']
-
-
teanaps.text_analysis.CoWordCalculator.get_co_word(word)
Returns the co-occurrence frequency between a given word and the other words.
-
Parameters
- word (str) : The reference word for which co-occurrence frequencies are returned.
-
Returns
- result (list) : List of tuples structured as (word, co-occurrence frequency).
-
Examples
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
result = co.get_co_word("금리")
print(result)
Output (in Jupyter Notebook) :
[('대출', 341), ('코픽스', 105), ('은행', 82), ..., ('정부', 2), ('비트코인', 1), ('자산', 1)]
-
-
teanaps.text_analysis.CoWordCalculator.get_centrality(centrality_type)
Calculates network centrality from the word co-occurrence information and returns the result.
-
Parameters
- centrality_type (str) : Type of network centrality. One of {"d_cent", "b_cent", "c_cent"}.
-
Returns
- result (dict) : Dictionary containing each word and its centrality.
-
Examples
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
result = co.get_centrality("d_cent")
print(result)
Output (in Jupyter Notebook) :
{'거래': 0.625, '그림자': 0.5625, '금리': 0.9375, ..., '정부': 0.75, '코픽스': 0.5625, '투자': 0.625}
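The values in this output are multiples of 1/16, which is what normalized degree centrality gives for a 17-node network: the number of distinct neighbors divided by `n - 1`. Assuming `"d_cent"` stands for degree centrality (the source does not say so explicitly), the calculation can be sketched as follows. The helper `degree_centrality` and the sample edges are hypothetical and only mirror the `((word, word), count)` structure returned by `get_edge_list()`.

```python
def degree_centrality(edge_list):
    """Normalized degree centrality from ((word, word), count) edges."""
    neighbors = {}
    for (a, b), _count in edge_list:
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if a != b:  # self-pairs do not add a neighbor
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)  # assumes the network has at least two nodes
    return {word: len(adj) / (n - 1) for word, adj in neighbors.items()}


# Hypothetical edges; the co-occurrence counts are ignored by this measure.
edges = [(("rate", "loan"), 3), (("rate", "bank"), 2),
         (("rate", "rate"), 5), (("bitcoin", "rate"), 1)]
centrality = degree_centrality(edges)
print(centrality["rate"])  # 1.0
```

A word connected to every other node scores 1.0, so 0.9375 in the output above would correspond to 15 of 16 possible neighbors.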
-
-
teanaps.text_analysis.CoWordCalculator.get_co_matrix_graph(max_count)
Shows the top N words by co-occurrence frequency as a matrix graph.
-
Parameters
- max_count (int) : Number of words shown in the matrix graph.
-
Returns
- plotly graph (graph object) : Matrix graph of the co-occurrence frequencies.
-
Examples
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
co.get_co_matrix_graph(max_count)
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.CoWordCalculator.get_word_network_graph(centrality_dict, mode="markers", centrality_th=0.5, ego_node_list=[], node_size_rate=10, edge_width_rate=10, text_size_rate=10)
Shows the co-occurrence relationships between words as a network graph.
-
Parameters
- centrality_dict (dict) : Dictionary containing each word and its centrality. See teanaps.text_analysis.CoWordCalculator.get_centrality.
- mode (str) : How the graph nodes are displayed. One of {"markers", "text", "markers+text"}.
- centrality_th (float) : Centrality threshold for filtering nodes. Only nodes whose centrality is at or above this value are shown in the graph.
- ego_node_list (list) : List of center nodes used to build an ego network. Only nodes directly connected to the given nodes are shown in the graph.
- node_size_rate (int) : Weight for the node size. The higher the value, the larger the nodes are drawn.
- edge_width_rate (int) : Weight for the edge width. The higher the value, the thinner the edges are drawn.
- text_size_rate (int) : Weight for the text label size. The higher the value, the smaller the text labels are drawn.
-
Returns
- plotly graph (graph object) : Network graph of the co-occurrence frequencies.
-
Examples
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
#centrality_dict = co.get_centrality("d_cent")
co.get_word_network_graph(centrality_dict, mode="markers")
Output (in Jupyter Notebook) :
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
#centrality_dict = co.get_centrality("d_cent")
co.get_word_network_graph(centrality_dict, mode="text")
Output (in Jupyter Notebook) :
Python Code (in Jupyter Notebook) :
#node_list = ["금리", "금융", "대출", "비트코인", "부동산", "은행", "코픽스", "자산", "시장", "신탁", "그림자", "투자", "거래", "정부", "상품", "신용", "리스크"]
#co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
#centrality_dict = co.get_centrality("d_cent")
co.get_word_network_graph(centrality_dict, mode="markers+text")
Output (in Jupyter Notebook) :
-
3.5. teanaps.text_analysis.SentimentAnalysis
Python Code (in Jupyter Notebook) :
from teanaps.text_analysis import SentimentAnalysis
senti = SentimentAnalysis(model_path="/model", kobert_path="/kobert")
Notes :
-
teanaps.text_analysis.SentimentAnalysis.tag(sentence, neutral_th=0.5)
Classifies the sentiment of a sentence as positive or negative and returns the result.
-
Parameters
- sentence (str) : A sentence written in Korean or English. Up to 128 characters.
- neutral_th (float) : Range of the difference between the positive and negative intensities within which the sentence is judged neutral. 0 to 1.
-
Returns
- result (list) : List containing a tuple structured as ((negative intensity, positive intensity), sentiment label). The intensities range from 0 to 1. The sentiment label is one of {"positive", "negative"}.
-
Examples
Python Code (in Jupyter Notebook) :
sentence = "늘 배우고 배푸는 자세가 필요합니다."
result = senti.tag(sentence, neutral_th=0.3)
print(result)
Output (in Jupyter Notebook) :
((0.0595, 0.9543), 'positive')
Python Code (in Jupyter Notebook) :
sentence = "과한 욕심은 주변 사람들에게 피해를 줍니다."
result = senti.tag(sentence, neutral_th=0.3)
print(result)
Output (in Jupyter Notebook) :
((0.8715, 0.1076), 'negative')
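The role of `neutral_th` can be illustrated with a small sketch. It assumes the label is "neutral" whenever the absolute difference between the two intensities is below the threshold, a rule inferred from the example outputs in this document and not confirmed by the source. The function `sentiment_label` is hypothetical and not part of TEANAPS.

```python
def sentiment_label(negative, positive, neutral_th=0.5):
    """Label a (negative, positive) intensity pair, with a neutral band."""
    # Assumed rule: a small gap between the intensities means no clear polarity.
    if abs(positive - negative) < neutral_th:
        return "neutral"
    return "positive" if positive > negative else "negative"


# Intensity pairs taken from the example outputs in this document.
print(sentiment_label(0.0595, 0.9543, neutral_th=0.3))  # positive
print(sentiment_label(0.8715, 0.1076, neutral_th=0.3))  # negative
print(sentiment_label(0.5991, 0.3836, neutral_th=0.5))  # neutral
```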
-
-
teanaps.text_analysis.SentimentAnalysis.get_weight(sentence)
Returns the weight of each morpheme that was referenced when classifying the sentiment of the sentence.
-
Parameters
- sentence (str) : A sentence written in Korean or English. Up to 128 characters.
-
Returns
- token_list (list) : List containing each morpheme of the sentence.
- weight_list (list) : List containing the weight of each morpheme of the sentence.
-
Examples
Python Code (in Jupyter Notebook) :
sentence = "늘 배우고 배푸는 자세가 필요합니다."
token_list, weight_list = senti.get_weight(sentence)
print(token_list)
print(weight_list)
Output (in Jupyter Notebook) :
[' 늘', ' 배우', '고', ' 배', '푸', '는', ' 자세', '가', ' 필요', '합니다', ' ', '.']
[0.072522074, 0.08697342, 0.052703843, 0.051040735, 0.0606895, 0.05134341, 0.05213573, 0.08644837, 0.078125894, 0.079360135, 0, 0.079488374]
Python Code (in Jupyter Notebook) :
sentence = "과한 욕심은 주변 사람들에게 피해를 줍니다."
token_list, weight_list = senti.get_weight(sentence)
print(token_list)
print(weight_list)
Output (in Jupyter Notebook) :
[' ', '과', '한', ' 욕심', '은', ' 주변', ' 사람들', '에게', ' 피해를', ' ', '줍', '니다', ' ', '.']
[0, 0.020344315, 0.024879746, 0.02612342, 0.03615231, 0.048542265, 0.06707654, 0.0936653, 0.07649707, 0, 0.08189902, 0.08962273, 0, 0.07841993]
-
-
teanaps.text_analysis.SentimentAnalysis.draw_weight(sentence)
Shows the weight of each morpheme referenced in the sentiment classification as a histogram.
-
Parameters
- sentence (str) : A sentence written in Korean or English. Up to 128 characters.
-
Returns
- plotly graph (graph object) : Graph of the weight of each morpheme referenced in the sentiment classification.
-
Examples
Python Code (in Jupyter Notebook) :
sentence = "늘 배우고 배푸는 자세가 필요합니다."
senti.draw_weight(sentence)
Output (in Jupyter Notebook) :
Python Code (in Jupyter Notebook) :
sentence = "과한 욕심은 주변 사람들에게 피해를 줍니다."
senti.draw_weight(sentence)
Output (in Jupyter Notebook) :
-
-
teanaps.text_analysis.SentimentAnalysis.draw_sentence_weight(sentence)
Top(https://github.com/fingeredman/teanaps/wiki/TEXT-ANALYSIS#teanaps-api-documentation)-
๊ฐ์ฑ์์ค ๋ถ๋ฅ์ ์ฐธ์กฐ๋ ๊ฐ ๊ฐ ํํ์๋ณ ๊ฐ์ค์น๋ฅผ ํ์ด๋ผ์ดํธํ ํํ์ ๋ฌธ์ฅ ๊ทธ๋ํ๋ก ์ถ๋ ฅํฉ๋๋ค.
-
Parameters
- sentence (str) : ํ๊ตญ์ด ๋๋ ์์ด๋ก ๊ตฌ์ฑ๋ ๋ฌธ์ฅ. ์ต๋ 128์.
-
Returns
- plotly graph (graph object) : ๋ฌธ์ฅ ๊ทธ๋ํ.
-
Examples
Python Code (in Jupyter Notebook) :
sentence = "๋ ๋ฐฐ์ฐ๊ณ ๋ฐฐํธ๋ ์์ธ๊ฐ ํ์ํฉ๋๋ค." senti.draw_sentence_weight(sentence)
Output (in Jupyter Notebook) : [plotly sentence graph with highlighted morphemes; image not included]
Python Code (in Jupyter Notebook) :
sentence = "과한 욕심은 주변 사람들에게 피해를 줍니다."
senti.draw_sentence_weight(sentence)
Output (in Jupyter Notebook) : [plotly sentence graph with highlighted morphemes; image not included]

teanaps.text_analysis.SentimentAnalysis.get_sentiment_parse(sentence, neutral_th=0.3, tagger="mecab", model_path="/model")
Classifies the sentiment level of each phrase (eojeol) in the sentence as positive or negative and returns the corresponding weights.
Parameters
- sentence (str) : A sentence in Korean or English. Up to 128 characters.
- neutral_th (float) : The range of the difference between positive and negative strength within which a phrase is judged neutral. 0~1.
- tagger (str) : Morphological analyzer; one of {"okt", "mecab", "mecab-ko", "kkma"}. See teanaps.nlp.ma.set_tagger.
- model_path (str) : Path to the named entity recognition model file. See teanaps.nlp.ner.parse.
Returns
- phrase_token_weight_list (list) : A list containing each phrase and the sentiment analysis result for that phrase.
- token_list (list) : A list containing each morpheme in the sentence.
- weight_list (list) : A list containing the weight of each morpheme in the sentence.
Examples
Python Code (in Jupyter Notebook) :
sentence = "욕심쟁이에게 스트레스 받으며 살다가 떠나고나니 너무 행복해요!"
phrase_token_weight_list, token_list, weight_list = senti.get_sentiment_parse(sentence, neutral_th=0.5)
print(phrase_token_weight_list)
print(token_list)
print(weight_list)
Output (in Jupyter Notebook) :
[(((0.5991, 0.3836), 'neutral'), '욕심쟁이에게', [[('욕심쟁이', 'NNG', 'UN', (0, 4))], [('에게', 'JKB', 'UN', (4, 6))]]), (((0.9147, 0.0828), 'negative'), '스트레스 받으며', [[('스트레스', 'NNG', 'UN', (7, 11)), ('받', 'VV', 'UN', (12, 13))], [('으며', 'EC', 'UN', (13, 15))]]), (((0.9047, 0.0953), 'negative'), '살다가', [[('살', 'VV', 'UN', (16, 17))], [('다가', 'EC', 'UN', (17, 19))]]), (((0.8306, 0.1751), 'negative'), '떠나고', [[('떠나', 'VV', 'UN', (20, 22))], [('고', 'EC', 'UN', (22, 23))]]), (((0.453, 0.5296), 'neutral'), '나니', [[('나', 'VX', 'UN', (23, 24))], [('니', 'EC', 'UN', (24, 25))]]), (((0.1065, 0.8982), 'positive'), '너무 행복해요!', [[('너무', 'MAG', 'UN', (26, 28))], [('행복', 'NNG', 'UN', (29, 31))], [('해요', 'XSV+EF', 'UN', (31, 33)), ('!', 'SW', 'UN', (33, 34))]])]
[' 욕심', '쟁', '이', '에게', ' 스트레스', ' 받으며', ' 살', '다', '가', ' 떠나', '고', ' 나', '니', ' 너무', ' 행복', '해', '요', ' ', '!']
[0, 0, 0, 0, -0.2424436, -0.20117857, -0.16506892, -0.16892226, -0.27025366, -0.16876356, -0.33119142, 0, 0, 0.15942541, 0.13346915, 0.11855107, 0.15605149, 0, 0.11754697]
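One reading of `neutral_th` that is consistent with the example output above: each phrase carries a pair of strengths, apparently in the order (negative, positive), and the phrase is labeled neutral when the two differ by less than the threshold. The function below is a sketch of that reading, not teanaps's actual implementation; the name `label_phrase` and the strict `<` comparison are assumptions.

```python
def label_phrase(negative, positive, neutral_th=0.3):
    """Assumed rule: neutral if the strengths are closer than neutral_th, else the stronger side."""
    if abs(negative - positive) < neutral_th:
        return "neutral"
    return "negative" if negative > positive else "positive"


# (negative, positive) pairs and labels copied from the example output (neutral_th=0.5).
documented = [
    ((0.5991, 0.3836), "neutral"),
    ((0.9147, 0.0828), "negative"),
    ((0.9047, 0.0953), "negative"),
    ((0.8306, 0.1751), "negative"),
    ((0.453, 0.5296), "neutral"),
    ((0.1065, 0.8982), "positive"),
]

for (neg, pos), expected in documented:
    print((neg, pos), label_phrase(neg, pos, neutral_th=0.5), expected)
```

Under this rule, a larger `neutral_th` labels more phrases as neutral. With the default of 0.3, the first phrase (difference of about 0.22) would still come out neutral.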

teanaps.text_analysis.SentimentAnalysis.draw_sentiment_parse(token_list, weight_list)
Draws a sentence graph that highlights the sentiment analysis result for each phrase in the sentence.
Parameters
- token_list (list) : A list containing each morpheme in the sentence. See teanaps.text_analysis.SentimentAnalysis.get_sentiment_parse.
- weight_list (list) : A list containing the weight of each morpheme in the sentence. See teanaps.text_analysis.SentimentAnalysis.get_sentiment_parse.
Returns
- plotly graph (graph object) : Sentence graph.
Examples
Python Code (in Jupyter Notebook) :
#sentence = "욕심쟁이에게 스트레스 받으며 살다가 떠나고나니 너무 행복해요!"
#token_list = [' 욕심', '쟁', '이', '에게', ' 스트레스', ' 받으며', ' 살', '다', '가', ' 떠나', '고', ' 나', '니', ' 너무', ' 행복', '해', '요', ' ', '!']
#weight_list = [0, 0, 0, 0, -0.2424436, -0.20117857, -0.16506892, -0.16892226, -0.27025366, -0.16876356, -0.33119142, 0, 0, 0.15942541, 0.13346915, 0.11855107, 0.15605149, 0, 0.11754697]
senti.draw_sentiment_parse(token_list, weight_list)
Output (in Jupyter Notebook) : [plotly sentence graph with sentiment highlighting; image not included]
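The highlighted graph cannot be shown in plain text, but the same information can be inspected by grouping consecutive tokens whose weights share a sign. A minimal sketch, using the token and weight lists from the `get_sentiment_parse` example; `polarity` and `polarity_spans` are hypothetical helpers, not part of teanaps.

```python
from itertools import groupby

# Lists copied from the get_sentiment_parse example output.
token_list = [' 욕심', '쟁', '이', '에게', ' 스트레스', ' 받으며', ' 살', '다', '가',
              ' 떠나', '고', ' 나', '니', ' 너무', ' 행복', '해', '요', ' ', '!']
weight_list = [0, 0, 0, 0, -0.2424436, -0.20117857, -0.16506892, -0.16892226,
               -0.27025366, -0.16876356, -0.33119142, 0, 0, 0.15942541,
               0.13346915, 0.11855107, 0.15605149, 0, 0.11754697]


def polarity(weight):
    """Map a weight to a label by its sign."""
    if weight > 0:
        return "positive"
    if weight < 0:
        return "negative"
    return "neutral"


def polarity_spans(tokens, weights):
    """Merge runs of adjacent tokens with the same polarity into (label, text) spans."""
    spans = []
    for label, run in groupby(zip(tokens, weights), key=lambda pair: polarity(pair[1])):
        spans.append((label, "".join(tok for tok, _ in run)))
    return spans


for label, text in polarity_spans(token_list, weight_list):
    print(f"{label:>8}: {text!r}")
```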

3.6. teanaps.text_analysis.DocumentSummarizer
Python Code (in Jupyter Notebook) :
from teanaps.text_analysis import DocumentSummarizer
ds = DocumentSummarizer()

teanaps.text_analysis.DocumentSummarizer.set_document(document_path)
Loads the document to be summarized.
Parameters
- document_path (str) : Path to the text file (.txt) that contains the document to be summarized.
Returns
- None
Examples
Python Code (in Jupyter Notebook) :
document_path = "article.txt"
ds.set_document(document_path)

teanaps.text_analysis.DocumentSummarizer.summarize(type, max_sentence)
Summarizes the loaded document and returns the extracted sentences.
Parameters
- type (str) : Type of text summarization algorithm. One of {"textrank", "lsa"}.
- max_sentence (int) : Number of sentences to extract for the summary.
Returns
- sentence_list (list) : A list containing the sentences extracted as the summary.
Examples
Text File (in "article.txt") :
‘손세이셔널’ 손흥민(28, 토트넘 홋스퍼)이 팀 승리의 결승골을 넣었으나 높은 평점을 받지 못했다. 전체적으로 경기력이 좋지 않았다. 토트넘은 23일(이하 한국시각) 오전 영국 런던에 위치한 토트넘 홋스퍼 스타디움에서 열린 노리치시티와의 잉글리시 프리미어리그 24라운드 홈경기에서 2-1로 승리했다. 이날 토트넘은 전반 38분 델리 알리가 선제골을 넣은 뒤 후반 25분 테무 푸키에게 페널티킥 골을 허용해 동점을 내줬다. 이후 토트넘은 후반 34분 손흥민이 균형을 깨는 헤더골을 터뜨렸고, 결국 2-1 승리를 거뒀다. 프리미어리그 8위에서 6위로 올라섰다. 하지만 결승골의 주인공 손흥민은 높은 평점을 받지 못했다. 영국 축구 통계 전문사이트 후스코어드닷컴은 손흥민에게 비교적 낮은 평점인 6.8점을 부여했다. 손흥민은 알리의 선제골에 기점 역할을 했고, 결승골을 넣었으나 다른 장면에서는 이렇다 할 모습을 보이지 못했다. 토트넘에서는 오리에가 8점으로 가장 높았고, 로 셀소가 7.9점 그리고 알리가 7.6점으로 뒤를 이었다.
[동아닷컴, reporter 조성운, 2020.1.23., view article]
Python Code (in Jupyter Notebook) :
#document_path = "article.txt"
#ds.set_document(document_path)
result = ds.summarize("textrank", 3)
print(result)
Output (in Jupyter Notebook) :
['‘손세이셔널’ 손흥민(28, 토트넘 홋스퍼)이 팀 승리의 결승골을 넣었으나 높은 평점을 받지 못했다.',
 '토트넘은 23일(이하 한국시각) 오전 영국 런던에 위치한 토트넘 홋스퍼 스타디움에서 열린 노리치시티와의 잉글리시 프리미어리그 24라운드 홈경기에서 2-1로 승리했다.',
 '이날 토트넘은 전반 38분 델리 알리가 선제골을 넣은 뒤 후반 25분 테무 푸키에게 페널티킥 골을 허용해 동점을 내줬다.']
Python Code (in Jupyter Notebook) :
#document_path = "article.txt"
#ds.set_document(document_path)
result = ds.summarize("lsa", 3)
print(result)
Output (in Jupyter Notebook) :
['토트넘은 23일(이하 한국시각) 오전 영국 런던에 위치한 토트넘 홋스퍼 스타디움에서 열린 노리치시티와의 잉글리시 프리미어리그 24라운드 홈경기에서 2-1로 승리했다.',
 '이날 토트넘은 전반 38분 델리 알리가 선제골을 넣은 뒤 후반 25분 테무 푸키에게 페널티킥 골을 허용해 동점을 내줬다.',
 '손흥민은 알리의 선제골에 기점 역할을 했고, 결승골을 넣었으나 다른 장면에서는 이렇다 할 모습을 보이지 못했다.']
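For readers unfamiliar with the "textrank" option, the general idea can be sketched in plain Python: build a graph whose nodes are sentences, weight the edges by word overlap, run a PageRank-style iteration, and return the top-scoring sentences in their original order. This is a generic illustration under simplifying assumptions (whitespace tokenization, a naive sentence splitter, English sample text), not how teanaps implements `summarize`; all function names below are hypothetical.

```python
import re
from itertools import combinations
from math import log


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared words divided by the sum of log sentence lengths (TextRank-style)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    denom = log(len(words_a)) + log(len(words_b))
    return len(words_a & words_b) / denom if denom > 0 else 0.0


def textrank_summary(text, max_sentence=3, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= max_sentence:
        return sentences
    # Symmetric edge weights between every pair of sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=scores.__getitem__, reverse=True)[:max_sentence]
    return [sentences[i] for i in sorted(top)]  # keep original document order


text = ("The cat sat on the mat. The cat chased the mouse. "
        "The mouse hid under the mat. Stocks fell sharply today.")
print(textrank_summary(text, max_sentence=2))
```

For Korean text, whitespace tokens are a poor unit of comparison; a morphological analyzer would normally supply the tokens instead.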

3.7. teanaps.text_analysis.KeywordExtractor
Python Code (in Jupyter Notebook) :
from teanaps.text_analysis import KeywordExtractor
ke = KeywordExtractor(model_path="/model")
Notes :
- The model file must be downloaded separately, and its path must be passed in the model_path variable.
- A warning message may be printed once on first import. It can safely be ignored.

teanaps.text_analysis.KeywordExtractor.parse(sentence, max_keyword=5)
Identifies the key keywords in a sentence and returns their weights.
Parameters
- sentence (str) : A sentence in Korean or English. Up to 128 characters.
- max_keyword (int) : Maximum number of keywords to extract.
Returns
- result (list) : A list of tuples with the structure (keyword, weight, keyword position).
Examples
Python Code (in Jupyter Notebook) :
sentence = "유플러스는 통신3사(SKT, LGU+, KT) 중에 5G 요금제를 최초로 선보였습니다."
result = ke.parse(sentence)
print(result)
Output (in Jupyter Notebook) :
[('LGU+', 1.33617, (16, 20)), ('SKT', 0.81265, (11, 14)), ('KT', 0.79936, (22, 24)), ('5G', 0.74944, (29, 31)), ('유플러스', 0.37639, (0, 4))]
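In the example output, the keyword position looks like a (start, end) pair of character offsets into the input sentence, with an exclusive end. Assuming that reading, the positions can be used to mark keywords in place. The sketch below hard-codes the documented result instead of calling teanaps, and `mark_keywords` is a hypothetical helper.

```python
sentence = "유플러스는 통신3사(SKT, LGU+, KT) 중에 5G 요금제를 최초로 선보였습니다."
# Result copied from the example output above.
result = [('LGU+', 1.33617, (16, 20)), ('SKT', 0.81265, (11, 14)),
          ('KT', 0.79936, (22, 24)), ('5G', 0.74944, (29, 31)),
          ('유플러스', 0.37639, (0, 4))]


def mark_keywords(text, keywords):
    """Wrap each keyword span in brackets, working right to left so offsets stay valid."""
    for _, _, (start, end) in sorted(keywords, key=lambda kw: kw[2][0], reverse=True):
        text = text[:start] + "[" + text[start:end] + "]" + text[end:]
    return text


# Sanity check of the assumed offset convention.
for keyword, _, (start, end) in result:
    assert sentence[start:end] == keyword

print(mark_keywords(sentence, result))
```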