TEXT ANALYSIS - fingeredman/teanaps GitHub Wiki

TEANAPS API Documentation

Text Analysis

3. teanaps.text_analysis

3.1. teanaps.text_analysis.TfidfCalculator

Python Code (in Jupyter Notebook) :

from teanaps.text_analysis import TfidfCalculator

tfidf = TfidfCalculator()
  • teanaps.text_analysis.TfidfCalculator.calculation_tfidf(document_list)

    • Calculates the TF-IDF value of each word in the documents.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
    • Returns

      • None
    • Examples

      Python Code (in Jupyter Notebook) :

      #tokenized_sentence_list = ['๋น„ํŠธ์ฝ”์ธ ๊ฐ€๋Šฅ์„ฑ ๊ฒฐํ•จ ์ „์ œ ๊ทœ์ œ ์ •๋น„',
      #                           '์›Œ๋ Œ ๋ฒ„ํ• ๋น„ํŠธ์ฝ”์ธ ๋ง์ƒ ํ•˜๋ฒ„๋“œ๋Œ€ ๊ต์ˆ˜ ๋ง์ƒ',
      #                           '๊ฐ€์ƒ ํ™”ํ ์ธํ„ฐ๋„ท ์ˆ˜์ค€ ๋น„ํŠธ์ฝ”์ธ ์บ์‹œ ์„ ๋‘ ์ฃผ์ž ๊ฒƒ',
      #                           ...,
      #                           '์ž์‚ฐ ํˆฌ๋ถ€ ํ†ต์‚ฐ ๋Œ€์‹  ์ž์‚ฐ ๋ถ€๋™์‚ฐ ์‹ ํƒ ์—… ์˜ˆ๋น„ ์Šน์ธ',
      #                           'ํ•œ๊ตญ ๋ถ€์ž ๋ถ€๋™์‚ฐ ์นจ์ฒด ๋ถ€๋™์‚ฐ ์•ˆ',
      #                           '๊ธˆํˆฌํ˜‘ ๋ถ€๋™์‚ฐ ํˆฌ์ž์ž ์ธ๋ ฅ ๊ณผ์ • ๊ธฐ ๊ฐœ์„ค']
      #                          ]
      tfidf.calculation_tfidf(tokenized_sentence_list)
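
      Note : As a point of reference (not part of TEANAPS), the kind of TF-IDF values that calculation_tfidf() computes can be approximated with scikit-learn's TfidfVectorizer on the same documents. A minimal sketch, assuming scikit-learn is installed and that tokenized_sentence_list holds space-joined tokens as in the example above; the exact numbers may differ from the TEANAPS implementation:

      Python Code (in Jupyter Notebook) :

      from sklearn.feature_extraction.text import TfidfVectorizer

      # Treat each already-tokenized document as a bag of whitespace-separated tokens.
      vectorizer = TfidfVectorizer(tokenizer=str.split, lowercase=False)
      tfidf_matrix = vectorizer.fit_transform(tokenized_sentence_list)
      # Rows are documents, columns are vocabulary terms.
      print(tfidf_matrix.shape)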
      
  • teanaps.text_analysis.TfidfCalculator.get_tf_matrix()

    • Returns a DataFrame containing the TF (Term Frequency) value of each word per document.

    • Parameters

      • None
    • Returns

      • Pandas DataFrame (dataframe) : DataFrame containing the per-document TF value of each word.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_tf_matrix()
      print(type(result))
      

      Output (in Jupyter Notebook) :

      pandas.core.frame.DataFrame
      
  • teanaps.text_analysis.TfidfCalculator.get_tf_vector(sentence)

    • Converts a morpheme-tokenized sentence into a vector (list) of TF (Term Frequency) values and returns it.

    • Parameters

      • sentence (str) : A sentence in Korean or English. Up to 128 characters.
    • Returns

      • result (list) : Vector (list) of TF (Term Frequency) values.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      tokenized_sentence = "๋น„ํŠธ์ฝ”์ธ ๊ฐ€๋Šฅ์„ฑ ๊ฒฐํ•จ ์ „์ œ ๊ทœ์ œ ์ •๋น„"
      result = tfidf.get_tf_vector(tokenized_sentence)
      print(result)
      

      Output (in Jupyter Notebook) :

      [0, 0, 1, 0, 0, 0, 0, 0, 0, ...]
      
  • teanaps.text_analysis.TfidfCalculator.get_tfidf_matrix()

    • Returns a DataFrame containing the TF-IDF value of each word per document.

    • Parameters

      • None
    • Returns

      • Pandas DataFrame (dataframe) : DataFrame containing the per-document TF-IDF value of each word.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_tfidf_matrix()
      print(type(result))
      

      Output (in Jupyter Notebook) :

      pandas.core.frame.DataFrame
      
  • teanaps.text_analysis.TfidfCalculator.get_tfidf_vector(sentence)

    • Converts a morpheme-tokenized sentence into a vector (list) of TF-IDF values and returns it.

    • Parameters

      • sentence (str) : A sentence in Korean or English. Up to 128 characters.
    • Returns

      • result (list) : Vector (list) of TF-IDF values.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      tokenized_sentence = "๋น„ํŠธ์ฝ”์ธ ๊ฐ€๋Šฅ์„ฑ ๊ฒฐํ•จ ์ „์ œ ๊ทœ์ œ ์ •๋น„"
      result = tfidf.get_tfidf_vector(tokenized_sentence)
      print(result)
      

      Output (in Jupyter Notebook) :

      [0., 0., 0.45665731, 0., 0., ...]
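
      Note : A minimal sketch of inspecting the non-zero weights in the returned vector, under the assumption (not stated above) that the vector is indexed in the same order as the vocabulary returned by get_word_list():

      Python Code (in Jupyter Notebook) :

      # Assumes the vector shares the ordering of get_word_list().
      word_list = tfidf.get_word_list()
      tfidf_vector = tfidf.get_tfidf_vector(tokenized_sentence)
      # Pair each vocabulary word with its TF-IDF weight and keep the non-zero entries.
      weighted = [(word, weight) for word, weight in zip(word_list, tfidf_vector) if weight > 0]
      print(sorted(weighted, key=lambda pair: pair[1], reverse=True)[:5])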
      
  • teanaps.text_analysis.TfidfCalculator.get_tf_dict()

    • Returns a dictionary containing the TF value of each word across the entire document set.

    • Parameters

      • None
    • Returns

      • result (dict) : Dictionary of per-word TF values.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_tf_dict()
      print(result)
      

      Output (in Jupyter Notebook) :

      {'๊ฐ€๊ฒฉ': 3,
       '๊ฐ€๋Šฅ': 1,
       '๊ฐ€๋Šฅ์„ฑ': 1,
       ...,
       'ํšจ๊ณผ': 2,
       'ํ๋ฆ„': 1,
       'ํก์ˆ˜': 1
      }
      
  • teanaps.text_analysis.TfidfCalculator.get_tf_list()

    • Returns a list containing the TF value of each word across the entire document set.

    • Parameters

      • None
    • Returns

      • result (list) : List of per-word TF values.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_tf_list()
      print(result)
      

      Output (in Jupyter Notebook) :

      [['๊ธˆ๋ฆฌ', 40],
       ['๋ถ€๋™์‚ฐ', 34],
       ['๊ธˆ์œต', 34],
       ...,
       ['์ถ”์ƒํ™”', 1],
       ['์‹œ์Šคํ…œ', 1],
       ['ํก์ˆ˜', 1]
      ]
      
  • teanaps.text_analysis.TfidfCalculator.get_tfidf_dict()

    • Returns a dictionary containing the TF-IDF value of each word across the entire document set.

    • Parameters

      • None
    • Returns

      • result (dict) : Dictionary of per-word TF-IDF values.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_tfidf_dict()
      print(result)
      

      Output (in Jupyter Notebook) :

      {'๊ฐ€๊ฒฉ': 1.1424359882788366,
       '๊ฐ€๋Šฅ': 0.509179564909753,
       '๊ฐ€๋Šฅ์„ฑ': 0.45665731260262726,
       ...,
       'ํšจ๊ณผ': 1.0165526804723384,
       'ํ๋ฆ„': 0.473637588408657,
       'ํก์ˆ˜': 0.5177879851919405
      }
      
  • teanaps.text_analysis.TfidfCalculator.get_tfidf_list()

    • Returns a list containing the TF-IDF value of each word across the entire document set.

    • Parameters

      • None
    • Returns

      • result (list) : List of per-word TF-IDF values.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_tfidf_list()
      print(result)
      

      Output (in Jupyter Notebook) :

      [['๊ธˆ๋ฆฌ', 9.231975802297294],
       ['๊ธˆ์œต', 7.963616858955622],
       ['๋ถ€๋™์‚ฐ', 7.727053435662074],
       ...,
       ['๋ฐ์ดํ„ฐ', 0.3291698807669874],
       ['๊ฑฐ๋ž˜์†Œ', 0.3291698807669874],
       ['ํˆฌ๋ช…', 0.3291698807669874]
      ]
      
  • teanaps.text_analysis.TfidfCalculator.get_word_list()

    • Returns the list of words contained in the entire document set.

    • Parameters

      • None
    • Returns

      • result (list) : List of all words contained in the document set.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_word_list()
      print(result)
      

      Output (in Jupyter Notebook) :

      ['๊ธˆ๋ฆฌ',
       '๊ธˆ์œต',
       '๋ถ€๋™์‚ฐ',
       ...,
       '๋ฐ์ดํ„ฐ',
       '๊ฑฐ๋ž˜์†Œ',
       'ํˆฌ๋ช…'
      ]
      
  • teanaps.text_analysis.TfidfCalculator.draw_tfidf(max_words=100)

    • Returns a graph of the TF and TF-IDF values of words across the entire document set.

    • Parameters

      • max_words (int) : Number of words to plot (top words by TF-IDF).
    • Returns

      • plotly graph (graph object) : TF and TF-IDF graph.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      tfidf.draw_tfidf(100)
      

      Output (in Jupyter Notebook) : tfidf_histogram

  • teanaps.text_analysis.TfidfCalculator.get_wordcloud(weight_dict, mask_path=None)

    • Returns a word cloud representing the TF or TF-IDF values of words.

    • Parameters

      • weight_dict (dict) : Dictionary of TF or TF-IDF values. See teanaps.text_analysis.TfidfCalculator.get_tf_dict() and teanaps.text_analysis.TfidfCalculator.get_tfidf_dict().
      • mask_path (str) : Path to a background image used to change the shape and color of the word cloud. PNG (*.png) and JPEG (*.jpeg) images are supported.
    • Returns

      • figure (matplotlib.pyplot.plt) : Word cloud figure.
    • Examples

      Python Code (in Jupyter Notebook) :

      #tfidf.calculation_tfidf(tokenized_sentence_list)
      result = tfidf.get_tf_dict()
      #result = tfidf.get_tfidf_dict()
      tfidf.get_wordcloud(result)
      

      Output (in Jupyter Notebook) : tfidf_wordcloud
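
      Note : Outside of TEANAPS, the same weight dictionary can also be fed to the standalone wordcloud package. A minimal sketch, assuming the wordcloud package is installed and a Korean-capable font file is available (the font path below is hypothetical):

      Python Code (in Jupyter Notebook) :

      import matplotlib.pyplot as plt
      from wordcloud import WordCloud

      weight_dict = tfidf.get_tfidf_dict()
      # font_path must point to a font that can render Korean glyphs.
      wc = WordCloud(font_path="NanumGothic.ttf", background_color="white")
      wc.generate_from_frequencies(weight_dict)
      plt.imshow(wc, interpolation="bilinear")
      plt.axis("off")
      plt.show()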

3.2. teanaps.text_analysis.DocumentClustering

Python Code (in Jupyter Notebook) :

from teanaps.text_analysis import DocumentClustering

dc = DocumentClustering()
  • teanaps.text_analysis.DocumentClustering.clustering(alg, document_list, num_cluters=3, max_iterations=300, eps=0.5, min_samples=5)

    • Clusters the documents and returns the result.

    • Parameters

      • alg (str) : Clustering algorithm. One of {"kmeans", "dbscan", "hdbscan"}.
      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • num_cluters (int) : Number of clusters to create.
      • max_iterations (int) : Maximum number of clustering iterations.
      • eps (float) : DBSCAN algorithm hyperparameter (maximum neighborhood distance).
      • min_samples (int) : Minimum number of samples required to form a cluster.
    • Returns

      • result (dict) : Dictionary containing the cluster inertia value and the cluster label of each document.
    • Examples

      Python Code (in Jupyter Notebook) :

      result = dc.clustering("kmeans", tokenized_sentence_list, num_cluters=3, max_iterations=300)
      print(result)
      

      Output (in Jupyter Notebook) :

      {'inertia': 64.11752014008104,
       'predict_list': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)}
      

      Python Code (in Jupyter Notebook) :

      result = dc.clustering("dbscan", tokenized_sentence_list, eps=0.5, min_samples=5)
      print(result)
      

      Output (in Jupyter Notebook) :

      {'inertia': 64.11752014008104,
       'predict_list': array([-1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -1, -1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)}
      

      Python Code (in Jupyter Notebook) :

      result = dc.clustering("hdbscan", tokenized_sentence_list, min_samples=5)
      print(result)
      

      Output (in Jupyter Notebook) :

      {'inertia': 64.11752014008104,
       'predict_list': array([-1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -1, -1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)}
      
  • teanaps.text_analysis.DocumentClustering.kmeans_inertia_transition(document_list, max_cluters, max_iterations)

    • Runs K-means clustering while varying the number of clusters and returns the inertia value for each cluster count.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • max_cluters (int) : Maximum number of clusters for which the inertia value is computed.
      • max_iterations (int) : Maximum number of clustering iterations.
    • Returns

      • result (list) : List of inertia values, one per number of clusters.
    • Examples

      Python Code (in Jupyter Notebook) :

      result = dc.kmeans_inertia_transition(tokenized_sentence_list, 10, 300)
      print(result)
      

      Output (in Jupyter Notebook) :

      [85.29314909321171, 73.22892942068657, 64.11752014008104, 60.672117244161946, 57.24561408281322, 55.125181445741525, 53.74440369290694, 52.262356865901175, 50.26148838373041, 48.480517037436805]
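
      Note : The returned list can be scanned for the "elbow", the point where adding clusters stops reducing inertia much. A minimal pure-Python sketch; the 10% threshold is an arbitrary choice for illustration:

      Python Code (in Jupyter Notebook) :

      inertia_list = dc.kmeans_inertia_transition(tokenized_sentence_list, 10, 300)
      # Relative improvement obtained by each additional cluster (index 0 corresponds to 1 cluster).
      drops = [(i + 2, (prev - cur) / prev) for i, (prev, cur) in enumerate(zip(inertia_list, inertia_list[1:]))]
      # First cluster count whose improvement falls below 10% is a rough elbow candidate.
      elbow = next((k for k, drop in drops if drop < 0.1), len(inertia_list))
      print(drops)
      print("elbow candidate:", elbow)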
      
  • teanaps.text_analysis.DocumentClustering.get_kmeans_inertia_transition_graph(inertia_list)

    • Returns a graph of the inertia values.

    • Parameters

      • inertia_list (list) : List of inertia values.
    • Returns

      • plotly graph (graph object) : Line graph of inertia per number of clusters.
    • Examples

      Python Code (in Jupyter Notebook) :

      inertia_list = dc.kmeans_inertia_transition(tokenized_sentence_list, 10, 300)
      dc.get_kmeans_inertia_transition_graph(inertia_list)
      

      Output (in Jupyter Notebook) : clustering_inertia_line_graph

  • teanaps.text_analysis.DocumentClustering.get_tfidf_tsne(document_list, predict_list, df_article)

    • Returns a DataFrame containing each document's label, its cluster, and 2-D coordinates obtained by TF-IDF embedding followed by dimensionality reduction.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • predict_list (list) : List of cluster labels from the clustering result.
      • df_article (DataFrame) : DataFrame containing the documents and their labels.
    • Returns

      • Pandas DataFrame (dataframe) : DataFrame containing each document's label, its cluster, and the reduced 2-D TF-IDF coordinates.
    • Examples

      Python Code (in Jupyter Notebook) :

      import pandas as pd
      
      clustering_result = dc.clustering("kmeans", tokenized_sentence_list, num_cluters=3, max_iterations=300)
      predict_list  = clustering_result["predict_list"]
      df_article = pd.DataFrame(tokenized_sentence_list, columns = ["label", "source", "datetime", "title", "content"])
      df_result = dc.get_tfidf_tsne(tokenized_sentence_list, predict_list, df_article)
      print(type(df_result))
      

      Output (in Jupyter Notebook) :

      pandas.core.frame.DataFrame
      
  • teanaps.text_analysis.DocumentClustering.get_cluster_graph(df_tfidf_tsne, label_type)

    • Returns a graph that visualizes the clustering result in two dimensions.

    • Parameters

      • df_tfidf_tsne (DataFrame) : DataFrame containing each document's label, its cluster, and the reduced 2-D TF-IDF coordinates.
      • label_type (str) : Label type to display on the graph. One of {"predict", "label"}.
    • Returns

      • plotly graph (graph object) : Clustering result graph.
    • Examples

      Python Code (in Jupyter Notebook) :

      dc. get_cluster_graph(df_result, "predict")
      

      Output (in Jupyter Notebook) : clustering_predict_scatter

      Python Code (in Jupyter Notebook) :

      dc. get_cluster_graph(df_result, "label")
      

      Output (in Jupyter Notebook) : clustering_label_scatter

  • teanaps.text_analysis.DocumentClustering.get_silhouette_score2(document_list, df_result)

    • Computes the silhouette score of the clustering result and returns it.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • df_result (DataFrame) : DataFrame containing each document's label, its cluster, and the reduced 2-D TF-IDF coordinates.
    • Returns

      • result (float) : Silhouette score of the clustering result.
    • Examples

      Python Code (in Jupyter Notebook) :

      result = dc.get_silhouette_score2(tokenized_sentence_list, df_result)
      print(result)
      

      Output (in Jupyter Notebook) :

      0.1772473694643886
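
      Note : For reference, a silhouette score can also be computed directly with scikit-learn on a TF-IDF representation of the same documents. This is a sketch under the assumption that scikit-learn is installed; it is not necessarily identical to the value produced by get_silhouette_score2():

      Python Code (in Jupyter Notebook) :

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics import silhouette_score

      X = TfidfVectorizer(tokenizer=str.split, lowercase=False).fit_transform(tokenized_sentence_list)
      labels = dc.clustering("kmeans", tokenized_sentence_list, num_cluters=3)["predict_list"]
      # Silhouette ranges from -1 (poor separation) to 1 (dense, well-separated clusters).
      print(silhouette_score(X, labels))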
      
  • teanaps.text_analysis.DocumentClustering.get_silhouette_graph2(document_list, df_result)

    • Returns a silhouette graph of the clustering result.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • df_result (DataFrame) : DataFrame containing each document's label, its cluster, and the reduced 2-D TF-IDF coordinates.
    • Returns

      • plotly graph (graph object) : Clustering result and silhouette graph.
    • Examples

      Python Code (in Jupyter Notebook) :

      dc.get_silhouette_graph2(tokenized_sentence_list, df_result)
      

      Output (in Jupyter Notebook) : clustering_silhouette_graph

  • teanaps.text_analysis.DocumentClustering.get_pair_wize_matrix(document_list)

    • Returns a graph that represents the pairwise similarity between documents as a matrix.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
    • Returns

      • plotly graph (graph object) : Matrix graph of pairwise document similarities.
    • Examples

      Python Code (in Jupyter Notebook) :

      dc.get_pair_wize_matrix(tokenized_sentence_list)
      

      Output (in Jupyter Notebook) : clustering_pair_wize_matrix

3.3. teanaps.text_analysis.TopicClustering

Python Code (in Jupyter Notebook) :

from teanaps.text_analysis import TopicClustering

tc = TopicClustering()
  • teanaps.text_analysis.TopicClustering.topic_modeling(modeling_type, document_list, topic_count, keyword_count)

    • Clusters the keywords of the documents into N topics and returns the result.

    • Parameters

      • modeling_type (str) : Topic clustering algorithm. One of {"lsa", "lda", "hdp"}.
      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • topic_count (int) : Number of topic clusters.
      • keyword_count (int) : Number of keywords per topic cluster.
    • Returns

      • result (list) : List of keywords (with weights) for each topic cluster.
    • Examples

      Python Code (in Jupyter Notebook) :

      result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
      print(result)
      

      Output (in Jupyter Notebook) :

      [(0,
       [('๊ธˆ๋ฆฌ', 0.6542361201136048),
        ('๋Œ€์ถœ', 0.4330323607960353),
        ('๊ธˆ์œต', 0.3083228589169829),
        ('์€ํ–‰', 0.22088983702295698),
        ('์ฝ”ํ”ฝ์Šค', 0.173373240489713)]),
      (1,
       [('๋น„ํŠธ์ฝ”์ธ', 0.6987330564487386),
        ('๊ธˆ๋ฆฌ', -0.25924223777122957),
        ('ํ™”ํ', 0.218391247175097),
        ('๊ธˆ์œต', 0.20393479642923928),
        ('์•”ํ˜ธ', 0.18284477353567058)]),
      (2,
       [('๋ถ€๋™์‚ฐ', -0.6584326085475736),
        ('๊ธˆ์œต', -0.40842310832729234),
        ('๋น„ํŠธ์ฝ”์ธ', 0.36212229767170806),
        ('๊ธˆ๋ฆฌ', 0.19995317435138174),
        ('์‹ ํƒ', -0.18356626669622753)])
      ]
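
      Note : The return value is a list of (topic id, [(keyword, weight), ...]) pairs, so it can be reshaped without any extra libraries. A minimal sketch that prints one line per topic:

      Python Code (in Jupyter Notebook) :

      result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
      for topic_id, keyword_list in result:
          # Join the keywords of each topic, ignoring the weights, for a quick overview.
          print(topic_id, ", ".join(keyword for keyword, weight in keyword_list))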
      
  • teanaps.text_analysis.TopicClustering.get_model_validation_result()

    • Computes the Perplexity and Coherence of the topic clustering result and returns them.

    • Parameters

      • None
    • Returns

      • result (tuple) : Tuple containing the Perplexity and Coherence values of the topic clusters.
    • Examples

      Python Code (in Jupyter Notebook) :

      #result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
      perplexity, coherence = tc.get_model_validation_result()
      print(perplexity)
      print(coherence)
      

      Output (in Jupyter Notebook) :

      -6.633342221630287
      0.5127122691849578
      
  • teanaps.text_analysis.TopicClustering.get_model()

    • Returns the model produced by topic clustering.

    • Parameters

      • None
    • Returns

      • result (model) : Topic clustering model.
    • Examples

      Python Code (in Jupyter Notebook) :

      #result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
      model = tc.get_model()
      
  • teanaps.text_analysis.TopicClustering.display_model_result(model)

    • Visualizes the topic clustering result.

    • Parameters

      • model (model) : Topic clustering model. See teanaps.text_analysis.TopicClustering.get_model().
    • Returns

      • result (IPython.core.display.HTML) : Topic clustering visualization.
    • Examples

      Python Code (in Jupyter Notebook) :

      #result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
      model = tc.get_model()
      tc.display_model_result(model)
      

      Output (in Jupyter Notebook) : topic_clustering_lda_vis

  • teanaps.text_analysis.TopicClustering.get_topics_sentences(document_list)

    • Finds the documents that belong to each topic in the topic clustering result and returns them.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
    • Returns

      • Pandas DataFrame (dataframe) : DataFrame containing the documents for each topic.
    • Examples

      Python Code (in Jupyter Notebook) :

      #result = tc.topic_modeling("lda", tokenized_sentence_list, 3, 5)
      result = tc.get_topics_sentences(tokenized_sentence_list)
      print(type(result))
      

      Output (in Jupyter Notebook) :

      pandas.core.frame.DataFrame
      
  • teanaps.text_analysis.TopicClustering.get_model_validation_graph(modeling_type, document_list, max_topic_count)

    • Computes the Perplexity and Coherence of the topic clustering result for each number of topics and draws them as a line graph.

    • Parameters

      • modeling_type (str) : Topic clustering algorithm. One of {"lsa", "lda", "hdp"}.
      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • max_topic_count (int) : Maximum number of topic clusters.
    • Returns

      • plotly graph (graph object) : Line graph of the Perplexity and Coherence values per number of topics.
    • Examples

      Python Code (in Jupyter Notebook) :

      tc.get_model_validation_graph("lda", tokenized_sentence_list, 10)
      

      Output (in Jupyter Notebook) : topic_clustering_find_topic_count

  • teanaps.text_analysis.TopicClustering.sequence_lda_topic_modeling(document_list, time_slice, topic_count)

    • Computes how each of N topics in the documents changes over time and returns the result.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • time_slice (list) : Number of documents in each time slice, used to split the document set into periods.
      • topic_count (int) : Number of topic clusters.
    • Returns

      • result (list) : List of keywords (with weights) for each period and topic cluster.
    • Examples

      Python Code (in Jupyter Notebook) :

      result = tc.sequence_lda_topic_modeling(tokenized_sentence_list, [100, 100, ..., 100], 5)
      print(result)
      

      Output (in Jupyter Notebook) :

      [(0,
       [[('์ฃผ์‹', 0.021218330732594246),
         ('์ข…๋ชฉ', 0.018796542321031225),
         ('์‹œ์žฅ', 0.01679681367262934),
         ...],
        [('์ฃผ์‹', 0.02193776754354376),
         ('์ข…๋ชฉ', 0.01936867384889522),
         ('์‹œ์žฅ', 0.016617304727897478),
         ...,
        ],
        ...,
       ],
       ...,
      ]
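
      Note : time_slice is a list of document counts per period, so the documents must be sorted by time before calling the method. A minimal sketch of building it from a hypothetical date_list aligned with tokenized_sentence_list (date_list is an assumption, not part of the example data above):

      Python Code (in Jupyter Notebook) :

      from collections import Counter

      # date_list is assumed to hold one "YYYY-MM" string per document, in the same sorted order.
      period_counts = Counter(date_list)
      time_slice = [period_counts[period] for period in sorted(period_counts)]
      print(time_slice)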
      
  • teanaps.text_analysis.TopicClustering.get_sequence_topic_graph()

    • Draws a graph of how each of N topics in the documents changes over time.

    • Parameters

      • None
    • Returns

      • plotly graph (graph object) : Line graph of topic trends over time.
    • Examples

      Python Code (in Jupyter Notebook) :

      tc.get_sequence_topic_graph()
      

      Output (in Jupyter Notebook) : topic_clustering_topic_trend

3.4. teanaps.text_analysis.CoWordCalculator

Python Code (in Jupyter Notebook) :

from teanaps.text_analysis import CoWordCalculator

co = CoWordCalculator()
  • teanaps.text_analysis.CoWordCalculator.calculation_co_matrix(document_list, node_list=[])

    • Computes the co-occurrence frequencies of the words in the documents.

    • Parameters

      • document_list (list) : A list of documents, each represented as morpheme-level (tokenized) words.
      • node_list (list) : List of words for which co-occurrence frequencies are computed.
    • Returns

      • None
    • Examples

      Python Code (in Jupyter Notebook) :

      node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
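
      Note : Conceptually, the co-occurrence matrix counts how often two words from node_list appear in the same document. A minimal pure-Python sketch of that counting idea (not the TEANAPS implementation, whose counting scheme may differ):

      Python Code (in Jupyter Notebook) :

      from collections import Counter
      from itertools import combinations

      co_counter = Counter()
      for document in tokenized_sentence_list:
          tokens = set(document.split()) & set(node_list)
          # Count each unordered pair of node words once per document.
          for pair in combinations(sorted(tokens), 2):
              co_counter[pair] += 1
      print(co_counter.most_common(5))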
      
  • teanaps.text_analysis.CoWordCalculator.get_edge_list()

    • Returns the co-occurring word pairs in the documents together with their co-occurrence frequencies.

    • Parameters

      • None
    • Returns

      • result (list) : List of tuples of the form ((word, word), co-occurrence frequency).
    • Examples

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      result = co.get_edge_list()
      print(result)
      

      Output (in Jupyter Notebook) :

      [(('๊ธˆ๋ฆฌ', '๊ธˆ๋ฆฌ'), 905),
       (('๊ธˆ์œต', '๊ธˆ์œต'), 791),
       (('๋Œ€์ถœ', '๋Œ€์ถœ'), 580),
       (('๋น„ํŠธ์ฝ”์ธ', '๋น„ํŠธ์ฝ”์ธ'), 565),
       (('๋ถ€๋™์‚ฐ', '๋ถ€๋™์‚ฐ'), 555),
       ...,
       (('๋Œ€์ถœ', '์‹ ํƒ'), 1),
       (('๊ธˆ๋ฆฌ', '์ž์‚ฐ'), 1),
       (('์ž์‚ฐ', '๊ธˆ๋ฆฌ'), 1),
       (('์‹ ํƒ', 'ํˆฌ์ž'), 1),
       (('ํˆฌ์ž', '์‹ ํƒ'), 1)
      ]
      
  • teanaps.text_analysis.CoWordCalculator.get_node_list()

    • Returns all words included in the co-occurrence computation.

    • Parameters

      • None
    • Returns

      • result (list) : List of all words included in the co-occurrence computation.
    • Examples

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      result = co.get_node_list()
      print(result)
      

      Output (in Jupyter Notebook) :

      ['๊ธˆ๋ฆฌ', '๊ธˆ์œต', '๋Œ€์ถœ', '๋น„ํŠธ์ฝ”์ธ', '๋ถ€๋™์‚ฐ', '์€ํ–‰', '์ฝ”ํ”ฝ์Šค', '์ž์‚ฐ', '์‹œ์žฅ', '์‹ ํƒ', '๊ทธ๋ฆผ์ž', 'ํˆฌ์ž', '๊ฑฐ๋ž˜', '์ •๋ถ€', '์ƒํ’ˆ', '์‹ ์šฉ', '๋ฆฌ์Šคํฌ']
      
  • teanaps.text_analysis.CoWordCalculator.get_co_word(word)

    • Returns the co-occurrence frequencies between a given word and every other word.

    • Parameters

      • word (str) : Reference word for the co-occurrence computation.
    • Returns

      • result (list) : List of tuples of the form (word, co-occurrence frequency).
    • Examples

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      result = co.get_co_word("๊ธˆ๋ฆฌ")
      print(result)
      

      Output (in Jupyter Notebook) :

      [('๋Œ€์ถœ', 341),
       ('์ฝ”ํ”ฝ์Šค', 105),
       ('์€ํ–‰', 82),
       ...,
       ('์ •๋ถ€', 2),
       ('๋น„ํŠธ์ฝ”์ธ', 1),
       ('์ž์‚ฐ', 1)
      ]
      
  • teanaps.text_analysis.CoWordCalculator.get_centrality(centrality_type)

    • Computes network centrality from the word co-occurrence information and returns the result.

    • Parameters

      • centrality_type (str) : Network centrality type. One of {"d_cent", "b_cent", "c_cent"} (degree, betweenness, and closeness centrality).
    • Returns

      • result (dict) : Dictionary mapping each word to its centrality value.
    • Examples

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      result = co.get_centrality("d_cent")
      print(result)
      

      Output (in Jupyter Notebook) :

      {'๊ฑฐ๋ž˜': 0.625,
       '๊ทธ๋ฆผ์ž': 0.5625,
       '๊ธˆ๋ฆฌ': 0.9375,
       ...,
       '์ •๋ถ€': 0.75,
       '์ฝ”ํ”ฝ์Šค': 0.5625,
       'ํˆฌ์ž': 0.625
      }
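
      Note : For comparison, "d_cent"-style degree centrality can be reproduced from the edge list with networkx. A sketch assuming networkx is installed; the exact figures may differ from get_centrality depending on how self-loops and edge weights are handled:

      Python Code (in Jupyter Notebook) :

      import networkx as nx

      G = nx.Graph()
      for (word_a, word_b), count in co.get_edge_list():
          # Skip the diagonal entries, which are single-word frequencies rather than pairs.
          if word_a != word_b:
              G.add_edge(word_a, word_b, weight=count)
      print(nx.degree_centrality(G))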
      
  • teanaps.text_analysis.CoWordCalculator.get_co_matrix_graph(max_count)

    • Draws the top N words by co-occurrence frequency as a matrix graph.

    • Parameters

      • max_count (int) : Number of words to display in the matrix graph.
    • Returns

      • plotly graph (graph object) : Matrix graph of co-occurrence frequencies.
    • Examples

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      max_count = 20
      co.get_co_matrix_graph(max_count)
      

      Output (in Jupyter Notebook) : coword_matrix

  • teanaps.text_analysis.CoWordCalculator.get_word_network_graph(centrality_dict, mode="markers", centrality_th=0.5, ego_node_list=[], node_size_rate=10, edge_width_rate=10, text_size_rate=10)

    • Draws the word co-occurrence relationships as a network graph.

    • Parameters

      • centrality_dict (dict) : Dictionary mapping each word to its centrality value. See teanaps.text_analysis.CoWordCalculator.get_centrality.
      • mode (str) : Node display mode. One of {"markers", "text", "markers+text"}.
      • centrality_th (float) : Centrality threshold for filtering nodes; only nodes whose centrality is greater than or equal to this value are displayed.
      • ego_node_list (list) : List of center nodes for building an ego network; only nodes directly connected to these nodes are displayed.
      • node_size_rate (int) : Node size weight; higher values draw larger nodes.
      • edge_width_rate (int) : Edge width weight; higher values draw thinner edges.
      • text_size_rate (int) : Text label size weight; higher values draw smaller text labels.
    • Returns

      • plotly graph (graph object) : Network graph of word co-occurrences.
    • Examples

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      #centrality_dict = co.get_centrality("d_cent")
      co.get_word_network_graph(centrality_dict,  mode="markers")
      

      Output (in Jupyter Notebook) : word_network_markers

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      #centrality_dict = co.get_centrality("d_cent")
      co.get_word_network_graph(centrality_dict,  mode="text")
      

      Output (in Jupyter Notebook) : word_network_text

      Python Code (in Jupyter Notebook) :

      #node_list = ["๊ธˆ๋ฆฌ", "๊ธˆ์œต", "๋Œ€์ถœ", "๋น„ํŠธ์ฝ”์ธ", "๋ถ€๋™์‚ฐ", "์€ํ–‰", "์ฝ”ํ”ฝ์Šค", "์ž์‚ฐ", "์‹œ์žฅ", "์‹ ํƒ", "๊ทธ๋ฆผ์ž", "ํˆฌ์ž", "๊ฑฐ๋ž˜", "์ •๋ถ€", "์ƒํ’ˆ", "์‹ ์šฉ", "๋ฆฌ์Šคํฌ"]
      #co.calculation_co_matrix(tokenized_sentence_list, node_list=node_list)
      #centrality_dict = co.get_centrality("d_cent")
      co.get_word_network_graph(centrality_dict,  mode="markers+text")
      

      Output (in Jupyter Notebook) : word_network_markers_text

3.5. teanaps.text_analysis.SentimentAnalysis

Python Code (in Jupyter Notebook) :

from teanaps.text_analysis import SentimentAnalysis

senti = SentimentAnalysis(model_path="/model", kobert_path="/kobert")

Notes :

  • ๋ชจ๋ธ๊ณผ KoBERT ํŒŒ์ผ์„ ๋ณ„๋„๋กœ ๋‹ค์šด๋กœ๋“œ(๋ชจ๋ธ/KoBERT)ํ•˜์—ฌ ํŒŒ์ผ ๊ฒฝ๋กœ๋ฅผ ๊ฐ๊ฐ model_path, kobert_path ๋ณ€์ˆ˜์— ํฌํ•จํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค.
  • teanaps.text_analysis.SentimentAnalysis.tag(sentence, neutral_th=0.5)

    • Classifies the sentiment of a sentence as positive or negative and returns the result.

    • Parameters

      • sentence (str) : A sentence in Korean or English. Up to 128 characters.
      • neutral_th (float) : Threshold on the difference between the positive and negative strengths below which the sentence is judged neutral. Range 0~1.
    • Returns

      • result (tuple) : Tuple of the form ((negative strength, positive strength), sentiment label). The strengths range from 0 to 1, and the label is one of {"positive", "negative"}.
    • Examples

      Python Code (in Jupyter Notebook) :

      sentence = "๋Š˜ ๋ฐฐ์šฐ๊ณ  ๋ฐฐํ‘ธ๋Š” ์ž์„ธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค."
      result = senti.tag(sentence, neutral_th=0.3)
      print(result)
      

      Output (in Jupyter Notebook) :

      ((0.0595, 0.9543), 'positive')
      

      Python Code (in Jupyter Notebook) :

      sentence = "๊ณผํ•œ ์š•์‹ฌ์€ ์ฃผ๋ณ€ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ํ”ผํ•ด๋ฅผ ์ค๋‹ˆ๋‹ค."
      result = senti.tag(sentence, neutral_th=0.3)
      print(result)
      

      Output (in Jupyter Notebook) :

      ((0.8715, 0.1076), 'negative')
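
      Note : A small sketch of how the neutral_th threshold can be interpreted when post-processing the returned scores; the labelling rule below is an assumption for illustration, not a description of the internal model:

      Python Code (in Jupyter Notebook) :

      def label_from_scores(negative, positive, neutral_th=0.5):
          # If the two strengths are closer than the threshold, treat the sentence as neutral.
          if abs(positive - negative) < neutral_th:
              return "neutral"
          return "positive" if positive > negative else "negative"

      (negative, positive), label = senti.tag("๋Š˜ ๋ฐฐ์šฐ๊ณ  ๋ฐฐํ‘ธ๋Š” ์ž์„ธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.", neutral_th=0.3)
      print(label, label_from_scores(negative, positive, neutral_th=0.3))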
      
  • teanaps.text_analysis.SentimentAnalysis.get_weight(sentence)

    • Returns the weight of each morpheme referenced in the sentiment classification.

    • Parameters

      • sentence (str) : A sentence in Korean or English. Up to 128 characters.
    • Returns

      • token_list (list) : List of the morphemes in the sentence.
      • weight_list (list) : List of the weight of each morpheme in the sentence.
    • Examples

      Python Code (in Jupyter Notebook) :

      sentence = "๋Š˜ ๋ฐฐ์šฐ๊ณ  ๋ฐฐํ‘ธ๋Š” ์ž์„ธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค."
      token_list, weight_list = senti.get_weight(sentence)
      print(token_list)
      print(weight_list)
      

      Output (in Jupyter Notebook) :

      [' ๋Š˜', ' ๋ฐฐ์šฐ', '๊ณ ', ' ๋ฐฐ', 'ํ‘ธ', '๋Š”', ' ์ž์„ธ', '๊ฐ€', ' ํ•„์š”', 'ํ•ฉ๋‹ˆ๋‹ค', ' ', '.']
      [0.072522074, 0.08697342, 0.052703843, 0.051040735, 0.0606895, 0.05134341, 0.05213573, 0.08644837, 0.078125894, 0.079360135, 0, 0.079488374]
      

      Python Code (in Jupyter Notebook) :

      sentence = "๊ณผํ•œ ์š•์‹ฌ์€ ์ฃผ๋ณ€ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ํ”ผํ•ด๋ฅผ ์ค๋‹ˆ๋‹ค."
      token_list, weight_list = senti.get_weight(sentence)
      print(token_list)
      print(weight_list)
      

      Output (in Jupyter Notebook) :

      [' ', '๊ณผ', 'ํ•œ', ' ์š•์‹ฌ', '์€', ' ์ฃผ๋ณ€', ' ์‚ฌ๋žŒ๋“ค', '์—๊ฒŒ', ' ํ”ผํ•ด๋ฅผ', ' ', '์ค', '๋‹ˆ๋‹ค', ' ', '.']
      [0, 0.020344315, 0.024879746, 0.02612342, 0.03615231, 0.048542265, 0.06707654, 0.0936653, 0.07649707, 0, 0.08189902, 0.08962273, 0, 0.07841993]
      
  • teanaps.text_analysis.SentimentAnalysis.draw_weight(sentence)

    • Plots the weight of each morpheme referenced in the sentiment classification as a histogram.

    • Parameters

      • sentence (str) : A sentence in Korean or English. Up to 128 characters.
    • Returns

      • plotly graph (graph object) : Weight graph for each morpheme referenced in the sentiment classification.
    • Examples

      Python Code (in Jupyter Notebook) :

      sentence = "๋Š˜ ๋ฐฐ์šฐ๊ณ  ๋ฐฐํ‘ธ๋Š” ์ž์„ธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค."
      senti.draw_weight(sentence)
      

      Output (in Jupyter Notebook) : sentiment_pos_histogram

      Python Code (in Jupyter Notebook) :

      sentence = "๊ณผํ•œ ์š•์‹ฌ์€ ์ฃผ๋ณ€ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ํ”ผํ•ด๋ฅผ ์ค๋‹ˆ๋‹ค."
      senti.draw_weight(sentence)
      

      Output (in Jupyter Notebook) : sentiment_neg_histogram

  • teanaps.text_analysis.SentimentAnalysis.draw_sentence_weight(sentence)

    • Renders the sentence as a graph in which each morpheme is highlighted according to its weight in the sentiment classification.

    • Parameters

      • sentence (str) : A sentence in Korean or English. Up to 128 characters.
    • Returns

      • plotly graph (graph object) : Sentence graph.
    • Examples

      Python Code (in Jupyter Notebook) :

      sentence = "๋Š˜ ๋ฐฐ์šฐ๊ณ  ๋ฐฐํ‘ธ๋Š” ์ž์„ธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค."
      senti.draw_sentence_weight(sentence)
      

      Output (in Jupyter Notebook) : sentiment_weight_pos

      Python Code (in Jupyter Notebook) :

      sentence = "๊ณผํ•œ ์š•์‹ฌ์€ ์ฃผ๋ณ€ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ํ”ผํ•ด๋ฅผ ์ค๋‹ˆ๋‹ค."
      senti.draw_sentence_weight(sentence)
      

      Output (in Jupyter Notebook) : sentiment_weight_neg

  • teanaps.text_analysis.SentimentAnalysis.get_sentiment_parse(sentence, neutral_th=0.3, tagger="mecab", model_path="/model")

    • Classifies the sentiment of each phrase in the sentence as positive or negative and returns the weights.

    • Parameters

      • sentence (str) : A sentence in Korean or English. Up to 128 characters.
      • neutral_th (float) : Threshold on the difference between the positive and negative strengths below which a phrase is judged neutral. Range 0~1.
      • tagger (str) : Morphological analyzer. One of {"okt", "mecab", "mecab-ko", "kkma"}. See teanaps.nlp.ma.set_tagger.
      • model_path (str) : Path to the named entity recognition model file. See teanaps.nlp.ner.parse.
    • Returns

      • phrase_token_weight_list (list) : List containing each phrase and its sentiment analysis result.
      • token_list (list) : List of the morphemes in the sentence.
      • weight_list (list) : List of the weight of each morpheme in the sentence.
    • Examples

      Python Code (in Jupyter Notebook) :

      sentence = "์š•์‹ฌ์Ÿ์ด์—๊ฒŒ ์ŠคํŠธ๋ ˆ์Šค ๋ฐ›์œผ๋ฉฐ ์‚ด๋‹ค๊ฐ€ ๋– ๋‚˜๊ณ ๋‚˜๋‹ˆ ๋„ˆ๋ฌด ํ–‰๋ณตํ•ด์š”!"
      phrase_token_weight_list, token_list, weight_list = senti.get_sentiment_parse(sentence, neutral_th=0.5)
      print(phrase_token_weight_list)
      print(token_list)
      print(weight_list)
      

      Output (in Jupyter Notebook) :

      [(((0.5991, 0.3836), 'neutral'), '์š•์‹ฌ์Ÿ์ด์—๊ฒŒ', [[('์š•์‹ฌ์Ÿ์ด', 'NNG', 'UN', (0, 4))], [('์—๊ฒŒ', 'JKB', 'UN', (4, 6))]]), (((0.9147, 0.0828), 'negative'), '์ŠคํŠธ๋ ˆ์Šค ๋ฐ›์œผ๋ฉฐ', [[('์ŠคํŠธ๋ ˆ์Šค', 'NNG', 'UN', (7, 11)), ('๋ฐ›', 'VV', 'UN', (12, 13))], [('์œผ๋ฉฐ', 'EC', 'UN', (13, 15))]]), (((0.9047, 0.0953), 'negative'), '์‚ด๋‹ค๊ฐ€', [[('์‚ด', 'VV', 'UN', (16, 17))], [('๋‹ค๊ฐ€', 'EC', 'UN', (17, 19))]]), (((0.8306, 0.1751), 'negative'), '๋– ๋‚˜๊ณ ', [[('๋– ๋‚˜', 'VV', 'UN', (20, 22))], [('๊ณ ', 'EC', 'UN', (22, 23))]]), (((0.453, 0.5296), 'neutral'), '๋‚˜๋‹ˆ', [[('๋‚˜', 'VX', 'UN', (23, 24))], [('๋‹ˆ', 'EC', 'UN', (24, 25))]]), (((0.1065, 0.8982), 'positive'), '๋„ˆ๋ฌด ํ–‰๋ณตํ•ด์š”!', [[('๋„ˆ๋ฌด', 'MAG', 'UN', (26, 28))], [('ํ–‰๋ณต', 'NNG', 'UN', (29, 31))], [('ํ•ด์š”', 'XSV+EF', 'UN', (31, 33)), ('!', 'SW', 'UN', (33, 34))]])]
      [' ์š•์‹ฌ', '์Ÿ', '์ด', '์—๊ฒŒ', ' ์ŠคํŠธ๋ ˆ์Šค', ' ๋ฐ›์œผ๋ฉฐ', ' ์‚ด', '๋‹ค', '๊ฐ€', ' ๋– ๋‚˜', '๊ณ ', ' ๋‚˜', '๋‹ˆ', ' ๋„ˆ๋ฌด', ' ํ–‰๋ณต', 'ํ•ด', '์š”', ' ', '!']
      [0, 0, 0, 0, -0.2424436, -0.20117857, -0.16506892, -0.16892226, -0.27025366, -0.16876356, -0.33119142, 0, 0, 0.15942541, 0.13346915, 0.11855107, 0.15605149, 0, 0.11754697]
      
  • teanaps.text_analysis.SentimentAnalysis.draw_sentiment_parse(token_list, weight_list) Top(https://github.com/fingeredman/teanaps/wiki/TEXT-ANALYSIS#teanaps-api-documentation)

    • ๋ฌธ์žฅ์˜ ๊ฐ ์–ด์ ˆ์— ๋Œ€ํ•œ ๊ฐ์„ฑ๋ถ„์„ ๊ฒฐ๊ณผ๋ฅผ ํ•˜์ด๋ผ์ดํŠธํ•œ ํ˜•ํƒœ์˜ ๋ฌธ์žฅ ๊ทธ๋ž˜ํ”„๋กœ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.

    • Parameters

      • token_list (list) : ๋ฌธ์žฅ์˜ ๊ฐ ํ˜•ํƒœ์†Œ๋ฅผ ํฌํ•จํ•˜๋Š” ๋ฆฌ์ŠคํŠธ. teanaps.text_analysis.SentimentAnalysis.get_sentiment_parse์ฐธ๊ณ .
      • weight_list (list) : ๋ฌธ์žฅ์˜ ๊ฐ ํ˜•ํƒœ์†Œ ๋ณ„ ๊ฐ€์ค‘์น˜๋ฅผ ํฌํ•จํ•˜๋Š” ๋ฆฌ์ŠคํŠธ. teanaps.text_analysis.SentimentAnalysis.get_sentiment_parse์ฐธ๊ณ .
    • Returns

      • plotly graph (graph object) : ๋ฌธ์žฅ ๊ทธ๋ž˜ํ”„.
    • Examples

      Python Code (in Jupyter Notebook) :

      #sentence = "์š•์‹ฌ์Ÿ์ด์—๊ฒŒ ์ŠคํŠธ๋ ˆ์Šค ๋ฐ›์œผ๋ฉฐ ์‚ด๋‹ค๊ฐ€ ๋– ๋‚˜๊ณ ๋‚˜๋‹ˆ ๋„ˆ๋ฌด ํ–‰๋ณตํ•ด์š”!"
      #token_list = [' ์š•์‹ฌ', '์Ÿ', '์ด', '์—๊ฒŒ', ' ์ŠคํŠธ๋ ˆ์Šค', ' ๋ฐ›์œผ๋ฉฐ', ' ์‚ด', '๋‹ค', '๊ฐ€', ' ๋– ๋‚˜', '๊ณ ', ' ๋‚˜', '๋‹ˆ', ' ๋„ˆ๋ฌด', ' ํ–‰๋ณต', 'ํ•ด', '์š”', ' ', '!']
      #weight_list = [0, 0, 0, 0, -0.2424436, -0.20117857, -0.16506892, -0.16892226, -0.27025366, -0.16876356, -0.33119142, 0, 0, 0.15942541, 0.13346915, 0.11855107, 0.15605149, 0, 0.11754697]
      senti.draw_sentiment_parse(token_list, weight_list)
      

      Output (in Jupyter Notebook) : sentiment_parse

3.6. teanaps.text_analysis.DocumentSummarizer

Python Code (in Jupyter Notebook) :

from teanaps.text_analysis import DocumentSummarizer

ds = DocumentSummarizer()
  • teanaps.text_analysis.DocumentSummarizer.set_document(document_path)

    • Loads the document to be summarized.

    • Parameters

      • document_path (str) : Path to the text file (.txt) containing the document to summarize.
    • Returns

      • None
    • Examples

      Python Code (in Jupyter Notebook) :

      document_path = "article.txt"
      ds.set_document(document_path)
      
  • teanaps.text_analysis.DocumentSummarizer.summarize(type, max_sentence)

    • Summarizes the loaded document by extracting its key sentences and returns them.

    • Parameters

      • type (str) : Text summarization algorithm. One of {"textrank", "lsa"}.
      • max_sentence (int) : Number of sentences to extract in the summary.
    • Returns

      • sentence_list (list) : List of the extracted summary sentences.
    • Examples

      Text File (in "article.txt") :

      • โ€˜์†์„ธ์ด์…”๋„โ€™ ์†ํฅ๋ฏผ(28, ํ† ํŠธ๋„˜ ํ™‹์Šคํผ)์ด ํŒ€ ์Šน๋ฆฌ์˜ ๊ฒฐ์Šน๊ณจ์„ ๋„ฃ์—ˆ์œผ๋‚˜ ๋†’์€ ํ‰์ ์„ ๋ฐ›์ง€ ๋ชปํ–ˆ๋‹ค. ์ „์ฒด์ ์œผ๋กœ ๊ฒฝ๊ธฐ๋ ฅ์ด ์ข‹์ง€ ์•Š์•˜๋‹ค. ํ† ํŠธ๋„˜์€ 23์ผ(์ดํ•˜ ํ•œ๊ตญ์‹œ๊ฐ) ์˜ค์ „ ์˜๊ตญ ๋Ÿฐ๋˜์— ์œ„์น˜ํ•œ ํ† ํŠธ๋„˜ ํ™‹์Šคํผ ์Šคํƒ€๋””์›€์—์„œ ์—ด๋ฆฐ ๋…ธ๋ฆฌ์น˜์‹œํ‹ฐ์™€์˜ ์ž‰๊ธ€๋ฆฌ์‹œ ํ”„๋ฆฌ๋ฏธ์–ด๋ฆฌ๊ทธ 24๋ผ์šด๋“œ ํ™ˆ๊ฒฝ๊ธฐ์—์„œ 2-1๋กœ ์Šน๋ฆฌํ–ˆ๋‹ค. ์ด๋‚  ํ† ํŠธ๋„˜์€ ์ „๋ฐ˜ 38๋ถ„ ๋ธ๋ฆฌ ์•Œ๋ฆฌ๊ฐ€ ์„ ์ œ๊ณจ์„ ๋„ฃ์€ ๋’ค ํ›„๋ฐ˜ 25๋ถ„ ํ…Œ๋ฌด ํ‘ธํ‚ค์—๊ฒŒ ํŽ˜๋„ํ‹ฐํ‚ฅ ๊ณจ์„ ํ—ˆ์šฉํ•ด ๋™์ ์„ ๋‚ด์คฌ๋‹ค. ์ดํ›„ ํ† ํŠธ๋„˜์€ ํ›„๋ฐ˜ 34๋ถ„ ์†ํฅ๋ฏผ์ด ๊ท ํ˜•์„ ๊นจ๋Š” ํ—ค๋”๊ณจ์„ ํ„ฐ๋œจ๋ ธ๊ณ , ๊ฒฐ๊ตญ 2-1 ์Šน๋ฆฌ๋ฅผ ๊ฑฐ๋’€๋‹ค. ํ”„๋ฆฌ๋ฏธ์–ด๋ฆฌ๊ทธ 8์œ„์—์„œ 6์œ„๋กœ ์˜ฌ๋ผ์„ฐ๋‹ค. ํ•˜์ง€๋งŒ ๊ฒฐ์Šน๊ณจ์˜ ์ฃผ์ธ๊ณต ์†ํฅ๋ฏผ์€ ๋†’์€ ํ‰์ ์„ ๋ฐ›์ง€ ๋ชปํ–ˆ๋‹ค. ์˜๊ตญ ์ถ•๊ตฌ ํ†ต๊ณ„ ์ „๋ฌธ์‚ฌ์ดํŠธ ํ›„์Šค์ฝ”์–ด๋“œ๋‹ท์ปด์€ ์†ํฅ๋ฏผ์—๊ฒŒ ๋น„๊ต์  ๋‚ฎ์€ ํ‰์ ์ธ 6.8์ ์„ ๋ถ€์—ฌํ–ˆ๋‹ค. ์†ํฅ๋ฏผ์€ ์•Œ๋ฆฌ์˜ ์„ ์ œ๊ณจ์˜ ๊ธฐ์  ์—ญํ• ์„ ํ–ˆ๊ณ , ๊ฒฐ์Šน๊ณจ์„ ๋„ฃ์—ˆ์œผ๋‚˜ ๋‹ค๋ฅธ ์žฅ๋ฉด์—์„œ๋Š” ์ด๋ ‡๋‹ค ํ•  ๋ชจ์Šต์„ ๋ณด์ด์ง€ ๋ชปํ–ˆ๋‹ค. ํ† ํŠธ๋„˜์—์„œ๋Š” ์˜ค๋ฆฌ์—๊ฐ€ 8์ ์œผ๋กœ ๊ฐ€์žฅ ๋†’์•˜๊ณ , ๋กœ ์…€์†Œ๊ฐ€ 7.9์  ๊ทธ๋ฆฌ๊ณ  ์•Œ๋ฆฌ๊ฐ€ 7.6์ ์œผ๋กœ ๋’ค๋ฅผ ์ด์—ˆ๋‹ค. [๋™์•„๋‹ท์ปด, ์กฐ์„ฑ์šด ๊ธฐ์ž, 2020.1.23., ๋ณธ๋ฌธ๋ณด๊ธฐ]

      Python Code (in Jupyter Notebook) :

      #document_path = "article.txt"
      #ds.set_document(document_path)
      result = ds.summarize("textrank", 3)
      print(result)
      

      Output (in Jupyter Notebook) :

      ['โ€˜์†์„ธ์ด์…”๋„โ€™ ์†ํฅ๋ฏผ(28, ํ† ํŠธ๋„˜ ํ™‹์Šคํผ)์ด ํŒ€ ์Šน๋ฆฌ์˜ ๊ฒฐ์Šน๊ณจ์„ ๋„ฃ์—ˆ์œผ๋‚˜ ๋†’์€ ํ‰์ ์„ ๋ฐ›์ง€ ๋ชปํ–ˆ๋‹ค.',
       'ํ† ํŠธ๋„˜์€ 23์ผ(์ดํ•˜ ํ•œ๊ตญ์‹œ๊ฐ) ์˜ค์ „ ์˜๊ตญ ๋Ÿฐ๋˜์— ์œ„์น˜ํ•œ ํ† ํŠธ๋„˜ ํ™‹์Šคํผ ์Šคํƒ€๋””์›€์—์„œ ์—ด๋ฆฐ ๋…ธ๋ฆฌ์น˜์‹œํ‹ฐ์™€์˜ ์ž‰๊ธ€๋ฆฌ์‹œ ํ”„๋ฆฌ๋ฏธ์–ด๋ฆฌ๊ทธ 24๋ผ์šด๋“œ ํ™ˆ๊ฒฝ๊ธฐ์—์„œ 2-1๋กœ ์Šน๋ฆฌํ–ˆ๋‹ค.',
       '์ด๋‚  ํ† ํŠธ๋„˜์€ ์ „๋ฐ˜ 38๋ถ„ ๋ธ๋ฆฌ ์•Œ๋ฆฌ๊ฐ€ ์„ ์ œ๊ณจ์„ ๋„ฃ์€ ๋’ค ํ›„๋ฐ˜ 25๋ถ„ ํ…Œ๋ฌด ํ‘ธํ‚ค์—๊ฒŒ ํŽ˜๋„ํ‹ฐํ‚ฅ ๊ณจ์„ ํ—ˆ์šฉํ•ด ๋™์ ์„ ๋‚ด์คฌ๋‹ค.'
      ]
      

      Python Code (in Jupyter Notebook) :

      #document_path = "article.txt"
      #ds.set_document(document_path)
      result = ds.summarize("lsa", 3)
      print(result)
      

      Output (in Jupyter Notebook) :

      ['ํ† ํŠธ๋„˜์€ 23์ผ(์ดํ•˜ ํ•œ๊ตญ์‹œ๊ฐ) ์˜ค์ „ ์˜๊ตญ ๋Ÿฐ๋˜์— ์œ„์น˜ํ•œ ํ† ํŠธ๋„˜ ํ™‹์Šคํผ ์Šคํƒ€๋””์›€์—์„œ ์—ด๋ฆฐ ๋…ธ๋ฆฌ์น˜์‹œํ‹ฐ์™€์˜ ์ž‰๊ธ€๋ฆฌ์‹œ ํ”„๋ฆฌ๋ฏธ์–ด๋ฆฌ๊ทธ 24๋ผ์šด๋“œ ํ™ˆ๊ฒฝ๊ธฐ์—์„œ 2-1๋กœ ์Šน๋ฆฌํ–ˆ๋‹ค.',
       '์ด๋‚  ํ† ํŠธ๋„˜์€ ์ „๋ฐ˜ 38๋ถ„ ๋ธ๋ฆฌ ์•Œ๋ฆฌ๊ฐ€ ์„ ์ œ๊ณจ์„ ๋„ฃ์€ ๋’ค ํ›„๋ฐ˜ 25๋ถ„ ํ…Œ๋ฌด ํ‘ธํ‚ค์—๊ฒŒ ํŽ˜๋„ํ‹ฐํ‚ฅ ๊ณจ์„ ํ—ˆ์šฉํ•ด ๋™์ ์„ ๋‚ด์คฌ๋‹ค.',
       '์†ํฅ๋ฏผ์€ ์•Œ๋ฆฌ์˜ ์„ ์ œ๊ณจ์˜ ๊ธฐ์  ์—ญํ• ์„ ํ–ˆ๊ณ , ๊ฒฐ์Šน๊ณจ์„ ๋„ฃ์—ˆ์œผ๋‚˜ ๋‹ค๋ฅธ ์žฅ๋ฉด์—์„œ๋Š” ์ด๋ ‡๋‹ค ํ•  ๋ชจ์Šต์„ ๋ณด์ด์ง€ ๋ชปํ–ˆ๋‹ค.'
      ]
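
      Note : As background, "textrank" ranks sentences by how central they are in a sentence-similarity graph. A highly simplified sketch of that idea (TF-IDF cosine similarity summed per sentence, with no PageRank iteration), assuming scikit-learn is installed; it is illustrative only and will not reproduce summarize() exactly:

      Python Code (in Jupyter Notebook) :

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      with open("article.txt", encoding="utf-8") as f:
          # Naive sentence split on periods, for illustration only.
          sentences = [s.strip() for s in f.read().split(".") if s.strip()]
      sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
      # Score each sentence by its total similarity to every other sentence.
      scores = sim.sum(axis=1)
      top3 = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:3]
      print([sentences[i] for i in sorted(top3)])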
      
3.7. teanaps.text_analysis.KeywordExtractor

Python Code (in Jupyter Notebook) :

from teanaps.text_analysis import KeywordExtractor

ke = KeywordExtractor(model_path="/model")

Notes :

  • ๋ชจ๋ธ ํŒŒ์ผ์„ ๋ณ„๋„๋กœ ๋‹ค์šด๋กœ๋“œํ•˜์—ฌ ํŒŒ์ผ ๊ฒฝ๋กœ๋ฅผ model_path ๋ณ€์ˆ˜์— ํฌํ•จํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค.
  • import์‹œ ์ตœ์ดˆ 1ํšŒ ๊ฒฝ๊ณ ๋ฉ”์‹œ์ง€ (Warnning)๊ฐ€ ์ถœ๋ ฅ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฌด์‹œํ•˜์…”๋„ ์ข‹์Šต๋‹ˆ๋‹ค.
  • teanaps.text_analysis.KeywordExtractor.parse(sentence, max_keyword=5) Top(https://github.com/fingeredman/teanaps/wiki/TEXT-ANALYSIS#teanaps-api-documentation)
    • ๋ฌธ์žฅ์—์„œ ํ•ต์‹ฌ ํ‚ค์›Œ๋“œ๋ฅผ ๊ตฌ๋ถ„ํ•˜๊ณ  ๊ทธ ๊ฐ€์ค‘์น˜๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

    • Parameters

      • sentence (str) : ํ•œ๊ตญ์–ด ๋˜๋Š” ์˜์–ด๋กœ ๊ตฌ์„ฑ๋œ ๋ฌธ์žฅ. ์ตœ๋Œ€ 128์ž.
      • max_keyword (int) : ์ถ”์ถœํ•  ์ตœ๋Œ€ ํ‚ค์›Œ๋“œ ๊ฐœ์ˆ˜.
    • Returns

      • result (list) : (ํ‚ค์›Œ๋“œ, ๊ฐ€์ค‘์น˜, ํ‚ค์›Œ๋“œ ์œ„์น˜) ๊ตฌ์กฐ์˜ Tuple์„ ํฌํ•จํ•˜๋Š” ๋ฆฌ์ŠคํŠธ.
    • Examples

      Python Code (in Jupyter Notebook) :

      sentence = "์œ ํ”Œ๋Ÿฌ์Šค๋Š” ํ†ต์‹ 3์‚ฌ(SKT, LGU+, KT) ์ค‘์— 5G ์š”๊ธˆ์ œ๋ฅผ ์ตœ์ดˆ๋กœ ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค."
      result = ke.parse(sentence)
      print(result)
      

      Output (in Jupyter Notebook) :

      [('LGU+', 1.33617, (16, 20)), ('SKT', 0.81265, (11, 14)), ('KT', 0.79936, (22, 24)), ('5G', 0.74944, (29, 31)), ('์œ ํ”Œ๋Ÿฌ์Šค', 0.37639, (0, 4))]
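
      Note : The (start, end) position in each tuple indexes into the original sentence, so the keywords can be located and highlighted directly. A minimal sketch:

      Python Code (in Jupyter Notebook) :

      result = ke.parse(sentence)
      for keyword, weight, (start, end) in result:
          # Slice the original sentence with the reported character offsets.
          print(keyword, weight, "->", sentence[start:end])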