Text preprocessing in Topic Modeling

Simple preprocessing techniques before building a document-term matrix

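  • Minimum term length: exclude very short terms (for example, fewer than 2 characters)
  • Case conversion: convert all terms to lowercase
  • Stop-word filtering: remove very common terms that carry little meaning (e.g. "and", "the", "while")
  • Minimum frequency filtering: remove terms that appear in very few documents
  • Maximum frequency filtering: remove terms that appear in a very large share of documents
  • Stemming: reduce words to their stem or root form (e.g. "sleeping" and "sleeps" become "sleep")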
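
The steps above can be sketched in plain Python as follows. This is only a minimal illustration, not the method from the referenced slides: the stop-word list, the `naive_stem` helper, the function names, and the threshold values are all assumptions made for the example. In practice, scikit-learn's `CountVectorizer` covers most of these steps through its `lowercase`, `stop_words`, `min_df`, and `max_df` parameters, and stemming is usually delegated to a dedicated stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "a", "of", "is", "in", "to", "are", "on"}


def naive_stem(term):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least 3 characters would remain.
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def tokenize(text, min_length=2):
    """Apply case conversion, minimum term length, stop-word filtering, and stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())          # case conversion
    tokens = [t for t in tokens if len(t) >= min_length]  # minimum term length
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word filtering
    return [naive_stem(t) for t in tokens]                # stemming


def build_dtm(docs, min_df=2, max_df=0.7):
    """Build a document-term matrix as a list of rows of raw term counts.

    min_df is an absolute document count; max_df is a fraction of all documents.
    """
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vocab = sorted(
        term
        for term, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df  # min/max frequency filtering
    )
    column = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for term in tokens:
            if term in column:
                row[column[term]] += 1
        matrix.append(row)
    return vocab, matrix


docs = [
    "The cats are sleeping on the mat",
    "A cat sleeps in the garden",
    "Dogs and cats playing in the garden",
    "The stock market is falling",
]
vocab, matrix = build_dtm(docs)
print(vocab)   # ['garden', 'sleep']
print(matrix)  # [[0, 1], [1, 1], [1, 0], [0, 0]]
```

In this toy corpus, "cat" is dropped by the maximum frequency filter because it occurs in 3 of 4 documents, while terms such as "mat" and "stock" are dropped by the minimum frequency filter because each occurs in only one document. The last row is all zeros, which shows that aggressive filtering can leave a document with no remaining terms.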

Reference >> http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf