Topic Detection - SubhasisDutta/Text-Analysis GitHub Wiki

The topic detection algorithm categorizes a given text into one of the 22 trained categories listed below:

Advertising, Beauty, Business, Celebrity, Diy craft, Entertainment, Family, Fashion, Food, General, Health, Lifestyle, Music, News, Pop, Culture, Social, Media, Sports, Technology, Travel, Video Games.

### Packages/Algorithms used

  • Word2Vec vectors trained on the Google News corpus
  • Gensim - To read the binary Word2Vec vectors
  • Twokenize - To tokenize tweets, extracting text and emoticons
  • RAKE - Keyword extraction
  • Scikit-Learn - K-means clustering of Word2Vec vectors across the above-mentioned categories
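
One way the pieces listed above could fit together is sketched below. This is a minimal illustration and not the project's actual code: it assumes that extracted keywords are mapped to word vectors, averaged, and assigned to the category whose cluster centroid is closest by cosine similarity. The toy 3-dimensional vectors, the centroid values, and the function names (`mean_vector`, `cosine`, `detect_topic`) are all hypothetical. In a real setup the vectors would presumably come from the Google News Word2Vec file loaded through Gensim, and the centroids from Scikit-Learn's K-means.

```python
import math

# Toy 3-dimensional vectors standing in for the 300-dimensional Google News
# Word2Vec vectors. Values are made up for illustration only.
TOY_VECTORS = {
    "guitar": [0.9, 0.1, 0.0],
    "concert": [0.8, 0.2, 0.1],
    "football": [0.1, 0.9, 0.0],
    "goal": [0.0, 0.8, 0.2],
    "flight": [0.1, 0.0, 0.9],
    "hotel": [0.2, 0.1, 0.8],
}

# Hypothetical category centroids. In practice these might be the cluster
# centers produced by K-means over the word vectors of each category.
CENTROIDS = {
    "Music": [0.85, 0.15, 0.05],
    "Sports": [0.05, 0.85, 0.10],
    "Travel": [0.15, 0.05, 0.85],
}


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_topic(keywords, vectors=TOY_VECTORS, centroids=CENTROIDS):
    """Return the category whose centroid is most similar to the averaged
    keyword vector, or None when no keyword has a known vector."""
    known = [vectors[word] for word in keywords if word in vectors]
    if not known:
        return None
    text_vector = mean_vector(known)
    return max(centroids, key=lambda name: cosine(text_vector, centroids[name]))


if __name__ == "__main__":
    # Keywords would normally come from a keyword extractor such as RAKE.
    print(detect_topic(["guitar", "concert"]))
    print(detect_topic(["hotel", "flight", "goal"]))
```

Words without a vector are skipped here, so a text made up entirely of out-of-vocabulary words yields `None` rather than a guessed category. How the original project handles that case is not stated on this page.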