Topic Detection - SubhasisDutta/Text-Analysis GitHub Wiki
The topic detection algorithm categorizes a given text into one of the 22 trained categories listed below:
Advertising, Beauty, Business, Celebrity, Diy craft, Entertainment, Family, Fashion, Food, General, Health, Lifestyle, Music, News, Pop, Culture, Social, Media, Sports, Technology, Travel, Video Games.
### Packages/Algorithms used
- Word2Vec vectors trained on the Google News corpus
- Gensim - To read the binary Word2Vec vectors
- Twokenize - To extract text and emoticons from tweets
- RAKE - Keyword extraction
- scikit-learn - K-means clustering of Word2Vec vectors across the above-mentioned categories
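
How these pieces fit together is not spelled out here, so the following is only a minimal sketch of one plausible reading: keywords from a text are mapped to word vectors, averaged, and assigned to the category whose centroid is closest. It does not run K-means itself; it shows only the assignment step against centroids that are assumed to exist already. The toy vectors, the centroids, the function names (`load_vectors`, `mean_vector`, `cosine`, `detect_topic`), and the fallback to `General` are all hypothetical and not taken from the repository. The only real library call is Gensim's `KeyedVectors.load_word2vec_format`, which is defined but not invoked.

```python
import math

# Hypothetical 3-dimensional toy vectors standing in for the real
# Google News Word2Vec vectors; the values are made up for illustration.
TOY_VECTORS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "recipe": [0.1, 0.9, 0.0],
    "pasta": [0.0, 0.8, 0.2],
    "laptop": [0.1, 0.0, 0.9],
}

# Hypothetical per-category centroids, e.g. cluster centres computed earlier.
CENTROIDS = {
    "Sports": [1.0, 0.0, 0.0],
    "Food": [0.0, 1.0, 0.0],
    "Technology": [0.0, 0.0, 1.0],
}


def load_vectors(path):
    """Load binary Word2Vec vectors with Gensim (not called in this sketch)."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)


def mean_vector(words, vectors):
    """Average the vectors of all known words; return None if none are known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_topic(words, vectors=TOY_VECTORS, centroids=CENTROIDS):
    """Return the category whose centroid is most similar to the text vector."""
    vec = mean_vector(words, vectors)
    if vec is None:
        # Assumption: texts with no known words fall back to "General".
        return "General"
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))
```

With the toy data above, `detect_topic(["goal", "match"])` returns `"Sports"`. In a real pipeline, `words` would presumably be the keywords RAKE extracts from Twokenize output, and `vectors` the object returned by `load_vectors`.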