ICP7 - PallaviArikatla/Python GitHub Wiki

OBJECTIVE

To understand and implement NLP and NLTK features.

Tools used: PyCharm with Python 3 or Python 2.

a. Calculate the accuracy score using an SVC model.
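A minimal sketch of this step, assuming scikit-learn is installed. The data set used in the exercise is not shown on this page, so the toy corpus, the labels, and the linear kernel below are hypothetical choices for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the exercise's data set.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "the new laptop has a fast processor",
    "the software update fixed the memory bug",
    "the developer wrote code for the server",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
test_texts = ["the players scored a goal", "the laptop software has a bug"]
test_labels = ["sport", "tech"]

# Turn the raw text into TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Fit the SVC model (kernel choice is an assumption) and score it.
model = SVC(kernel="linear")
model.fit(X_train, train_labels)
predicted = model.predict(X_test)
svc_accuracy = accuracy_score(test_labels, predicted)
print("SVC accuracy:", svc_accuracy)
```

Note that the test texts are passed through `transform`, not `fit_transform`, so the test features use the vocabulary learned from the training data.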

b. Change the TfidfVectorizer to use bigrams by setting ngram_range to (1,2).

With ngram_range set to (1,2), the vectorizer produces features for both unigrams and bigrams.
Create a MultinomialNB model and fit it on the transformed training data.
Then predict the labels for the test data and calculate the accuracy score.
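The bigram variant might look like the sketch below. The toy corpus and labels are hypothetical (the original data is not shown here); only `ngram_range=(1, 2)` and the MultinomialNB model come from the steps above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for the exercise's data set.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new laptop has a fast processor",
    "the software update fixed the memory bug",
]
train_labels = ["sport", "sport", "tech", "tech"]
test_texts = ["the striker won the match", "the laptop has a memory bug"]
test_labels = ["sport", "tech"]

# ngram_range=(1, 2) keeps single words and adds adjacent word pairs.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Fit Multinomial Naive Bayes on the unigram + bigram features.
model = MultinomialNB()
model.fit(X_train, train_labels)
predicted = model.predict(X_test)
nb_accuracy = accuracy_score(test_labels, predicted)

print("bigram feature present:", "football match" in vectorizer.vocabulary_)
print("MultinomialNB accuracy:", nb_accuracy)
```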

c. Set the argument stop_words='english' and calculate the score.
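As a rough illustration of what `stop_words='english'` changes, the sketch below fits the vectorizer with and without the argument and then scores a model. The corpus is hypothetical, and reusing MultinomialNB here is an assumption, since the page does not say which model this step was scored with.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for the exercise's data set.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new laptop has a fast processor",
    "the software update fixed the memory bug",
]
train_labels = ["sport", "sport", "tech", "tech"]
test_texts = ["the striker won the match", "the laptop has a memory bug"]
test_labels = ["sport", "tech"]

# Without stop words, common words such as "the" become features.
plain = TfidfVectorizer().fit(train_texts)

# With stop_words="english", scikit-learn's built-in English list is removed.
filtered = TfidfVectorizer(stop_words="english")
X_train = filtered.fit_transform(train_texts)
X_test = filtered.transform(test_texts)

model = MultinomialNB().fit(X_train, train_labels)
stop_accuracy = accuracy_score(test_labels, model.predict(X_test))

print("'the' kept without stop words:", "the" in plain.vocabulary_)
print("'the' kept with stop words:", "the" in filtered.vocabulary_)
print("accuracy with stop words removed:", stop_accuracy)
```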

Output scores are as follows:

The text file will be created as follows:

Apply word and sentence tokenization to the input.txt file.
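A small sketch of both tokenization steps. The exercise most likely called `nltk.sent_tokenize` and `nltk.word_tokenize`, which need the downloaded Punkt data; as an assumption made so the example runs without any download, this version uses an untrained `PunktSentenceTokenizer` and `TreebankWordTokenizer` instead. The inline sample string is hypothetical and stands in for the contents of input.txt.

```python
from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Hypothetical sample text; in the exercise this would be read from input.txt.
text = "NLTK splits text into pieces. It can work on sentences or on words."

# Sentence tokenization: an untrained Punkt tokenizer with default parameters.
sentences = PunktSentenceTokenizer().tokenize(text)

# Word tokenization: tokenize each sentence separately so that the
# sentence-final period is split off as its own token.
word_tokenizer = TreebankWordTokenizer()
words = [token for sentence in sentences
         for token in word_tokenizer.tokenize(sentence)]

print(sentences)
print(words)
```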

Apply three types of stemming: PorterStemmer, LancasterStemmer, SnowballStemmer.
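The three stemmers can be compared side by side as in the sketch below. The word list is a hypothetical sample, not the tokens from input.txt; none of these stemmers needs extra NLTK data.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Hypothetical sample words; the exercise would use the tokens from input.txt.
words = ["running", "flies", "happily", "studies"]

porter = PorterStemmer()
lancaster = LancasterStemmer()
snowball = SnowballStemmer("english")  # Snowball needs a language name

# Collect the stem produced by each algorithm for every word.
stems = {
    word: {
        "porter": porter.stem(word),
        "lancaster": lancaster.stem(word),
        "snowball": snowball.stem(word),
    }
    for word in words
}

for word, result in stems.items():
    print(word, result)
```

Lancaster is generally the most aggressive of the three, so its stems tend to be shorter than those from Porter or Snowball.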

The output will be as follows:

Apply POS tagging after tokenization. Then lemmatize the tokens using WordNet.

Apply n-grams with n = 3 (trigrams) to the tokens.
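A short sketch of trigram extraction using `nltk.util.ngrams`. The token list is a hypothetical sample produced with a plain `split()`; in the exercise the tokens would come from the tokenization step.

```python
from nltk.util import ngrams

# Hypothetical token list; the exercise would use the tokens from input.txt.
tokens = "the quick brown fox jumps".split()

# ngrams yields a sliding window of 3 consecutive tokens as tuples.
trigrams = list(ngrams(tokens, 3))

for gram in trigrams:
    print(gram)
```

A list of k tokens yields k - 2 trigrams, so the five tokens here give three.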

Apply named entity recognition, which locates the named entities in the text and classifies them.