ICP_7 - acvc279/Python_Deeplearning GitHub Wiki

VIDEO LINK: https://drive.google.com/file/d/1CamJP3AyKgrbAm1oGBj4jaO9KSYsW0-h/view?usp=drivesdk

Q1. Change the classifier in the given source code to

a. SVM and see how the accuracy changes

b. Set the tfidf vectorizer parameter to use bigrams and see how the accuracy changes: TfidfVectorizer(ngram_range=(1,2))

c. Set the tfidf vectorizer argument stop_words='english' and see how the accuracy changes

First, import all the required packages, then load the twentytrain data to vectorize. Next, set up one vectorizer with the given ngram range and declare another with the given stop_words argument. Fit the model on the training data so the accuracies can be compared. Then find the MultinomialNB accuracy, the MultinomialNB accuracy with bigrams, and the MultinomialNB accuracy with stop words removed. After that, switch the classifier to SVM and see how the accuracy changes.
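The comparison described above can be sketched as follows. This is a minimal sketch, not the original source code: it assumes the data is scikit-learn's 20 Newsgroups dataset, uses `LinearSVC` as the SVM, and the helper names `build_models`, `compare_accuracy`, and `run_on_20newsgroups` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models():
    """Return the four vectorizer/classifier combinations to compare."""
    return {
        "MultinomialNB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        "MultinomialNB + bigrams": make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB()),
        "MultinomialNB + stop words": make_pipeline(
            TfidfVectorizer(stop_words="english"), MultinomialNB()),
        # Assumption: LinearSVC stands in for the SVM; the original may differ.
        "SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
    }


def compare_accuracy(train_texts, train_labels, test_texts, test_labels):
    """Fit each pipeline on the training split and score it on the test split."""
    scores = {}
    for name, model in build_models().items():
        model.fit(train_texts, train_labels)
        scores[name] = accuracy_score(test_labels, model.predict(test_texts))
    return scores


def run_on_20newsgroups():
    """Download 20 Newsgroups (assumed dataset) and print each accuracy."""
    from sklearn.datasets import fetch_20newsgroups

    train = fetch_20newsgroups(subset="train", shuffle=True)
    test = fetch_20newsgroups(subset="test", shuffle=True)
    scores = compare_accuracy(train.data, train.target, test.data, test.target)
    for name, acc in scores.items():
        print(f"{name}: {acc:.4f}")
```

Calling `run_on_20newsgroups()` downloads the dataset on first use and prints one accuracy per configuration, which makes the effect of bigrams, stop words, and the classifier swap easy to compare side by side.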

Q2. Extract the following web URL text using BeautifulSoup and save the result in a file “input.txt”.

Imported all the required libraries, then extracted the text from the web URL in a function. Then created a file and appended all the data to it.
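A minimal sketch of this extraction step is shown below. The page URL is not given here, so it is left as a parameter. The helper names `html_to_text` and `save_url_text` are hypothetical, and dropping `script` and `style` tags is an added assumption, not necessarily what the original code did.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the visible text of an HTML document, one fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: script and style contents are not wanted in the output.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)


def save_url_text(url, path="input.txt"):
    """Fetch `url`, extract its text, and append it to `path`."""
    with urlopen(url) as response:
        html = response.read()
    text = html_to_text(html)
    # Mode "a" appends, matching the description of appending the data.
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
    return text
```

Keeping the parsing in `html_to_text` separate from the download means the extraction can be checked on a small HTML string without any network access.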

Apply the following on the “input.txt” file:

• Tokenization
• POS
• Stemming
• Lemmatization
• Trigram
• Named Entity Recognition

Imported the Natural Language Toolkit (NLTK), then read the extracted file. Implemented word tokenization and sentence tokenization. Implemented stemming (converts a word into a base form). Implemented POS tagging and lemmatization (converts a word into a meaningful base form). Implemented trigrams (sequences of three consecutive words). Implemented named entity recognition (classifies the data into categories).

Learned from this ICP: the Natural Language Toolkit.
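The NLTK steps described above can be sketched as follows. This is a sketch under assumptions, not the original code: the Porter stemmer is an assumed choice, and the helper names `stem_tokens`, `trigrams`, and `analyze` are hypothetical. Stemming and trigram building need no extra data, while `analyze` also relies on NLTK data packages (tokenizer models, a POS tagger model, the WordNet corpus, and the named entity chunker with its word list) that must be fetched beforehand with `nltk.download`; the exact resource names differ between NLTK versions.

```python
from nltk import ne_chunk, pos_tag, sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.util import ngrams


def stem_tokens(tokens):
    """Reduce each token to its Porter stem (not always a dictionary word)."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


def trigrams(tokens):
    """Return every run of three consecutive tokens as a tuple."""
    return list(ngrams(tokens, 3))


def analyze(text):
    """Run all six steps on `text`; requires the NLTK data noted above."""
    words = word_tokenize(text)
    tagged = pos_tag(words)
    lemmatizer = WordNetLemmatizer()
    return {
        "sentences": sent_tokenize(text),
        "words": words,
        "pos": tagged,
        "stems": stem_tokens(words),
        # lemmatize() treats each word as a noun unless a POS is passed.
        "lemmas": [lemmatizer.lemmatize(word) for word in words],
        "trigrams": trigrams(words),
        "entities": ne_chunk(tagged),
    }
```

To apply it to the extracted file, read “input.txt” into a string and pass it to `analyze`; the returned dictionary holds one entry per step.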