ICP_7 - acvc279/Python_Deeplearning GitHub Wiki
VIDEO LINK: https://drive.google.com/file/d/1CamJP3AyKgrbAm1oGBj4jaO9KSYsW0-h/view?usp=drivesdk
Q1. Change the classifier in the given source code to
a. SVM and see how accuracy changes
b. Set the tfidf vectorizer parameter to use bigrams and see how the accuracy changes: TfidfVectorizer(ngram_range=(1,2))
c. Set the tfidf vectorizer argument to use stop_words='english' and see how accuracy changes
First, import all the required packages, then load the twentytrain data so it can be vectorized. Next, set up one vectorizer with the given n-gram range and declare another with the given stop_words argument. Fit the model on the training data to find which setup gives the better accuracy. Then find the MultinomialNB accuracy, the MultinomialNB accuracy with bigrams, and the MultinomialNB accuracy with stop words added. After doing these, switch the classifier to SVM and see how the accuracy changes.
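The comparison described above could be sketched roughly as follows. This is a minimal illustration, not the original notebook: the helper name `compare_classifiers`, the tiny in-memory corpus, and the choice of `SVC(kernel="linear")` as the SVM are all assumptions made so the example runs without downloading anything. In the actual exercise the texts and labels would come from the 20 newsgroups data (for example via `sklearn.datasets.fetch_20newsgroups`).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def compare_classifiers(train_texts, train_labels, test_texts, test_labels):
    """Hypothetical helper: test accuracy for each vectorizer/classifier pair."""
    candidates = {
        # Baseline: default TF-IDF (unigrams) with Naive Bayes.
        "MultinomialNB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        # Q1b: unigrams plus bigrams.
        "MultinomialNB + bigrams": make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB()
        ),
        # Q1c: drop English stop words.
        "MultinomialNB + stop words": make_pipeline(
            TfidfVectorizer(stop_words="english"), MultinomialNB()
        ),
        # Q1a: swap the classifier for an SVM (linear kernel is an assumption).
        "SVM": make_pipeline(TfidfVectorizer(), SVC(kernel="linear")),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(train_texts, train_labels)
        scores[name] = accuracy_score(test_labels, model.predict(test_texts))
    return scores


# Toy stand-in data (assumption), used only to keep the sketch self-contained.
train_texts = [
    "the rocket launched into orbit around the moon",
    "astronauts repaired the satellite in orbit",
    "the goalie blocked the puck during the hockey game",
    "the team scored a late goal in the hockey match",
]
train_labels = ["space", "space", "hockey", "hockey"]
test_texts = ["a satellite reached orbit", "the hockey team won the game"]
test_labels = ["space", "hockey"]

scores = compare_classifiers(train_texts, train_labels, test_texts, test_labels)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Wrapping each vectorizer and classifier in a pipeline keeps the four variants directly comparable, since each one is fitted and scored on exactly the same split.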
Q2. Extract the following web URL text using BeautifulSoup and save the result in a file “input.txt”.
Imported all the required libraries, then extracted the text from the web URL in a function. Then created a file and appended all the data to it.
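A minimal sketch of this extraction step, under stated assumptions: the function names `html_to_text` and `save_page_text` are hypothetical, the page is fetched with the standard library's `urlopen`, and the URL is left as a parameter because the write-up does not state it. The demonstration at the bottom parses an inline HTML string instead of a live page so that it runs offline.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the visible text of an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are not page text, so remove them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


def save_page_text(url, path="input.txt"):
    """Fetch `url`, extract its text, and append it to `path`."""
    with urlopen(url) as response:
        html = response.read()
    text = html_to_text(html)
    # Mode "a" appends, matching the description above.
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
    return text


# Offline demonstration on an inline document (assumption, not the real page).
sample_html = (
    "<html><head><title>Demo</title><style>p {color: red;}</style></head>"
    "<body><h1>Heading</h1><p>Some <b>bold</b> text.</p></body></html>"
)
sample_text = html_to_text(sample_html)
print(sample_text)
```

Calling `save_page_text(url)` with the assignment's URL would then produce the `input.txt` file used in the next step.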
Apply the following on the “input.txt” file:
• Tokenization
• POS
• Stemming
• Lemmatization
• Trigram
• Named Entity Recognition
Import the Natural Language Toolkit (NLTK), then read the extracted file. Implemented word tokenization and sentence tokenization. Implemented stemming (reduces a word to a base form). Implemented POS tagging and lemmatization (reduces a word to a meaningful base form). Implemented trigrams (sequences of three consecutive words). Implemented named entity recognition (classifies the data into categories).
Learned from this ICP: Natural Language Toolkit
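The NLTK steps listed above could be sketched as below. The function names (`stem_tokens`, `make_trigrams`, `run_pipeline`) are hypothetical, and the Porter stemmer is an assumption since the write-up does not say which stemmer was used. Tokenization, POS tagging, lemmatization, and named entity recognition need NLTK data packages that must be fetched beforehand with `nltk.download(...)`; the exact package names depend on the NLTK version, so `run_pipeline` is only defined here, not called.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.util import ngrams


def stem_tokens(tokens):
    """Stemming: strip suffixes to reach a base form (needs no NLTK data)."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


def make_trigrams(tokens):
    """Trigrams: every run of three consecutive tokens (needs no NLTK data)."""
    return list(ngrams(tokens, 3))


def run_pipeline(path="input.txt"):
    """Run all six steps on a text file; requires downloaded NLTK data."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    sentences = nltk.sent_tokenize(text)   # sentence tokenization
    tokens = nltk.word_tokenize(text)      # word tokenization
    tagged = nltk.pos_tag(tokens)          # part-of-speech tags
    lemmatizer = WordNetLemmatizer()
    return {
        "sentences": sentences,
        "tokens": tokens,
        "pos": tagged,
        "stems": stem_tokens(tokens),
        "lemmas": [lemmatizer.lemmatize(token) for token in tokens],
        "trigrams": make_trigrams(tokens),
        "entities": nltk.ne_chunk(tagged),  # named entity tree
    }


# The two data-free steps can be tried directly.
stems = stem_tokens(["running", "cats"])
trigrams = make_trigrams(["a", "b", "c", "d"])
print(stems)
print(trigrams)
```

Once the required data packages are installed, `run_pipeline("input.txt")` returns one dictionary holding the result of every step, which makes it easy to print each output in turn.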