ICP_7 - acvc279/Python_Deeplearning GitHub Wiki
VIDEO LINK: https://drive.google.com/file/d/1CamJP3AyKgrbAm1oGBj4jaO9KSYsW0-h/view?usp=drivesdk
Q1. Modify the given source code as follows:
a. Change the classifier to SVM and see how the accuracy changes.
b. Set the TfidfVectorizer parameter to use bigrams, TfidfVectorizer(ngram_range=(1,2)), and see how the accuracy changes.
c. Set the TfidfVectorizer argument stop_words='english' and see how the accuracy changes.
First, import all the required packages, then load the twenty_train data and vectorize it:
Then create one vectorizer with the given n-gram range and another with the given stop_words argument:
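To make the effect of the two settings concrete, here is a minimal sketch that fits each vectorizer on a hypothetical two-sentence corpus (a stand-in, not the twenty_train data used in the ICP) and counts the features each one produces. It uses only scikit-learn's TfidfVectorizer and its `ngram_range` and `stop_words` parameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in corpus; the ICP itself vectorizes the twenty_train documents.
docs = ["the cat sat on the mat", "the dog sat on the log"]

vectorizers = {
    "unigram (default)": TfidfVectorizer(),
    "unigram + bigram": TfidfVectorizer(ngram_range=(1, 2)),
    "English stop words removed": TfidfVectorizer(stop_words="english"),
}

vocab_sizes = {}
for name, vec in vectorizers.items():
    X = vec.fit_transform(docs)  # sparse matrix: one row per document, one column per term
    vocab_sizes[name] = len(vec.vocabulary_)
    print(f"{name}: {X.shape[1]} features")
# → unigram (default): 7 features
# → unigram + bigram: 15 features
# → English stop words removed: 5 features
```

Adding bigrams such as "sat on" roughly doubles the feature space here, while `stop_words="english"` drops "the" and "on". On a real corpus, these changes to the feature space are what move the classifier's accuracy.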
Fit the models on the training data and compare their accuracy:
Then compute the MultinomialNB accuracy, the MultinomialNB accuracy with bigrams, and the MultinomialNB accuracy with English stop words removed.
Finally, switch the classifier to SVM and see how the accuracy changes:
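The whole Q1 comparison can be sketched as four scikit-learn pipelines that share the same fit-and-score loop. This is a minimal sketch under stated assumptions: the tiny labelled corpus is hypothetical (it stands in for the twenty_train/test split), and `SVC(kernel="linear")` is one choice of SVM class; `LinearSVC` is another common one for text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for the 20 Newsgroups train/test split.
train_docs = [
    "the team won the football match",
    "a great goal in the final game",
    "the player scored in the match",
    "new laptop with a faster processor",
    "the software update fixes the bug",
    "this phone has a better camera chip",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
test_docs = ["the team scored a goal", "the laptop software has a bug"]
test_labels = ["sport", "tech"]

# Each pipeline vectorizes the raw text and then classifies it.
models = {
    "MultinomialNB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "MultinomialNB + bigrams": make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB()),
    "MultinomialNB + stop words": make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB()),
    "SVM (linear kernel)": make_pipeline(TfidfVectorizer(), SVC(kernel="linear")),
}

accuracies = {}
for name, model in models.items():
    model.fit(train_docs, train_labels)
    predicted = model.predict(test_docs)
    accuracies[name] = accuracy_score(test_labels, predicted)
    print(f"{name}: {accuracies[name]:.3f}")
```

Scores on a toy set this small say nothing about the real comparison. To reproduce the ICP numbers, the lists would be replaced by `fetch_20newsgroups(subset="train")` and `fetch_20newsgroups(subset="test")` (their `.data` and `.target` attributes).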
Q2. Extract the text from the following web URL using BeautifulSoup and save the result in a file “input.txt”.
Import all the required libraries, then extract the web page text in a function:
Then create the file and append all the extracted text to it.
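A minimal sketch of the extract-and-save step, assuming the page is fetched with the standard library's `urllib.request` (the ICP may have used a different HTTP client) and that script and style blocks should be dropped. The function names `fetch_html`, `extract_text`, and `save_text` are hypothetical. Unlike the append described above, this writes the file fresh so that rerunning it does not duplicate text.

```python
from pathlib import Path
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one text chunk per line."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop JavaScript and CSS, which are not page text
    return soup.get_text(separator="\n", strip=True)


def save_text(text: str, path: str = "input.txt") -> None:
    """Write the extracted text to a UTF-8 file, replacing any earlier run."""
    Path(path).write_text(text, encoding="utf-8")
```

Usage: `save_text(extract_text(fetch_html(url)))`, where `url` is the page given in the assignment.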
Apply the following to the “input.txt” file:
• Tokenization
• POS tagging
• Stemming
• Lemmatization
• Trigrams
• Named Entity Recognition
Import the Natural Language Toolkit (NLTK), then read the extracted file:
Implemented word tokenization and sentence tokenization:
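The ICP most likely used `nltk.sent_tokenize` and `nltk.word_tokenize` (an assumption), which need the pretrained Punkt data from `nltk.download`. This minimal sketch shows the same two-level split with classes that need no download: an untrained `PunktSentenceTokenizer` for sentences and `TreebankWordTokenizer` for the words in each sentence. The sample string stands in for the contents of “input.txt”.

```python
from nltk.tokenize import PunktSentenceTokenizer, TreebankWordTokenizer

# Stand-in for: text = open("input.txt", encoding="utf-8").read()
text = "NLTK is a toolkit. It processes text."

# Sentence tokenization: an untrained Punkt model splits on sentence-final
# punctuation but, unlike the pretrained model, knows no abbreviations.
sentences = PunktSentenceTokenizer().tokenize(text)

# Word tokenization: split each sentence using Penn Treebank conventions.
word_tokenizer = TreebankWordTokenizer()
words = [token for sentence in sentences for token in word_tokenizer.tokenize(sentence)]

print(sentences)  # ['NLTK is a toolkit.', 'It processes text.']
print(words)      # ['NLTK', 'is', 'a', 'toolkit', '.', 'It', 'processes', 'text', '.']
```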
Implemented stemming (reduces a word to its base form):
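A minimal stemming sketch with NLTK's `PorterStemmer`, which needs no downloaded data. It assumes the Porter algorithm; NLTK also ships other stemmers such as `SnowballStemmer` and `LancasterStemmer`. Because the stemmer strips suffixes by rule, the result need not be a real word, which is the main contrast with lemmatization.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "cats", "studies"]
stems = [stemmer.stem(word) for word in words]
print(stems)  # ['run', 'cat', 'studi']
```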
Implemented POS tagging and lemmatization (reduces a word to a meaningful dictionary base form):
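POS tagging and lemmatization fit together because `WordNetLemmatizer` treats every word as a noun unless told otherwise. A minimal sketch, assuming `nltk.pos_tag` for tagging and a hypothetical helper `penn_to_wordnet` that maps its Penn Treebank tags to the WordNet POS letters the lemmatizer accepts. Calling `tag_and_lemmatize` requires the tagger and WordNet data from `nltk.download` (the exact package names differ between NLTK versions).

```python
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer


def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (as returned by pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"  # nouns and all other tags fall back to noun, the lemmatizer's default


def tag_and_lemmatize(tokens: list[str]) -> list[tuple[str, str, str]]:
    """Return (token, Penn tag, lemma) triples; needs the tagger and WordNet data."""
    lemmatizer = WordNetLemmatizer()
    return [
        (token, tag, lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag)))
        for token, tag in pos_tag(tokens)
    ]
```

Passing the POS matters: without it, "running" is looked up as a noun and comes back unchanged, while with `pos="v"` it lemmatizes to "run".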
Implemented trigrams (sequences of three consecutive words):
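A minimal trigram sketch with `nltk.util.ngrams`, which slides a window of size 3 over a token list and needs no downloaded data. The token list here is a hypothetical stand-in for the tokenized “input.txt”.

```python
from nltk.util import ngrams

tokens = ["natural", "language", "toolkit", "makes", "text", "easy"]
trigrams = list(ngrams(tokens, 3))  # n tokens yield n - 2 trigrams
print(trigrams)
# → [('natural', 'language', 'toolkit'), ('language', 'toolkit', 'makes'),
#    ('toolkit', 'makes', 'text'), ('makes', 'text', 'easy')]
```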
Implemented named entity recognition (classifies entities in the text into categories such as person or organization):
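`nltk.ne_chunk` returns a tree in which each recognized entity is a labelled subtree and every other word is a plain `(word, tag)` pair. A minimal sketch, assuming the usual `word_tokenize` → `pos_tag` → `ne_chunk` chain; the helper names are hypothetical. Running `named_entities` needs NLTK's tokenizer, tagger, and chunker data from `nltk.download`, but `entities_from_tree` works on any such tree.

```python
from nltk import Tree, ne_chunk, pos_tag, word_tokenize


def entities_from_tree(tree: Tree) -> list[tuple[str, str]]:
    """Collect (entity text, category) pairs from an ne_chunk result."""
    entities = []
    for node in tree:
        if isinstance(node, Tree):  # chunked subtrees are entities; plain tuples are not
            text = " ".join(token for token, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities


def named_entities(text: str) -> list[tuple[str, str]]:
    """Tokenize, POS-tag and chunk text; needs NLTK's tokenizer, tagger and chunker data."""
    return entities_from_tree(ne_chunk(pos_tag(word_tokenize(text))))
```

For example, a hand-built tree `Tree("S", [Tree("PERSON", [("Guido", "NNP")]), ("works", "VBZ")])` gives `[("Guido", "PERSON")]`.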
Here is the output:
Learned from this ICP: the Natural Language Toolkit (NLTK).