ICP 7 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki
Python and Deep Learning: Special Topics
Rajeshwari Sai Aishwarya Puppala
Student ID: 16298162
Class ID: 35
In class programming: 6
Objectives:
- Change the classifier in the given code:
a. Use SVM and see how the accuracy changes
b. Change the TfidfVectorizer to use bigrams and see how the accuracy changes:
TfidfVectorizer(ngram_range=(1,2))
c. Set the argument stop_words='english' and see how the accuracy changes
- Extract the text of the following web URL using BeautifulSoup: https://en.wikipedia.org/wiki/Google
- Save it in input.txt
- Apply the following on input.txt and show the output:
a. Tokenization
b. POS
c. Stemming
d. Lemmatization
e. Trigram
f. Named Entity Recognition
Scraping
Code
- Import the required packages, BeautifulSoup and requests
- Request the URL
- Parse the webpage with the HTML parser
- Extract the text from the element with the class mw-parser-output
- Write the extracted text to input.txt
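The steps above can be sketched roughly as follows. This is a minimal illustration, not the original submission: the function names `extract_text` and `save_page_text`, the `User-Agent` string, and the timeout value are assumptions introduced here.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/Google"


def extract_text(html):
    """Return the text of the first div with the class mw-parser-output."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="mw-parser-output")
    if body is None:
        raise ValueError("no div with class 'mw-parser-output' found")
    return body.get_text()


def save_page_text(url=URL, path="input.txt"):
    """Download the page, extract the article text, and write it to path."""
    # Assumption: a descriptive User-Agent; some sites reject the default one.
    headers = {"User-Agent": "icp-scraper-example/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # stop early on HTTP errors
    text = extract_text(response.text)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

Calling `save_page_text()` fetches the live page and writes input.txt. `extract_text` is kept separate so the parsing step can be tried on any HTML string without network access.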
Output
NLTK
Code
- Import word_tokenize, sent_tokenize, WordNetLemmatizer, PorterStemmer and ne_chunk
- Before using them, download the necessary data packages from nltk
- Read the text that was extracted by the scraping code
- Perform word tokenization and sentence tokenization
- Pass the word tokens as input and find the trigrams by setting the number of grams to 3
- Pass the word tokens as input and perform lemmatization and stemming on them
- Find the POS tags and named entities (NER) by passing the word tokens as input
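A minimal sketch of these steps is shown below, under some assumptions: the helper names (`download_resources`, `trigrams_of`, `stem_all`, `analyze`) are made up for this example, and the list of NLTK data packages covers both older and newer NLTK releases, so some names may not exist in a given version.

```python
import nltk
from nltk import ne_chunk, pos_tag, sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.util import ngrams


def download_resources():
    """Fetch the NLTK data used by analyze(); needs network access."""
    # Assumption: resource names differ between NLTK versions, so both
    # the older and the newer names are requested.
    for name in ("punkt", "punkt_tab", "wordnet", "words",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng",
                 "maxent_ne_chunker", "maxent_ne_chunker_tab"):
        nltk.download(name, quiet=True)


def trigrams_of(tokens):
    """Return every run of three consecutive tokens (n = 3)."""
    return list(ngrams(tokens, 3))


def stem_all(tokens):
    """Reduce each token to its Porter stem."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


def analyze(text):
    """Run all the steps on text; requires the downloaded NLTK data."""
    tokens = word_tokenize(text)
    tagged = pos_tag(tokens)  # list of (token, POS tag) pairs
    lemmatizer = WordNetLemmatizer()
    return {
        "sentences": sent_tokenize(text),
        "tokens": tokens,
        "trigrams": trigrams_of(tokens),
        "stems": stem_all(tokens),
        "lemmas": [lemmatizer.lemmatize(token) for token in tokens],
        "pos": tagged,
        "ner": ne_chunk(tagged),  # tree with named-entity chunks
    }
```

To use it, call `download_resources()` once, then pass the contents of input.txt to `analyze`, for example `analyze(open("input.txt", encoding="utf-8").read())`.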
Output
Tokens and Trigrams
Lemma, Stemming, POS and NER
TF-IDF
Code
- Import the necessary packages
- Fetch the train and test sets of the 20 newsgroups dataset from scikit-learn
- Convert the collection of raw documents in the train set to a matrix of TF-IDF features with TfidfVectorizer
- Build three versions: the default vectorizer, one with bigrams (ngram_range=(1,2)), and one with stop_words='english'
- Initialize the KNN classifier
- Fit the KNN classifier on the vectorized train data
- Predict the labels of the 20 newsgroups test data
- Calculate the accuracy score from the true test labels and the predicted values
- Repeat the process with the other two vectorizers and compare the accuracies to see which is better
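The comparison described above could be sketched as follows. This is an illustration under assumptions rather than the original code: the function names are invented here, `LinearSVC` stands in for the SVM from the objectives (the original may have used a different SVM class), and the classifiers use scikit-learn defaults.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC


def make_vectorizers():
    """Return fresh copies of the three TF-IDF configurations to compare."""
    return {
        "default": TfidfVectorizer(),
        "bigram": TfidfVectorizer(ngram_range=(1, 2)),
        "stop_words": TfidfVectorizer(stop_words="english"),
    }


def evaluate(vectorizer, classifier, train_docs, train_labels,
             test_docs, test_labels):
    """Fit on the train split and return the accuracy on the test split."""
    x_train = vectorizer.fit_transform(train_docs)  # learn vocabulary and IDF
    x_test = vectorizer.transform(test_docs)        # reuse the train vocabulary
    classifier.fit(x_train, train_labels)
    predicted = classifier.predict(x_test)
    return accuracy_score(test_labels, predicted)


def compare_on_20newsgroups():
    """Score every classifier/vectorizer pair; downloads data on first use."""
    train = fetch_20newsgroups(subset="train", shuffle=True)
    test = fetch_20newsgroups(subset="test", shuffle=True)
    # Assumption: LinearSVC represents "SVM"; both use default settings.
    classifiers = {"KNN": KNeighborsClassifier, "SVM": LinearSVC}
    scores = {}
    for clf_name, make_classifier in classifiers.items():
        for vec_name, vectorizer in make_vectorizers().items():
            scores[(clf_name, vec_name)] = evaluate(
                vectorizer, make_classifier(),
                train.data, train.target, test.data, test.target)
    return scores
```

`compare_on_20newsgroups()` returns a dictionary keyed by (classifier, vectorizer) pairs, which makes it easy to print the accuracies side by side. Note that the vectorizer is fitted on the train documents only and then reused for the test documents, so both matrices share one vocabulary.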