ICP 7 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki

Python and Deep Learning: Special Topics

Rajeshwari Sai Aishwarya Puppala

Student ID: 16298162

Class ID: 35

In class programming: 6

Objectives:

  1. Change the given code as follows:

a. Change the classifier to SVM and see how the accuracy changes

b. Change the TfidfVectorizer to use bigrams and see how the accuracy changes
TfidfVectorizer(ngram_range=(1,2))

c. Set the argument stop_words='english' and see how the accuracy changes

  2. Extract the text of the following web URL using BeautifulSoup: https://en.wikipedia.org/wiki/Google

  3. Save it in input.txt

  4. Apply the following on "input.txt" and show the output:

a. Tokenization

b. POS

c. Stemming

d. Lemmatization

e. Trigram

f. Named Entity Recognition

Scraping

Code

  • Import the required packages, BeautifulSoup and requests
  • Request the URL
  • Parse the webpage with the HTML parser
  • Extract the text from the element with the class mw-parser-output
  • Write the extracted text to input.txt
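The steps above can be sketched roughly as follows. This is a minimal illustration, not the original submission: the function names `extract_article_text` and `scrape_to_file`, the `User-Agent` string, and the timeout value are assumptions made for the example.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/Google"


def extract_article_text(html: str) -> str:
    """Return the text inside the element with class mw-parser-output."""
    # Parse the page with Python's built-in HTML parser.
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="mw-parser-output")
    if body is None:
        raise ValueError("no element with class 'mw-parser-output' found")
    # One text fragment per line, with surrounding whitespace stripped.
    return body.get_text(separator="\n", strip=True)


def scrape_to_file(url: str = URL, path: str = "input.txt") -> None:
    """Download the page and write its article text to a file."""
    # The User-Agent value is a hypothetical placeholder for this sketch.
    response = requests.get(
        url, headers={"User-Agent": "icp-scraper-example/0.1"}, timeout=30
    )
    response.raise_for_status()
    with open(path, "w", encoding="utf-8") as f:
        f.write(extract_article_text(response.text))
```

Calling `scrape_to_file()` with no arguments would fetch the Google article and save its text to input.txt. Keeping the parsing step in its own function makes it possible to check it against a small HTML string without any network access.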

Output

NLTK

Code

  • Import the packages word_tokenize, sent_tokenize, WordNetLemmatizer, PorterStemmer, ne_chunk
  • Before using them, download the necessary data packages from NLTK
  • Take the text that was extracted by the scraping code
  • Do the word tokenization and sentence tokenization
  • Give the word tokens as the input and find the trigrams by setting the n-gram size to 3
  • Give the word tokens as the input and perform lemmatization and stemming on them
  • Find the POS tags and the named entities (NER) by giving the word tokens as the input
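A possible shape for these steps is sketched below. The helper names (`download_resources`, `trigrams`, `stem_tokens`, `analyze`) are hypothetical, and the resource names passed to `nltk.download` are an assumption: they match older NLTK releases, while newer releases may expect differently named packages.

```python
import nltk
from nltk import ne_chunk, pos_tag
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.util import ngrams


def download_resources():
    """Fetch the NLTK data used below (names may differ by NLTK version)."""
    for name in ("punkt", "averaged_perceptron_tagger",
                 "maxent_ne_chunker", "words", "wordnet"):
        nltk.download(name, quiet=True)


def trigrams(tokens):
    """Return all runs of three consecutive tokens (n-gram size 3)."""
    return list(ngrams(tokens, 3))


def stem_tokens(tokens):
    """Reduce each token to its Porter stem."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


def analyze(text):
    """Run every step from the list above on one string of text."""
    tokens = word_tokenize(text)
    tagged = pos_tag(tokens)  # list of (token, POS tag) pairs
    lemmatizer = WordNetLemmatizer()
    return {
        "sentences": sent_tokenize(text),
        "tokens": tokens,
        "trigrams": trigrams(tokens),
        "stems": stem_tokens(tokens),
        "lemmas": [lemmatizer.lemmatize(token) for token in tokens],
        "pos": tagged,
        "ner": ne_chunk(tagged),  # tree with named-entity chunks
    }
```

After `download_resources()` has run once, `analyze(open("input.txt", encoding="utf-8").read())` would return all six results in one dictionary. Named entity recognition takes the POS-tagged tokens as input, which is why `pos_tag` is applied before `ne_chunk`.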

Output

Tokens and Trigram

Lemma, Stemming, POS and NER

TF-IDF

Code

  • Import the required packages
  • Fetch the train and test sets of the 20 newsgroups dataset from scikit-learn
  • Convert the collection of raw documents from the train set to a matrix of TF-IDF features with the help of TfidfVectorizer
  • Do the conversion three ways: with the default settings, with bigrams (ngram_range=(1,2)), and with stop_words='english'
  • Initialize the KNN classifier
  • After converting, fit the KNN classifier on the train data
  • Predict the labels of the test data from the 20 newsgroups dataset
  • Calculate the accuracy score from the true test labels and the predicted values
  • Repeat the process with the other two vectorizers and compare which accuracy is better
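The comparison described above might look like the sketch below. The function names (`make_vectorizers`, `score`, `compare_on_20newsgroups`) are hypothetical, and the KNN classifier is left at its scikit-learn defaults, which is an assumption since the original settings are not stated.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier


def make_vectorizers():
    """The three TF-IDF configurations compared in this exercise."""
    return {
        "default": TfidfVectorizer(),
        "bigram": TfidfVectorizer(ngram_range=(1, 2)),
        "stop_words": TfidfVectorizer(stop_words="english"),
    }


def score(vectorizer, classifier, train_texts, train_labels,
          test_texts, test_labels):
    """Fit on the train split and return accuracy on the test split."""
    # Learn the vocabulary and IDF weights from the training documents only.
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    classifier.fit(X_train, train_labels)
    return accuracy_score(test_labels, classifier.predict(X_test))


def compare_on_20newsgroups():
    """Return accuracy per vectorizer (downloads the dataset on first use)."""
    train = fetch_20newsgroups(subset="train", shuffle=True)
    test = fetch_20newsgroups(subset="test", shuffle=True)
    return {
        name: score(vectorizer, KNeighborsClassifier(),
                    train.data, train.target, test.data, test.target)
        for name, vectorizer in make_vectorizers().items()
    }
```

Because `score` takes the classifier as a parameter, passing an SVM such as `sklearn.svm.SVC(kernel="linear")` in place of `KNeighborsClassifier()` would cover the SVM part of the objectives without changing the rest of the code.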

Output