ICP 7 - ntihindukkipati/CS5590_Python_DL GitHub Wiki

ICP 7

  1. Extract the text of the following web page using BeautifulSoup:
    https://en.wikipedia.org/wiki/Google

  2. Save it in input.txt, then apply the following to the text and show the output:
    a. Tokenization
    b. POS
    c. Stemming
    d. Lemmatization
    e. Trigram
    f. Named Entity Recognition
    [Screenshots 474 to 478]
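
Extracting the page text and saving it to input.txt could be sketched roughly as below. This is a minimal illustration, not the code from the screenshots: the helper names `html_to_text` and `save_page_text`, the use of `urllib.request`, and the `User-Agent` value are all assumptions made for the example.

```python
import urllib.request

from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/Google"


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one non-empty line per row."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove script and style elements so their contents do not end up in the text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    lines = (line.strip() for line in soup.get_text(separator="\n").splitlines())
    return "\n".join(line for line in lines if line)


def save_page_text(url: str, path: str = "input.txt") -> None:
    """Download a page and write its extracted text to `path` (hypothetical helper)."""
    # Some servers reject requests without a User-Agent header; this value is arbitrary.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))


# Uncomment to fetch the page (needs network access):
# save_page_text(URL)
```

Keeping the parsing in `html_to_text` separate from the download makes it possible to try the extraction on a small HTML string before touching the network.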

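The six text-processing steps (a to f) could be sketched with NLTK as below. The page does not name the library used, so NLTK is an assumption here, as are the helper names `download_resources`, `stem_tokens`, `make_trigrams`, and `analyze`. The data package names differ between NLTK versions, so the download list is deliberately broad.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.util import ngrams


def download_resources():
    """Try to fetch the NLTK data used below; names vary by NLTK version."""
    for name in ["punkt", "punkt_tab", "averaged_perceptron_tagger",
                 "averaged_perceptron_tagger_eng", "wordnet",
                 "maxent_ne_chunker", "maxent_ne_chunker_tab", "words"]:
        nltk.download(name, quiet=True)


def stem_tokens(tokens):
    """c. Stemming with the Porter stemmer."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


def make_trigrams(tokens):
    """e. Trigrams: every run of three consecutive tokens."""
    return list(ngrams(tokens, 3))


def analyze(text):
    """Run steps a to f on `text` and return the results in a dict."""
    tokens = nltk.word_tokenize(text)           # a. Tokenization
    tagged = nltk.pos_tag(tokens)               # b. POS tagging
    lemmatizer = WordNetLemmatizer()
    return {
        "tokens": tokens,
        "pos": tagged,
        "stems": stem_tokens(tokens),           # c. Stemming
        "lemmas": [lemmatizer.lemmatize(t) for t in tokens],  # d. Lemmatization
        "trigrams": make_trigrams(tokens),      # e. Trigram
        "entities": nltk.ne_chunk(tagged),      # f. Named Entity Recognition
    }


# Example usage (needs the NLTK data and the input.txt file from the earlier step):
# download_resources()
# with open("input.txt", encoding="utf-8") as f:
#     results = analyze(f.read())
# print(results["tokens"][:20])
```

Stemming and trigram generation need no downloaded data, so they can be checked on their own; tokenization, tagging, lemmatization, and entity chunking depend on the downloaded resources.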
  3. Change the classifier in the given code to:
    a. KNeighborsClassifier and see how accuracy changes
    b. Change the TF-IDF vectorizer to use bigrams, TfidfVectorizer(ngram_range=(1,2)), and see how the accuracy changes
    c. Add the argument stop_words='english' and see how the accuracy changes
    [Screenshots 472 and 473]
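
The three variations above could be compared along the following lines. The "given code" and its dataset are not reproduced on this page, so this sketch substitutes a tiny made-up corpus; the texts, labels, `n_neighbors=3`, and the `evaluate` helper are assumptions for illustration only, and the accuracies it prints say nothing about the results in the screenshots.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Made-up toy corpus standing in for the dataset used by the given code.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "fans cheered in the stadium",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
    "developers released a software update",
    "the search engine indexes web pages",
]
train_labels = ["sport"] * 4 + ["tech"] * 4
test_texts = ["the players scored in the match", "the update makes the software faster"]
test_labels = ["sport", "tech"]


def evaluate(vectorizer):
    """Fit TF-IDF + KNeighborsClassifier and return accuracy on the toy test set."""
    model = make_pipeline(vectorizer, KNeighborsClassifier(n_neighbors=3))
    model.fit(train_texts, train_labels)
    return model.score(test_texts, test_labels)


results = {
    "a. KNN, default TF-IDF": evaluate(TfidfVectorizer()),
    "b. KNN, unigrams + bigrams": evaluate(TfidfVectorizer(ngram_range=(1, 2))),
    "c. KNN, stop_words='english'": evaluate(TfidfVectorizer(stop_words="english")),
}
for name, accuracy in results.items():
    print(f"{name}: accuracy = {accuracy:.2f}")
```

Each variant is evaluated separately here; whether the original exercise applied stop_words='english' alone or together with the bigram setting is not stated on the page.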


BY
DUKKIPATI SRI SAI NITHIN CHOWDARY
CLASS ID: 4