Module 1_ICP 7: Natural Language Processing in Python using NLTK - acikgozmehmet/PythonDeepLearning GitHub Wiki
# Natural Language Processing in Python using NLTK
Objectives:
The following topics are covered.
- NLP (Natural language processing)
- NLTK (Natural Language Toolkit)
Overview
NLP (Natural language processing)
- Computer-aided text analysis of human language
- The goal is to enable machines to understand human language and extract meaning from text
- The Natural Language Toolkit (NLTK) is a Python package that provides a variety of functionality for processing text
NLTK (Natural Language Toolkit)
- An open-source library that simplifies the implementation of Natural Language Processing (NLP) in Python.
- Supports text processing tasks such as unigram, bigram, and trigram extraction, tokenization, POS tagging, lemmatization, normalization, entity extraction, and language modeling.
- Learning these features helps with more meaningful projects such as document classification, spelling correction, and document summarization.
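As a small illustration of the n-gram features listed above, the following sketch builds unigrams, bigrams, and trigrams with `nltk.util.ngrams`. The sample sentence is made up for this example, and plain whitespace splitting is used as a stand-in for a real tokenizer so that no NLTK data download is needed.

```python
from nltk.util import ngrams

# Hypothetical sample sentence; whitespace splitting is a simplification
# standing in for a real tokenizer such as nltk.word_tokenize.
sentence = "natural language processing with NLTK"
tokens = sentence.split()

# ngrams() yields tuples of n consecutive tokens.
unigrams = list(ngrams(tokens, 1))
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print(bigrams)
print(trigrams)
```

A sequence of five tokens gives four bigrams and three trigrams, since each n-gram is a sliding window over the token list.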
In Class Programming
1. Modify the given code as follows:
- a. Change the classifier to SVM and see how the accuracy changes
- b. Change the TF-IDF vectorizer to use bigrams, TfidfVectorizer(ngram_range=(1,2)), and see how the accuracy changes
- c. Set the argument stop_words='english' and see how the accuracy changes
Click here to get the source code
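The class code itself is not reproduced on this page, so the following is only a minimal sketch of the three variations, assuming a scikit-learn pipeline. The toy corpus, the labels, and the choice of `LinearSVC` as the SVM are assumptions made for illustration; the actual exercise may use a different dataset and SVM variant.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for the dataset used by the class code.
train_texts = [
    "the team won the football match",
    "a great goal in the final game",
    "the player scored in the match",
    "the new phone has a fast processor",
    "software update improves the laptop battery",
    "the processor and memory of the computer",
]
train_labels = ["sports", "sports", "sports", "tech", "tech", "tech"]
test_texts = ["the team scored a goal", "a fast laptop processor"]
test_labels = ["sports", "tech"]

# One vectorizer per variation of the exercise.
configs = {
    "unigrams": TfidfVectorizer(),
    "unigrams + bigrams": TfidfVectorizer(ngram_range=(1, 2)),
    "stop words removed": TfidfVectorizer(stop_words="english"),
}

scores = {}
for name, vectorizer in configs.items():
    # LinearSVC is a linear support vector classifier (an assumed SVM choice).
    model = make_pipeline(vectorizer, LinearSVC())
    model.fit(train_texts, train_labels)
    scores[name] = model.score(test_texts, test_labels)
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

On a corpus this small the accuracies say little; the point is only where the classifier and the vectorizer arguments are swapped.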
2. Extract the text of the following web page using BeautifulSoup:
https://en.wikipedia.org/wiki/Google
3. Save it in input.txt
Click here to get the source code
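One possible way to do the extraction and saving steps is sketched below, assuming the `bs4` package is installed. The helper names `html_to_text` and `save_page_text` are hypothetical, and the `User-Agent` header is an assumption added because some sites reject the default one.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one non-empty line per text node."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements, whose contents are not page text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    lines = (line.strip() for line in soup.get_text(separator="\n").splitlines())
    return "\n".join(line for line in lines if line)


def save_page_text(url: str, path: str = "input.txt") -> str:
    """Download `url`, extract its text, and write it to `path`."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    text = html_to_text(html)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


# Offline check on an inline HTML snippet (no network access needed).
sample = "<html><body><h1>Google</h1><script>x = 1</script><p>Search engine</p></body></html>"
print(html_to_text(sample))
```

For the exercise, calling `save_page_text("https://en.wikipedia.org/wiki/Google")` would fetch the page and write its text to input.txt.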
4. Apply the following on the text and show output:
- a. Tokenization
- b. POS tagging
- c. Stemming
- d. Lemmatization
- e. Trigrams
- f. Named Entity Recognition
Click here to get the source code
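The six steps can be sketched with NLTK roughly as follows. This is an illustrative outline, not the linked source code: the helper names are hypothetical, and the list of data packages is an assumption, since the package names differ between NLTK releases.

```python
import nltk
from nltk import ne_chunk, pos_tag, word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.util import ngrams

# Assumed data package names covering older and newer NLTK releases.
# nltk.download is expected to report names missing from its index
# and return False rather than raise.
NLTK_RESOURCES = [
    "punkt", "punkt_tab",
    "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng",
    "wordnet", "omw-1.4",
    "maxent_ne_chunker", "maxent_ne_chunker_tab", "words",
]


def download_resources():
    """Fetch the tokenizer, tagger, WordNet, and chunker data (needs network)."""
    for name in NLTK_RESOURCES:
        nltk.download(name, quiet=True)


def stem_tokens(tokens):
    """c. Stemming with the Porter algorithm (no data download required)."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


def trigrams_of(tokens):
    """e. Trigrams as tuples of three consecutive tokens."""
    return list(ngrams(tokens, 3))


def analyze(text):
    """Run all six steps on `text`; requires download_resources() beforehand."""
    tokens = word_tokenize(text)              # a. Tokenization
    tagged = pos_tag(tokens)                  # b. POS tagging
    lemmatizer = WordNetLemmatizer()
    return {
        "tokens": tokens,
        "pos": tagged,
        "stems": stem_tokens(tokens),
        "lemmas": [lemmatizer.lemmatize(token) for token in tokens],  # d. Lemmatization
        "trigrams": trigrams_of(tokens),
        "entities": ne_chunk(tagged),         # f. Named entity recognition (a Tree)
    }
```

A typical run would call `download_resources()` once, read input.txt with `open("input.txt", encoding="utf-8").read()`, pass the text to `analyze`, and print each entry of the returned dictionary.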
References
https://github.com/wade12/WikiScraper/blob/master/