ICP 5 - smqhw/kdm1 GitHub Wiki

A. What have you learned from the ICP

For this ICP I created five text files, mounted them in Google Drive, and used them as input for all the tasks given in the ICP.

1. TF-IDF: TF-IDF is a statistical measure that evaluates how relevant a word is to a document in a collection of documents. It is computed by multiplying two metrics: how many times the word appears in the document (term frequency), and the inverse document frequency of the word across the set of documents.
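As a minimal sketch of the multiplication described above, the following computes TF-IDF by hand in plain Python. The function name `tf_idf`, the three sample sentences, whitespace tokenization, and the unsmoothed `log(N / df)` form of the inverse document frequency are all assumptions made for illustration; they are not taken from the ICP notebook, and library implementations often use a smoothed variant that gives different numbers.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (illustrative helper)."""
    # Assumption: lowercase + whitespace split is enough for this toy example.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # term frequency * inverse document frequency (unsmoothed)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical sample documents standing in for the five text files.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tf_idf(docs)
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:>4}  {score:.4f}")
```

In this sketch "the" appears in every document, so its inverse document frequency is log(1) = 0 and its score is zero, while "mat" appears in only one document and gets the highest score in the first document.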

2. Lemmatization: Lemmatization reduces each inflected form of a word to its base or dictionary form (the lemma, or headword), so that related forms such as "running" and "ran" are grouped under the single headword "run".
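To make the idea of mapping inflected forms to a headword concrete, here is a toy sketch. The `LEMMAS` table and the `lemmatize` function are hypothetical and hand-written for this example; a real lemmatizer relies on a full vocabulary and morphological analysis rather than a small lookup table, and this is not the code used in the ICP.

```python
# Hypothetical lookup table: inflected form -> lemma (headword).
LEMMAS = {
    "running": "run", "ran": "run", "runs": "run",
    "better": "good", "best": "good",
    "mice": "mouse",
    "is": "be", "was": "be", "were": "be",
}

def lemmatize(tokens):
    """Map each token to its lemma; unknown words are only lowercased."""
    return [LEMMAS.get(token.lower(), token.lower()) for token in tokens]

print(lemmatize("The mice were running".split()))
# prints ['the', 'mouse', 'be', 'run']
```

Note that "mice" and "were" cannot be handled by stripping suffixes alone, which is why lemmatization needs a vocabulary.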

3. N-gram: In computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words, or base pairs, depending on the application. N-grams are typically collected from a text or speech corpus.
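The definition above can be sketched with a sliding window over a list of items. The helper name `ngrams` and the sample inputs are assumptions for illustration only; the same function works for words or letters because it only needs a sequence.

```python
def ngrams(items, n):
    """Return all contiguous n-item windows of `items` as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # If the sequence is shorter than n, the range is empty and so is the result.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "the quick brown fox".split()
print(ngrams(words, 2))
# prints [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]

print(ngrams(list("text"), 3))
# prints [('t', 'e', 'x'), ('e', 'x', 't')]
```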

B. Main Objective

TF-IDF: The overall goal of TF-IDF is to statistically measure how important a word is to a document in a collection of documents. Think of it as a more useful version of a keyword density tool: a word scores highly when it is frequent in one document but rare in the rest of the collection.

Lemmatization: Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.

N-gram: An N-gram model predicts the occurrence of a word based on the occurrence of its N - 1 previous words. The question it answers is: how far back in the history of a sequence of words should we go to predict the next word?
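A rough sketch of such a model, assuming simple counting with no smoothing: for every context of N - 1 words, count which word follows it, then predict the most frequent follower. The names `train_ngram_model` and `predict_next` and the tiny corpus are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

def train_ngram_model(tokens, n):
    """Count, for each (n-1)-word context, the words that follow it."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        model[context][next_word] += 1
    return model

def predict_next(model, context):
    """Return the most frequent follower of `context`, or None if unseen."""
    counts = model.get(tuple(context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus, already tokenized by whitespace.
corpus = "i like tea . i like coffee . i like tea with milk .".split()

bigram = train_ngram_model(corpus, 2)    # looks back 1 word
trigram = train_ngram_model(corpus, 3)   # looks back 2 words

print(predict_next(bigram, ["like"]))          # prints tea
print(predict_next(trigram, ["tea", "with"]))  # prints milk
```

Increasing N makes the context more specific but also makes unseen contexts more likely, in which case this sketch simply returns `None`.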

C. Design / Implementation:

file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(46).png
Creating the text files and mounting them in Google Drive.

file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(47).png
TF-IDF.

file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(48).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(49).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(50).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(51).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(52).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(53).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(54).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(55).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(56).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(57).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(58).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(59).png
file:///C:/Users/sai/Pictures/Screenshots/Screenshot%20(60).png

D. Video provided in the code session