Lab Assignment #1 Report
Zakari, Abdulmuhaymin (29)
This is a lab assignment report on applying NLP and SIFT-style image-feature techniques to a dataset.
• NLP: I used NLTK, an open-source Python NLP library with great potential. I also tried spaCy, since it is advertised as a much faster library, but I faced several issues; the following code contains comments on the workflow.
```python
# This code was inspired directly by the ICPs given in class.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# I tried to use spaCy as well, but I faced some issues, so NLTK is used throughout.
# import spacy

# The tokenizer and lemmatizer need these NLTK resources (cached after the first download).
nltk.download("punkt")
nltk.download("wordnet")

# Count how many words and lines are in the file; this helped during development.
Words = 0
Lines = 0

# Link the dataset from the project folder and read the whole file.
with open("SBU_captioned_photo_dataset_captions.txt", "r") as f1:
    data = f1.read()

# Simple iteration to count the content.
for line in data.splitlines():
    Lines += 1
    for word in line.split():
        Words += 1
print(Words)
print(Lines)

# Create a new file with the required output.
TokenF = open("Tokenized.txt", "w")
# Tokenize the dataset using word_tokenize(str), a predefined NLTK method.
TokenF.write(str(word_tokenize(data)))
TokenF.close()

# Create a new output file for the lemmatization operation.
LemmaF = open("Lemmatized.txt", "w")
lemma = WordNetLemmatizer()
# For the sake of demonstration, only words that change after lemmatization are written.
with open("SBU_captioned_photo_dataset_captions.txt", "r") as f2:
    for line in f2:
        for word in line.split():
            if word != lemma.lemmatize(word):
                LemmaF.write(word + " : " + lemma.lemmatize(word) + "\n")
LemmaF.close()
print("done")
```
As shown above, the first part of this program produces a text file (Tokenized.txt) containing the dataset segmented, i.e. tokenized.
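To illustrate what the tokenizer does, here is word_tokenize applied to a single made-up caption (not taken from the dataset); Tokenized.txt simply contains this kind of list for the whole file.

```python
from nltk.tokenize import word_tokenize

caption = "A plate of fresh pasta with basil on the table."  # made-up caption
print(word_tokenize(caption))
# ['A', 'plate', 'of', 'fresh', 'pasta', 'with', 'basil', 'on', 'the', 'table', '.']
```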
The second part produces a file (Lemmatized.txt) with the lemmatized version of the dataset.
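For example, the WordNetLemmatizer maps plural nouns back to their base form, so lines in Lemmatized.txt look like the word : lemma pairs below; the words here are illustrative, not taken from the dataset.

```python
from nltk.stem import WordNetLemmatizer

lemma = WordNetLemmatizer()
# Illustrative words, not taken from the dataset.
for word in ["photos", "tomatoes", "dishes"]:
    print(word + " : " + lemma.lemmatize(word))
# photos : photo
# tomatoes : tomato
# dishes : dish
```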
• SIFT:
The other part of the assignment is to sample some images and process them with SIFT-style feature detection. I used two images that reflect my project theme (food) and benefit from this image-processing approach. Note that the code below uses OpenCV's ORB detector and descriptor, which plays the same keypoint-detection role here. The code is as follows:
```python
# This code was inspired directly by the ICPs given in class.
# Import cv2 and matplotlib.
import cv2
import matplotlib.pyplot as MyPyplt

# Load the photos (source: https://sakatavegetables.com/specialty-tomatoes/)
# and convert them from OpenCV's default BGR channel order to RGB for display.
MyImage1 = cv2.imread('1.jpg')
MyImage1 = cv2.cvtColor(MyImage1, cv2.COLOR_BGR2RGB)
MyImage2 = cv2.imread('2.jpg')
MyImage2 = cv2.cvtColor(MyImage2, cv2.COLOR_BGR2RGB)

# As described in class, ORB_create() is the constructor required in OpenCV version 3.
orb = cv2.ORB_create()

# Detect and compute keypoints and descriptors on both images; des1 and des2 are used later for matching.
ImgKP, des1 = orb.detectAndCompute(MyImage1, None)
ImgKP2, des2 = orb.detectAndCompute(MyImage2, None)

# Two new images with the keypoints drawn as circles.
Img1AfrPrs = cv2.drawKeypoints(MyImage1, ImgKP, None)
Img2AfrPrs = cv2.drawKeypoints(MyImage2, ImgKP2, None)
MyPyplt.imshow(Img1AfrPrs)
MyPyplt.show()
MyPyplt.imshow(Img2AfrPrs)
MyPyplt.show()

# Brute-force matcher with Hamming distance, which suits ORB's binary descriptors.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

# Create a new image with the 50 best matches drawn on both images and connected using drawMatches().
img_matches = cv2.drawMatches(MyImage1, ImgKP, MyImage2, ImgKP2, matches[:50], None, flags=2)
MyPyplt.imshow(img_matches)
MyPyplt.show()
```
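One possible refinement of the matching step, not part of the code above: instead of crossCheck, the matcher could return the two nearest neighbours per descriptor and keep only matches that pass Lowe's ratio test. A minimal sketch, reusing des1, des2, ImgKP, and ImgKP2 from the code above:

```python
# Alternative matching step (sketch): knnMatch + Lowe's ratio test instead of crossCheck.
bf2 = cv2.BFMatcher(cv2.NORM_HAMMING)        # no crossCheck, so knnMatch can return 2 neighbours
knn_matches = bf2.knnMatch(des1, des2, k=2)  # two best candidates per descriptor of image 1

good = []
for pair in knn_matches:
    # Keep a match only if it is clearly better than the second-best candidate.
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
good = sorted(good, key=lambda x: x.distance)

img_ratio = cv2.drawMatches(MyImage1, ImgKP, MyImage2, ImgKP2, good[:50], None, flags=2)
MyPyplt.imshow(img_ratio)
MyPyplt.show()
```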
Output:
Thank you.