Lab Assignment #1 Report
Zakari, Abdulmuhaymin (29)
This is a lab assignment report on applying NLP and SIFT-style image-feature techniques to a dataset.
• NLP: I used NLTK, an open-source Python NLP library with great potential. I also tried spaCy, since it is advertised as a much faster library, but I faced several issues; the following code contains comments on the workflow.
```python
# This code was inspired directly by the ICPs given in class.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# I tried to use spaCy as well, but I faced some issues, so NLTK is used throughout.
# import spacy

# The tokenizer and lemmatizer need these NLTK resources (cached after the first download).
nltk.download("punkt")
nltk.download("wordnet")

# Count how many words and lines are in the file; this helped during development.
Words = 0
Lines = 0

# Link the dataset from the project folder and read the whole file.
with open("SBU_captioned_photo_dataset_captions.txt", "r") as f1:
    data = f1.read()

# Simple iteration to count the content.
for line in data.splitlines():
    Lines += 1
    for word in line.split():
        Words += 1
print(Words)
print(Lines)

# Create a new file with the required output.
TokenF = open("Tokenized.txt", "w")
# Tokenize the dataset using word_tokenize(str), a predefined NLTK method.
TokenF.write(str(word_tokenize(data)))
TokenF.close()

# Create a new output file for the lemmatization operation.
LemmaF = open("Lemmatized.txt", "w")
lemma = WordNetLemmatizer()
# For the sake of demonstration, only words that change after lemmatization are written.
with open("SBU_captioned_photo_dataset_captions.txt", "r") as f2:
    for line in f2:
        for word in line.split():
            if word != lemma.lemmatize(word):
                LemmaF.write(word + " : " + lemma.lemmatize(word) + "\n")
LemmaF.close()
print("done")
```
As shown above, the first part of this program produces a text file (Tokenized.txt) containing the dataset segmented, i.e. tokenized.
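To illustrate what the tokenizer does, here is word_tokenize applied to a single made-up caption (not taken from the dataset); Tokenized.txt simply contains this kind of list for the whole file.

```python
from nltk.tokenize import word_tokenize

caption = "A plate of fresh pasta with basil on the table."  # made-up caption
print(word_tokenize(caption))
# ['A', 'plate', 'of', 'fresh', 'pasta', 'with', 'basil', 'on', 'the', 'table', '.']
```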
The second part produces a file (Lemmatized.txt) with the lemmatized version of the dataset.
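For example, the WordNetLemmatizer maps plural nouns back to their base form, so lines in Lemmatized.txt look like the word : lemma pairs below; the words here are illustrative, not taken from the dataset.

```python
from nltk.stem import WordNetLemmatizer

lemma = WordNetLemmatizer()
# Illustrative words, not taken from the dataset.
for word in ["photos", "tomatoes", "dishes"]:
    print(word + " : " + lemma.lemmatize(word))
# photos : photo
# tomatoes : tomato
# dishes : dish
```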
• SIFT:
The other part of the assignment is to sample some images and process them with SIFT-style feature detection. I used two images that reflect my project theme (food) and benefit from this image-processing approach. Note that the code below uses OpenCV's ORB detector and descriptor, which plays the same keypoint-detection role here. The code is as follows:
```python
# This code was inspired directly by the ICPs given in class.
# Import cv2 and matplotlib.
import cv2
import matplotlib.pyplot as MyPyplt

# Load the photos (source: https://sakatavegetables.com/specialty-tomatoes/)
# and convert them from OpenCV's default BGR channel order to RGB for display.
MyImage1 = cv2.imread('1.jpg')
MyImage1 = cv2.cvtColor(MyImage1, cv2.COLOR_BGR2RGB)
MyImage2 = cv2.imread('2.jpg')
MyImage2 = cv2.cvtColor(MyImage2, cv2.COLOR_BGR2RGB)

# As described in class, ORB_create() is the constructor required in OpenCV version 3.
orb = cv2.ORB_create()

# Detect and compute keypoints and descriptors on both images; des1 and des2 are used later for matching.
ImgKP, des1 = orb.detectAndCompute(MyImage1, None)
ImgKP2, des2 = orb.detectAndCompute(MyImage2, None)

# Two new images with the keypoints drawn as circles.
Img1AfrPrs = cv2.drawKeypoints(MyImage1, ImgKP, None)
Img2AfrPrs = cv2.drawKeypoints(MyImage2, ImgKP2, None)
MyPyplt.imshow(Img1AfrPrs)
MyPyplt.show()
MyPyplt.imshow(Img2AfrPrs)
MyPyplt.show()

# Brute-force matcher with Hamming distance, which suits ORB's binary descriptors.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

# Create a new image with the 50 best matches drawn on both images and connected using drawMatches().
img_matches = cv2.drawMatches(MyImage1, ImgKP, MyImage2, ImgKP2, matches[:50], None, flags=2)
MyPyplt.imshow(img_matches)
MyPyplt.show()
```
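One possible refinement of the matching step, not part of the code above: instead of crossCheck, the matcher could return the two nearest neighbours per descriptor and keep only matches that pass Lowe's ratio test. A minimal sketch, reusing des1, des2, ImgKP, and ImgKP2 from the code above:

```python
# Alternative matching step (sketch): knnMatch + Lowe's ratio test instead of crossCheck.
bf2 = cv2.BFMatcher(cv2.NORM_HAMMING)        # no crossCheck, so knnMatch can return 2 neighbours
knn_matches = bf2.knnMatch(des1, des2, k=2)  # two best candidates per descriptor of image 1

good = []
for pair in knn_matches:
    # Keep a match only if it is clearly better than the second-best candidate.
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
good = sorted(good, key=lambda x: x.distance)

img_ratio = cv2.drawMatches(MyImage1, ImgKP, MyImage2, ImgKP2, good[:50], None, flags=2)
MyPyplt.imshow(img_ratio)
MyPyplt.show()
```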
Output:
Thank you.