LabAssignment1Report - aaz000966/CS5542 GitHub Wiki

Lab Assignment #1 Report - Zakari, Abdulmuhaymin (29)

This is a lab assignment report on applying NLP and SIFT techniques to a dataset.

• NLP: I used NLTK, an open-source Python NLP library with great potential. I tried to use spaCy, since it is reputed to be a much faster library, but I faced several issues; the following code contains comments on the workflow.

```python
# This code was inspired directly by the ICPs given in class
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# I tried to use spaCy, but I faced some issues
# import spacy

# To count how many words and lines are in the file; helped during development
Words = 0
Lines = 0

# Link the dataset from the project folder and read the file
file = open("SBU_captioned_photo_dataset_captions.txt", "r")
data = file.read()
file.seek(0)  # rewind; read() left the pointer at the end of the file

# Simple iteration to count the content
with file as f1:
    for line in f1:
        Lines += 1
        for word in line.split():
            Words += 1
print(Words)
print(Lines)

# Create a new file with the required output
TokenF = open("Tokenized.txt", "w")
# Tokenize the dataset using word_tokenize(str), a predefined NLTK method
TokenF.write(str(word_tokenize(data)))
TokenF.close()

# Create a new output file for the lemmatization step
LemmaF = open("Lemmatized.txt", "w")
lemma = WordNetLemmatizer()
file = open("SBU_captioned_photo_dataset_captions.txt", "r")
# For the sake of demonstration, I'm only writing the words whose lemma differs
with file as f2:
    for line in f2:
        for word in line.split():
            if word != lemma.lemmatize(word):
                LemmaF.write(word + " : " + lemma.lemmatize(word) + "\n")
LemmaF.close()
print("done")
```
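One subtlety in the counting step above: after `file.read()` the file pointer sits at the end of the file, so iterating the same handle again would yield nothing unless it is rewound with `seek(0)`. A minimal sketch of this, using an in-memory `io.StringIO` as a stand-in for the caption file:

```python
import io

# io.StringIO stands in for the dataset file: two lines, five words total
f = io.StringIO("a small dog\non grass\n")
data = f.read()  # consumes the whole stream
f.seek(0)        # rewind; without this, the loop below would see nothing

words = 0
lines = 0
for line in f:
    lines += 1
    words += len(line.split())

print(lines, words)  # 2 5
```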

As shown, the first part of this program produces a text file with the dataset segmented, i.e. tokenized:
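Unlike a plain `split()`, NLTK's `word_tokenize` separates punctuation into its own tokens. A crude regex stand-in (not NLTK's actual algorithm) illustrates the idea:

```python
import re

def rough_tokenize(text):
    # Match runs of word characters, or single punctuation marks;
    # a rough stand-in for nltk.word_tokenize, not its real algorithm
    return re.findall(r"\w+|[^\w\s]", text)

print(rough_tokenize("A dog sits on the grass, watching."))
# ['A', 'dog', 'sits', 'on', 'the', 'grass', ',', 'watching', '.']
```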

The second part produces a file of the lemmatized version of the dataset:
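Each line in Lemmatized.txt has the form `word : lemma`, and a line is written only when the lemma differs from the surface form. The filtering logic can be sketched with a toy lookup table (the entries are made up for illustration; WordNetLemmatizer itself consults WordNet, not a table like this):

```python
# Hypothetical lookup table standing in for WordNetLemmatizer
LEMMAS = {"dogs": "dog", "grasses": "grass", "sitting": "sit"}

def lemmatize(word):
    return LEMMAS.get(word, word)

out_lines = []
for word in "two dogs sitting on grasses".split():
    if word != lemmatize(word):  # only record words that actually changed
        out_lines.append(word + " : " + lemmatize(word))

print("\n".join(out_lines))
# dogs : dog
# sitting : sit
# grasses : grass
```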

• SIFT:

The other part of the assignment is to sample some images and process them with SIFT. I used two images that reflect my project theme (food) and benefit from this keypoint-based image-processing approach. Note that the code below actually uses ORB (via `cv2.ORB_create()`), the free alternative shipped with OpenCV 3, since SIFT was patented at the time. The code is as follows:

```python
# This code was inspired directly by the ICPs given in class
# Importing cv2 and matplotlib
import cv2
import matplotlib.pyplot as MyPyplt

# Load the photos; source: https://sakatavegetables.com/specialty-tomatoes/
# and convert them from BGR (OpenCV's default) to RGB
MyImage1 = cv2.imread('1.jpg')
MyImage1 = cv2.cvtColor(MyImage1, cv2.COLOR_BGR2RGB)
MyImage2 = cv2.imread('2.jpg')
MyImage2 = cv2.cvtColor(MyImage2, cv2.COLOR_BGR2RGB)

# As described, this line is required for OpenCV version 3
orb = cv2.ORB_create()

# Detect and compute keypoints on both images; des1 and des2 are used later on
ImgKP, des1 = orb.detectAndCompute(MyImage1, None)
ImgKP2, des2 = orb.detectAndCompute(MyImage2, None)

# Two new images with the keypoints displayed
Img1AfrPrs = cv2.drawKeypoints(MyImage1, ImgKP, None)   # draw circles
Img2AfrPrs = cv2.drawKeypoints(MyImage2, ImgKP2, None)  # draw circles
MyPyplt.imshow(Img1AfrPrs)
MyPyplt.show()
MyPyplt.imshow(Img2AfrPrs)
MyPyplt.show()

# Brute-force matching on the binary ORB descriptors
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

# A new image with the 50 best matches drawn on both images and connected,
# using drawMatches()
img_matches = cv2.drawMatches(MyImage1, ImgKP, MyImage2, ImgKP2,
                              matches[:50], None, flags=2)
MyPyplt.imshow(img_matches)
MyPyplt.show()
```
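`BFMatcher` with `cv2.NORM_HAMMING` compares ORB's binary descriptors by counting differing bits, and `crossCheck=True` keeps a pair only when each descriptor is the other's nearest neighbour. A small sketch of that matching logic (not OpenCV's implementation) on toy one-byte descriptors:

```python
def hamming(a, b):
    # Number of differing bits between two equal-length byte strings
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def cross_check_match(des1, des2):
    # For each descriptor in des1, find its nearest neighbour in des2,
    # then keep the pair only if the choice is mutual (crossCheck=True)
    matches = []
    for i, d1 in enumerate(des1):
        j = min(range(len(des2)), key=lambda j2: hamming(d1, des2[j2]))
        i_back = min(range(len(des1)), key=lambda i2: hamming(des1[i2], des2[j]))
        if i_back == i:
            matches.append((i, j, hamming(d1, des2[j])))
    # Sort by distance, as done with the real matches in the report
    return sorted(matches, key=lambda m: m[2])

des1 = [b"\x0f", b"\xf0"]
des2 = [b"\xf1", b"\x1f"]
print(cross_check_match(des1, des2))
# [(0, 1, 1), (1, 0, 1)]
```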

Output:

Thank you.