LAB1 - Hiresh12/UMKC GitHub Wiki

BIG DATA ANALYTICS AND APPLICATIONS

Team 3: SINEW

Team Members:

Hiresh Jakkala Bhaskar - 8

Anvesh Mandadi - 17

Sai Sampath Kumar - 22

Veeresh Thotigar - 26

Objective:

Collect a dataset related to the project theme, extract the caption for each image into a file, apply NLP (tokenization and lemmatization) to the captions and store the result in a file, then filter the images by project theme and store them in a separate file. In addition, perform SIFT feature extraction on the images.

Project Theme: Indoor Activities

Objects to be detected from images: Kitchen, Door, Television.

Code Implementation:

The code below reads captions from a file and performs tokenization and lemmatization.
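As a rough illustration of the tokenization and lemmatization step, here is a minimal Python sketch. It is not the team's original code. The tab-separated `image_id<TAB>caption` line format, the function names, and the small `LEMMAS` lookup table are all assumptions made for the example; a real pipeline would use a full lemmatizer such as NLTK's `WordNetLemmatizer` or Stanford CoreNLP in place of the toy table.

```python
import re

# Toy lemma dictionary (hypothetical): stands in for a full lexicon-backed
# lemmatizer, which a real pipeline would use instead.
LEMMAS = {
    "is": "be",
    "are": "be",
    "was": "be",
    "standing": "stand",
    "watching": "watch",
    "doors": "door",
    "men": "man",
    "children": "child",
}


def tokenize(caption):
    """Lowercase the caption and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", caption.lower())


def lemmatize(tokens):
    """Map each token to its dictionary form; unknown tokens pass through."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def process_captions(lines):
    """Return (image_id, lemmas) pairs from 'image_id<TAB>caption' lines."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, _, caption = line.partition("\t")
        results.append((image_id, lemmatize(tokenize(caption))))
    return results


# Made-up sample line in the assumed format.
sample = ["img1.jpg\tTwo men are standing near the doors."]
processed = process_captions(sample)
```

Because `process_captions` accepts any iterable of lines, an open file object can be passed to it directly, and the resulting pairs can then be written to an output file.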

The image below shows sample output of the captions after tokenization and lemmatization.

Below are the images filtered from the dataset that relate to the project theme:

Doors:

Kitchen:

Television:
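The filtering step that produces the three groups above might look something like the following sketch. It assumes captions have already been tokenized and lemmatized, and that an image belongs to a group when its caption contains the corresponding keyword. The function name, the dictionary layout, and the sample captions are hypothetical, not taken from the project.

```python
# Theme keywords taken from the objects listed for the project.
THEME_KEYWORDS = {"kitchen", "door", "television"}


def filter_by_theme(captions, keywords=THEME_KEYWORDS):
    """Group image IDs under each theme keyword found in their caption.

    `captions` maps an image ID to its list of lemmatized tokens.
    """
    matches = {kw: [] for kw in sorted(keywords)}
    for image_id, tokens in captions.items():
        # An image can fall under more than one keyword.
        for kw in set(tokens) & set(keywords):
            matches[kw].append(image_id)
    return matches


# Made-up captions for illustration only.
captions = {
    "img1.jpg": ["a", "man", "open", "the", "door"],
    "img2.jpg": ["a", "dog", "run", "on", "the", "beach"],
    "img3.jpg": ["a", "television", "in", "the", "kitchen"],
}
selected = filter_by_theme(captions)
```

Matching on lemmas rather than raw tokens means that a caption mentioning "doors" is still picked up by the keyword "door".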

Image Statistics:

Scale-Invariant Feature Transform (SIFT):

SIFT is applied to two images of different scale; the features common to both are identified and connected by green lines.

Image after SIFT extraction: each circle denotes a feature, and the size of the circle denotes its scale (sigma value).
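The page does not say how the common features were matched. One commonly used approach is nearest-neighbour matching with Lowe's ratio test, sketched below in plain Python. The short 3-dimensional descriptors are made-up stand-ins (real SIFT descriptors have 128 dimensions), and the `ratio=0.75` threshold and function names are assumptions for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Return (i, j) pairs where desc_a[i] is matched to desc_b[j].

    A match is kept only when the nearest neighbour is clearly closer
    than the second nearest, which discards ambiguous features.
    """
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Toy descriptors standing in for the features of the two images.
desc_small = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
desc_large = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = ratio_test_matches(desc_small, desc_large)
```

In this toy data the third descriptor of `desc_small` is about equally far from two candidates, so it is rejected as ambiguous. Each surviving pair corresponds to one line drawn between the two images.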

YouTube Video:

https://www.youtube.com/watch?v=KxiIYH9fjQU&feature=youtu.be