
# BIG DATA ANALYTICS AND APPLICATIONS – LAB 3

Hiresh Jakkala Bhaskar – 8

Anvesh Mandadi – 17

AIM:

Part 1:

• To develop a Show and Tell model using the dataset

  o The dataset consists of images containing objects such as a table, door, kitchen, or jacket

  o The dataset is used to train the model so that it can identify the indoor activities shown in an image

  o This helps us track the activities of children and pets at home

• To make the model generate a caption for each given test image

• To compute the performance metrics of the model using BLEU

Part 2:

• Analyze the dataset using unsupervised machine learning

Steps Followed:

Part 1:

• Pre-process the dataset: generate lemmas for the captions and split the dataset into train and test data

• Extract the features from the train images and captions

• Train the model using a CNN and an LSTM (features -> ReLU -> image embedding -> dropout -> LSTM)

• Save the model

• Generate captions for the test data and calculate the performance metrics of the model
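The pre-processing step above can be sketched roughly as follows. This is a minimal illustration and not the code used in the lab: the function names (`tokenize`, `split_dataset`, `build_vocabulary`), the 80/20 split, and the toy captions are all assumptions. Lemmatization is left out because it needs an NLP library; here the captions are only lowercased and tokenized.

```python
import random
import re
from collections import Counter


def tokenize(caption):
    """Lowercase a caption and keep only runs of letters as tokens."""
    return re.findall(r"[a-z]+", caption.lower())


def split_dataset(captions, test_fraction=0.2, seed=0):
    """Shuffle image ids reproducibly and split them into train and test ids."""
    ids = sorted(captions)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def build_vocabulary(captions, image_ids, min_count=1):
    """Map each word seen in the given captions to an integer index."""
    counts = Counter(w for i in image_ids for w in tokenize(captions[i]))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    # Index 0 is reserved for padding (an assumption of this sketch).
    return {w: idx for idx, w in enumerate(words, start=1)}


# Hypothetical captions standing in for the real dataset.
captions = {
    "img1": "A boy is playing near the table",
    "img2": "A jacket hangs on the door",
    "img3": "A dog sits in the kitchen",
    "img4": "A girl opens the door",
    "img5": "A cat sleeps under the table",
}
train_ids, test_ids = split_dataset(captions)
vocab = build_vocabulary(captions, train_ids)
print(len(train_ids), len(test_ids), len(vocab))
```

Building the vocabulary from the training captions only keeps test words out of the model's input space, which is one reasonable design choice among several.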

Part 2:

• Apply K-Means clustering to create clusters from the captions of the dataset

• Use the K-Means model to predict which cluster a given input text belongs to
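The two steps above can be illustrated with a small, dependency-free sketch. It is not the lab's implementation: the bag-of-words vectors, the deterministic initialisation from the first `k` captions, the helper names, and the toy captions are all assumptions made for illustration. Cluster indices are arbitrary, so the index printed here has no relation to the clusters reported in this lab.

```python
import math
import re
from collections import Counter


def vectorize(texts, vocabulary=None):
    """Turn texts into bag-of-words count vectors over a shared vocabulary."""
    tokens = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    if vocabulary is None:
        vocabulary = sorted({w for doc in tokens for w in doc})
    vectors = []
    for doc in tokens:
        counts = Counter(doc)
        # Words outside the vocabulary are ignored.
        vectors.append([float(counts[w]) for w in vocabulary])
    return vectors, vocabulary


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda i: distance(vector, centroids[i]))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd iterations; centroids start at the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical captions standing in for the real dataset.
captions = [
    "a boy is playing with a ball",
    "a jacket hangs on the door",
    "a girl is playing with a toy",
    "a coat hangs on the door",
]
vectors, vocabulary = vectorize(captions)
centroids = kmeans(vectors, k=2)
labels = [nearest(v, centroids) for v in vectors]

# Predict the cluster of a new input text using the fitted centroids.
query, _ = vectorize(["boy is playing"], vocabulary)
predicted = nearest(query[0], centroids)
print(labels, predicted)
```

On real captions, TF-IDF weighting and random restarts would usually give more stable clusters than raw counts with a fixed initialisation.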

Code Implementation:

• Extracting features

• Loading Captions and creating Vocabulary

• Develop Model

• Generate Captions for the test data

• Computing performance of the model
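As an illustration of the last step, the sketch below computes a sentence-level BLEU score from scratch. It is not the code used in the lab, and it applies no smoothing, so any n-gram order with zero matches gives a score of 0. The function names and example sentences are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, candidate, n):
    """Clipped n-gram precision of the candidate against the references."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu(references, candidate, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(references, candidate, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (shorter one on ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical reference and generated captions.
reference = "a boy is playing in the kitchen".split()
generated = "a boy is playing".split()
score = bleu([reference], generated)
print(round(score, 3))
```

In this example every n-gram of the generated caption appears in the reference, so the score is reduced only by the brevity penalty for the shorter caption.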

Sample Test images:

Sample Captions:

Performance of the model (BLEU):

Part 2:

Code:

Prediction:

For the input “boy is playing”, the model predicted that it belongs to cluster 3.