LAB 3 - Hiresh12/UMKC GitHub Wiki
# BIG DATA ANALYTICS AND APPLICATIONS – LAB 3
Hiresh Jakkala Bhaskar – 8
Anvesh Mandadi – 17
AIM:
Part 1:
• To develop a Show and Tell model using the dataset
  o I used images that contain objects such as a table, door, kitchen, and jacket
  o This dataset is used to train the model so that it can identify the indoor activities depicted in an image
  o This helps us track the activities of children and pets at home
• To make the model generate a caption for each given test image
• To compute the performance metrics of the model using BLEU
Part 2:
• Analyze the dataset using unsupervised machine learning
Steps Followed:
Part 1:
• Pre-process the dataset to generate lemmas for the captions, and split the dataset into train and test data
• Extract the features from the train images and captions
• Train the model using CNN and LSTM (features->relu->image embedding->dropout->LSTM)
• Save the model
• Generate captions for the test data and calculate the performance metrics of the model
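
The pre-processing and splitting step can be illustrated with a minimal, standard-library-only sketch. The captions, image ids, and the function names `clean_caption`, `split_dataset`, and `build_vocabulary` are hypothetical and are not taken from the lab code. Lemmatization is not performed here; in practice a lemmatizer would be applied to the tokens after cleaning.

```python
import random
import re
from collections import Counter


def clean_caption(text):
    """Lowercase a caption and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_dataset(image_ids, test_fraction=0.2, seed=0):
    """Shuffle image ids reproducibly and split them into train and test lists."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def build_vocabulary(token_lists, min_count=1):
    """Collect every word that appears at least min_count times."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return sorted(w for w, c in counts.items() if c >= min_count)


# Hypothetical captions, for illustration only.
captions = {
    "img1": "A boy is sitting at the table.",
    "img2": "A jacket hangs on the door.",
    "img3": "The dog is in the kitchen.",
    "img4": "A girl opens the door.",
    "img5": "The cat sleeps on the table.",
}

cleaned = {image_id: clean_caption(text) for image_id, text in captions.items()}
train_ids, test_ids = split_dataset(cleaned.keys())
# The vocabulary is built from the training captions only.
vocab = build_vocabulary([cleaned[i] for i in train_ids])
```

Building the vocabulary from the training split alone keeps words that occur only in the test captions out of the model's vocabulary.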
Part 2:
• Apply K-Means clustering to create clusters from the captions of the dataset
• Use the K-Means model to predict which cluster a given input text belongs to
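
The two steps above can be sketched with a small, standard-library-only K-Means over bag-of-words vectors. This is an illustrative sketch, not the lab's code. The captions are hypothetical, and the deterministic farthest-point initialization is an assumption chosen so that the example is reproducible.

```python
from collections import Counter


def to_vector(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical captions, for illustration only.
captions = [
    "boy is playing with ball",
    "girl is playing with toy",
    "dog is playing with ball",
    "jacket hangs on door",
    "coat hangs on door",
    "jacket hangs on chair",
]

vocab = sorted({w for c in captions for w in c.lower().split()})
points = [to_vector(c, vocab) for c in captions]
centroids, labels = kmeans(points, k=2)

# Predict the cluster of a new input text.
predicted = nearest(to_vector("boy is playing", vocab), centroids)
```

Cluster indices are arbitrary labels, so the number assigned to an input depends on the data and on the initialization.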
Code Implementation:
• Extracting features
• Loading Captions and creating Vocabulary
• Develop Model
• Generate Captions for the test data
• Computing performance of the model
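
The caption generation step can be sketched as greedy decoding: starting from a start token, the most probable next word is appended until an end token or a length limit is reached. The sketch below is an assumption-laden illustration. The token names `startseq` and `endseq`, the function `generate_caption`, and the lookup table `TOY_MODEL` standing in for the trained CNN and LSTM network are all hypothetical.

```python
START, END = "startseq", "endseq"  # assumed token names


def generate_caption(predict_next, max_length=10):
    """Greedy decoding: repeatedly append the most probable next word."""
    words = [START]
    for _ in range(max_length):
        probs = predict_next(words)  # mapping: word -> probability
        next_word = max(probs, key=probs.get)
        if next_word == END:
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start token


# Hypothetical stand-in for the trained model: it looks only at the last word.
TOY_MODEL = {
    "startseq": {"boy": 0.6, "dog": 0.4},
    "boy": {"is": 0.9, "endseq": 0.1},
    "is": {"playing": 0.7, "sitting": 0.3},
    "playing": {"endseq": 0.8, "ball": 0.2},
}


def toy_predict_next(words):
    return TOY_MODEL[words[-1]]


caption = generate_caption(toy_predict_next)
```

With a real model, `predict_next` would also receive the extracted image features; the decoding loop itself stays the same.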
Sample Test images:
Sample Captions:
Performance of the model (BLEU):
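
As an illustration of what the BLEU score measures, here is a minimal sentence-level sketch using modified n-gram precision and a brevity penalty, with no smoothing. It is not the code used in this lab. The function names and the example sentences are hypothetical, and a library implementation would normally be used instead.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, candidate, n):
    """Clipped n-gram precision of the candidate against the references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(references, candidate, max_n=2):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams."""
    precisions = [
        modified_precision(references, candidate, n) for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0:
        return 0.0
    # Reference length closest to the candidate length (ties go to the shorter).
    ref_len = min(
        (len(r) for r in references),
        key=lambda length: (abs(length - len(candidate)), length),
    )
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical reference and generated captions.
reference = "a boy is playing in the kitchen".split()
candidate = "a boy is playing in the room".split()
score = bleu([reference], candidate)
```

Here 6 of 7 unigrams and 5 of 6 bigrams match and the lengths are equal, so the score is the geometric mean sqrt(6/7 × 5/6) = sqrt(5/7), roughly 0.845.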
Part 2:
Code:
Prediction:
For the input “boy is playing”, the model predicted that it belongs to cluster-3.