Lab 3 - PavankumarManchala/CS5542_BigDataAnalyticsAppsLab
CS5542 Big Data Analytics and Apps Lab assignment 3
By:
Pavankumar Manchala Class Id: 16 Team: 5
Objectives:
- Image Caption Generation.
- Data analysis using Unsupervised learning.
Platforms used:
- PyCharm
- IntelliJ
Packages Installed:
- opencv-python
- numpy
- nltk
- matplotlib
- TensorFlow
- Show and Tell model
- PIL
- BLEU score
- logging
- heapq
Task 1:
Generating captions for our own dataset using the Show and Tell model, and reporting accuracy with the BLEU, CIDEr, METEOR and ROUGE measures.
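To make the BLEU measure concrete, here is a minimal pure-Python sketch of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). It is not the scoring code used in this lab: the function names, the example captions, and the choice of `max_n=2` without smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    often as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU with uniform weights up to max_n (no smoothing)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical generated caption and ground-truth caption.
candidate = "a dog runs on the grass".split()
references = ["a dog is running on the grass".split()]
score = bleu(candidate, references)
print(f"BLEU-2: {score:.4f}")
```

A library implementation such as the one in nltk would normally be preferred for reporting; the sketch only shows what the number measures.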
The SBU dataset is flexible to work with: it provides the captions and the image URLs as separate files.
The Show and Tell model generates captions for the dataset. The output is as follows:
Screenshots of the Show and Tell model are as follows:
Requirements:
Model function:
Beam size: The beam size is 4, so 4 captions are generated for each image.
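The effect of the beam size can be sketched with a toy beam search that keeps the 4 best partial captions at every step and returns the 4 best complete ones. The `TOY_MODEL` table below is a hypothetical stand-in for the trained Show and Tell network, and `beam_search` is an illustrative name, not the model's own caption generator.

```python
import heapq
import math

# Hypothetical next-word probabilities standing in for the trained model.
TOY_MODEL = {
    "<S>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.3, "man": 0.2},
    "the": {"dog": 0.4, "cat": 0.4, "man": 0.2},
    "dog": {"runs": 0.7, "</S>": 0.3},
    "cat": {"sits": 0.6, "</S>": 0.4},
    "man": {"walks": 0.8, "</S>": 0.2},
    "runs": {"</S>": 1.0},
    "sits": {"</S>": 1.0},
    "walks": {"</S>": 1.0},
}


def beam_search(model, beam_size=4, max_len=10):
    """Return up to beam_size complete captions as (log_prob, words), best first."""
    beam = [(0.0, ["<S>"])]  # partial captions still being extended
    complete = []            # captions that have produced the end token
    for _ in range(max_len):
        candidates = []
        for log_prob, words in beam:
            for word, prob in model[words[-1]].items():
                entry = (log_prob + math.log(prob), words + [word])
                if word == "</S>":
                    complete.append(entry)
                else:
                    candidates.append(entry)
        if not candidates:
            break
        # Keep only the beam_size most probable partial captions.
        beam = heapq.nlargest(beam_size, candidates)
    return heapq.nlargest(beam_size, complete)


captions = beam_search(TOY_MODEL, beam_size=4)
for log_prob, words in captions:
    print(f"{math.exp(log_prob):.3f}  {' '.join(words[1:-1])}")
```

With a beam size of 1 the same function reduces to greedy decoding and yields at most one caption per finishing step, which is why a larger beam gives several alternative captions per image.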
Caption generator:
Task 2:
Data analytics using unsupervised learning. The K-Means and EM (expectation-maximization) clustering algorithms are used to group the data into clusters.
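As a rough illustration of how EM clustering differs from K-Means, the sketch below fits a one-dimensional Gaussian mixture: the E-step gives every point a soft responsibility for each cluster, and the M-step re-estimates each cluster's weight, mean and variance from those responsibilities. The feature (caption length in words), the sample values and the function names are assumptions for illustration, not the lab's actual code or data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: soft responsibility of each component for each point.
        resp = []
        for x in data:
            likes = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    # Hard labels: the component with the highest responsibility.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Hypothetical caption lengths (in words) forming two groups.
lengths = [4, 5, 5, 6, 12, 13, 14, 13]
weights, means, variances, labels = em_gmm_1d(
    lengths, initial_means=[min(lengths), max(lengths)])
print("means:", [round(m, 2) for m in means])
print("labels:", labels)
```

K-Means can be seen as the limiting case where every responsibility is forced to 0 or 1 and all clusters share the same spherical variance.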
Result of KM_Clustering:
Result of EM_Clustering:
Code for KM_Clustering: The input data and the Hadoop property are initialized.
The code pushes the captions into a hash map, clusters them, and saves the resulting values to a .csv file.
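The flow described above (captions into a hash map, clustering, CSV output) can be sketched in plain Python as follows. This is not the lab's own code, which is shown only as a screenshot: the captions, the bag-of-words features, seeding the centroids with the first k points, and all function names are assumptions chosen to keep the example small and reproducible.

```python
import csv
import io


def featurize(caption, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary."""
    words = caption.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, seeded with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Hash map from a hypothetical image id to its caption.
captions = {
    "img1": "a dog runs on the grass",
    "img2": "a plate of food on a table",
    "img3": "a dog plays on the grass",
    "img4": "food on a white plate",
}
vocabulary = sorted({word for text in captions.values() for word in text.split()})
points = [featurize(text, vocabulary) for text in captions.values()]
assignments = kmeans(points, k=2)

# Write the result as CSV; a file opened with open(path, "w", newline="")
# can be used in place of the in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["image_id", "cluster", "caption"])
for (image_id, text), cluster in zip(captions.items(), assignments):
    writer.writerow([image_id, cluster, text])
csv_text = buffer.getvalue()
print(csv_text)
```

Seeding with the first k points makes the run deterministic; a real run would more commonly use random or k-means++ initialization.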