VQA Project Implementation

Convolutional-VQA

We are referring to the following GitHub link for our VQA project:

https://github.com/paarthneekhara/convolutional-vqa

The code for our project implementation is available at the GitHub link below:

https://github.com/priyankag110/Convolutional-VQA

Dataset Used:

We are using the MSCOCO dataset for our project. You can download the dataset at the following link:

https://visualqa.org/download.html

We are going to download the following three datasets: the images, the questions file, and the annotations file.

Keyword Specifics:

Download images based on your keywords and split them into training and testing/validation sets (the split could be 80/20 or 70/30).
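As a rough illustration, one way to do the split is sketched below; the folder names, file extension, and 80/20 ratio are placeholders, not part of the original workflow.

```python
import os
import random
import shutil

# Hypothetical paths; point these at wherever your keyword-filtered images live.
SOURCE_DIR = "Data/keyword_images"
TRAIN_DIR = "Data/train_images"
VAL_DIR = "Data/val_images"
SPLIT_RATIO = 0.8  # 80/20 split; use 0.7 for a 70/30 split

os.makedirs(TRAIN_DIR, exist_ok=True)
os.makedirs(VAL_DIR, exist_ok=True)

# Collect the downloaded images (adjust the extension if your images are .png).
images = [f for f in os.listdir(SOURCE_DIR) if f.endswith(".jpg")]
random.seed(42)      # fixed seed so the split is reproducible
random.shuffle(images)

split_point = int(len(images) * SPLIT_RATIO)
for i, name in enumerate(images):
    dest_dir = TRAIN_DIR if i < split_point else VAL_DIR
    shutil.copy(os.path.join(SOURCE_DIR, name), os.path.join(dest_dir, name))

print("train:", split_point, "validation:", len(images) - split_point)
```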

Data Preprocessing:

If you check the format of the questions file and the corresponding annotations file, you will see that we need to keep only the questions/annotations that belong to our image IDs:

Preprocessing of Questions File:

Refer to the code at: Data_Preprocess/questions.py

You have to pass the image IDs (based on your keyword) and the MSCOCO questions 2017 file (downloaded previously) as input.

The output of the program will be a questions file containing only the entries for the image IDs provided as input. (The initial part of the output file, i.e. the info section, needs to be copied from the MSCOCO questions file.)

Preprocessing of Annotations File:

Refer to the code at: Data_Preprocess/annotations.py

You have to pass the image IDs (based on your keyword) and the MSCOCO annotations 2017 file (downloaded previously) as input.

The output of the program will be an annotations file containing only the entries for the image IDs provided as input. (The initial part of the output file, i.e. the info section, needs to be copied from the MSCOCO annotations file.)

Both of the above programs can be run for the validation image IDs as well.
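The actual code lives in Data_Preprocess/questions.py and Data_Preprocess/annotations.py; the sketch below only illustrates the filtering idea they share, assuming the standard VQA JSON layout where each entry carries an image_id field. The file paths, the image-ID text file, and the ENTRY_KEY switch are placeholders.

```python
import json

# Hypothetical paths; replace with your own files.
IMAGE_ID_FILE = "Data/keyword_image_ids.txt"        # one image ID per line
SOURCE_FILE = "Data/mscoco_questions_2017.json"     # or the annotations file
OUTPUT_FILE = "Data/filtered_questions.json"
ENTRY_KEY = "questions"                             # use "annotations" for the annotations file

with open(IMAGE_ID_FILE) as f:
    keep_ids = {int(line.strip()) for line in f if line.strip()}

with open(SOURCE_FILE) as f:
    data = json.load(f)

# Keep only the entries whose image_id is in our keyword-based ID list.
# The other top-level fields (info, license, data_subtype, ...) are carried
# over unchanged, which covers the "copy the info part" step mentioned above.
data[ENTRY_KEY] = [entry for entry in data[ENTRY_KEY] if entry["image_id"] in keep_ids]

with open(OUTPUT_FILE, "w") as f:
    json.dump(data, f)

print("kept", len(data[ENTRY_KEY]), "entries")
```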

data_loader.py:

Code at: https://github.com/priyankag110/Convolutional-VQA/blob/master/data_loader.py

To run the program, use the following command:

python data_loader.py --version=1

Here, version 1 refers to the MSCOCO open-ended questions and version 2 refers to the multiple-choice questions.

We are using the OpenEnded questions.

The program removes infrequent words, applies some NLP processing, and prepares two output files.

The inputs to this code are the following files, all of which were generated by the preprocessing programs above:

  1. Training Questions
  2. Training Annotations
  3. Validation Questions
  4. Validation Annotations

It generates two files: qa_data_file and vocab_file.
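The exact processing is in data_loader.py; the sketch below only illustrates the general idea of dropping infrequent words and building a vocabulary. The threshold, tokenizer, and names here are assumptions rather than the script's actual logic.

```python
import re
from collections import Counter

MIN_WORD_COUNT = 3  # assumed frequency threshold; data_loader.py has its own setting

def tokenize(text):
    """Lower-case a question and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(questions):
    """Map frequent words to integer ids; infrequent words share an <unk> id."""
    counts = Counter(word for q in questions for word in tokenize(q))
    vocab = {"<unk>": 0}
    for word, count in counts.items():
        if count >= MIN_WORD_COUNT:
            vocab[word] = len(vocab)
    return vocab

def encode(question, vocab):
    """Turn a question into a list of vocabulary ids."""
    return [vocab.get(word, vocab["<unk>"]) for word in tokenize(question)]

# Toy usage: unseen or rare words fall back to the <unk> id.
questions = ["What color is the bus?"] * 3 + ["How many people are on the bus?"] * 3
vocab = build_vocab(questions)
print(encode("What color is the dog?", vocab))
```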

Feature Extraction:

We first have to download the pretrained models, unzip the downloaded files, create the directory Data/CNNModels, and save both models there.

Download link for the VGG16 weights (download the model weights file vgg16_weights.npz):

http://www.cs.toronto.edu/~frossard/post/vgg16/

Download link for the ResNet model:

http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz

Your directory should then contain both models under Data/CNNModels.

Code at: https://github.com/priyankag110/Convolutional-VQA/blob/master/extract_conv_features.py

For feature extraction, we have the following three pretrained CNN model options:

  1. VGG16 (fc7 layer -- fully connected layer)
  2. VGG16 (pool5 layer -- pooling layer)
  3. ResNet block 4

As of now, we are focusing on VGG16 pool5 and ResNet block 4.

To run the code, use the following command (for ResNet block 4):

python extract_conv_features.py --feature_layer="block4"

For VGG16 pool5:

python extract_conv_features.py --feature_layer="pool5"

The inputs to the program are:

  1. Captions file: We need to create this file for our image IDs. For my execution I created it manually; alternatively, we could make a few changes to the code so that it takes the image IDs from a text file rather than from a captions file.

The format of the captions file is:


{"info": {
    "description": "This is v1.0 of the captions for the inital 50,000 abstract scenes of the VQA dataset.",
    "url": "http://visualqa.org",
    "version": "1.0",
    "year": 2015,
    "contributor": "VQA Team",
    "date_created": "2015-10-03 03:51:56"
  },
  "task_type": "Captioning",
  "license": {
    "url": "http://creativecommons.org/licenses/by/4.0/",
    "name": "Creative Commons Attribution 4.0 International License"
  },
  "data_type": "abstract_v002",
  "data_subtype": "train2015",
  "images": [
    {
      "url": "http://visualqa.org/data/abstract_v002/scene_img/img/0.png",
      "file_name": "abstract_v002_train2015_000000000000.png",
      "image_id": 144,
      "width": 700,
      "height": 400
    },
    {
      "url": "http://visualqa.org/data/abstract_v002/scene_img/img/1.png",
      "file_name": "abstract_v002_train2015_000000000001.png",
      "image_id": 2342,
      "width": 700,
      "height": 400
    },
    {
      "url": "http://visualqa.org/data/abstract_v002/scene_img/img/2.png",
      "file_name": "abstract_v002_train2015_000000000002.png",
      "image_id": 3389,
      "width": 700,
      "height": 400
    }
  ]
}

We are concerned only with the image_id field, so you can make your own captions file based on the above format.
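Since only the image_id field matters, a captions file for your own image IDs can be generated with a few lines like the following. The ID file, output path, file-name pattern, and info text are placeholders; adjust them to match your downloaded images.

```python
import json
import os

IMAGE_ID_FILE = "Data/keyword_image_ids.txt"        # hypothetical: one image ID per line
OUTPUT_FILE = "Data/annotations/my_captions_train.json"
FILE_NAME_PATTERN = "COCO_train2014_{:012d}.jpg"    # assumed naming; match your image files

with open(IMAGE_ID_FILE) as f:
    image_ids = [int(line.strip()) for line in f if line.strip()]

captions = {
    "info": {"description": "Custom captions file for keyword-based image IDs"},
    "task_type": "Captioning",
    "license": {},
    "images": [
        {"image_id": image_id, "file_name": FILE_NAME_PATTERN.format(image_id)}
        for image_id in image_ids
    ],
}

os.makedirs(os.path.dirname(OUTPUT_FILE), exist_ok=True)
with open(OUTPUT_FILE, "w") as f:
    json.dump(captions, f)
```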

You also need to pass your training images as input to this program.

The program reads the image IDs from the captions file, looks for the corresponding images in your image folder, and uses all of the images it finds for feature extraction.
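Roughly, the lookup amounts to something like the check below (the paths are placeholders, and it assumes each entry in the captions file carries a file_name as in the format shown above; the real logic is in extract_conv_features.py).

```python
import json
import os

CAPTIONS_FILE = "Data/annotations/my_captions_train.json"  # hypothetical path
IMAGE_DIR = "Data/train_images"                            # hypothetical path

with open(CAPTIONS_FILE) as f:
    images = json.load(f)["images"]

# Keep the IDs whose image file actually exists in the image folder.
found = [img["image_id"] for img in images
         if os.path.exists(os.path.join(IMAGE_DIR, img["file_name"]))]

print(len(found), "of", len(images), "images available for feature extraction")
```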

Output of this program:

It creates feature files in .h5 format, named according to the layer you used for feature extraction.

The image features are now extracted successfully.
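To sanity-check the output, the .h5 file can be opened with h5py. The file name and the expected shape below are assumptions, so list the dataset keys first to see what the script actually wrote.

```python
import h5py

# Hypothetical output path; use whatever extract_conv_features.py wrote for your layer.
FEATURE_FILE = "Data/conv_features_pool5_train.h5"

with h5py.File(FEATURE_FILE, "r") as f:
    print("datasets in file:", list(f.keys()))
    first_key = list(f.keys())[0]
    # For pool5 features one would expect something like (num_images, H, W, 512).
    print(first_key, "shape:", f[first_key].shape)
```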

Training and Evaluation Part: