Gesture recognition using deep learning - RedHenLab/Gesture GitHub Wiki

This folder contains the code written as part of GSoC 2016. The aim of the project was gesture recognition using deep learning. The two main parts of the project are segmentation and gesture recognition.

Target functionality

The final aim of the project is to segment each person individually from the video and perform gesture recognition for each of them.

Code overview

The two main parts of the project live in the directories video_seg and gesture_reco respectively.

video_seg

The aim of this section is to read a raw input video or image and segment out each person in the frame individually. To assign class labels to the image, two networks were implemented: a Deconvolution network and a Fully Convolutional Network (FCN). The FCN was ultimately selected as it had lower overhead. The main files in the fcn_basic folder are:

  • The file layers.py contains the implementation required for the full structure of the network.

  • The file fcn.py builds the FCN-8s network (reference Caffe weights at http://dl.caffe.berkeleyvision.org/fcn8s-heavy-pascal.caffemodel) using the layers implemented in layers.py.

  • The file video_support.py contains the functionalities required to load and save the video.

  • The module pipeline.py contains the code for detecting and clustering faces.

  • Detailed help for the files can be found in the docs.

  • The sample output with segmentation can be found in the sample_out directory.

  • The weights folder has the trained weights required to run the code.
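The segmentation network assigns a PASCAL VOC class label to every pixel, so individual persons can be isolated by masking on the person class (index 15 in the VOC labelling). A minimal NumPy sketch, assuming the per-pixel label map has already been produced by the network (the array and function names here are illustrative, not from the repository):

```python
import numpy as np

PERSON_CLASS = 15  # "person" in the PASCAL VOC class list used by FCN-8s

def person_mask(label_map):
    """Return a binary mask selecting all pixels labelled as person."""
    return (label_map == PERSON_CLASS).astype(np.uint8)

# toy 3x3 label map: background (0) everywhere except two person pixels
labels = np.array([[0, 15, 0],
                   [0, 15, 0],
                   [0,  0, 0]])
mask = person_mask(labels)
```

Connected-component analysis on such a mask is one way separate persons could then be clustered apart before per-person video output.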

To run the code:

The module supports both image and video processing. The path to the input file is given in the call to the module at the end of fcn.py:

processImage("<file_name>")
processVideo("<file_name>")
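The entry point at the bottom of fcn.py simply invokes one of these two calls. A small dispatcher that picks the handler from the file extension could look like the sketch below; the extension lists are assumptions, and the stub lambdas stand in for the real processImage/processVideo functions:

```python
import os

def process(path, image_fn, video_fn):
    """Route an input file to image or video processing by extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in ('.png', '.jpg', '.jpeg'):
        return image_fn(path)
    if ext in ('.mp4', '.avi', '.mov'):
        return video_fn(path)
    raise ValueError('unsupported input: %s' % path)

# stub handlers standing in for processImage / processVideo
result = process('clip.mp4', lambda p: 'image', lambda p: 'video')
```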

Some weight files need to be downloaded from https://drive.google.com/drive/folders/0Bzb-U-Y7f243eWdyN0VhNC1TSDA?usp=sharing. Download the files from the fcn folder there and place them in the weights folder in this directory.

To run the code just do:

THEANO_FLAGS=floatX=float32 python fcn.py

For image processing, the module shows the output at the end of execution. For video processing, a video for each person is saved as out_<person_no>.mp4 in the same directory.
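The per-person naming convention above can be sketched as a one-line helper (hypothetical; the repository builds these paths inline):

```python
def output_path(person_no):
    """Name the per-person output video, matching out_<person_no>.mp4."""
    return 'out_%d.mp4' % person_no
```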

gesture_reco

The aim of this section is to do gesture recognition on a sequence of frames. The module has been developed to take sequences from the Cohn-Kanade dataset (http://www.pitt.edu/~emotion/ck-spread.htm). The architecture chosen is the multi-velocity network described in https://arxiv.org/pdf/1603.06829v1.pdf. The main files in the folder are:

  • The file layers.py contains the implementation required for the full structure of the network.

  • The file net.py contains the structure of the network using the layers implemented in layers.py.

  • The file video_support.py contains the functionalities required to load and save the video.

  • The module ck_support.py contains the code for feeding the data from CK+ dataset to the system.

  • The file train.slurm is used for training the model on a cloud cluster.

  • Detailed help for the files can be found in the docs.
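The multi-velocity idea is to feed the network the same clip at several temporal resolutions. One plausible way such input streams might be formed is by subsampling the frame sequence at different strides, as in the sketch below; the stride values are illustrative and not taken from the paper's exact configuration:

```python
import numpy as np

def velocity_streams(frames, strides=(1, 2, 4)):
    """Subsample a frame sequence at several temporal strides,
    yielding one input stream per 'velocity'."""
    return [frames[::s] for s in strides]

# toy sequence of 8 single-pixel frames
frames = np.arange(8).reshape(8, 1, 1)
streams = velocity_streams(frames)
```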

To run the code:

To run the network, pickle and Theano must be installed. For the code to be GPU compatible, the floatX flag must be set to float32. To run in testing mode, set testing=True in the main; for training, set it to False. When training, the code saves the learned weights as a pickle file, whose path can be changed with:

net.saveModel("<pickle_file>")
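Serialising weights this way amounts to pickling the network's parameter arrays to disk. A minimal self-contained sketch with the standard-library pickle module (save_model/load_model and the dict layout are illustrative stand-ins for net.saveModel, not the repository's actual API):

```python
import os
import pickle
import tempfile

def save_model(weights, path):
    """Serialise a dict of layer weights to a pickle file."""
    with open(path, 'wb') as f:
        pickle.dump(weights, f)

def load_model(path):
    """Restore the weights dict saved by save_model."""
    with open(path, 'rb') as f:
        return pickle.load(f)

path = os.path.join(tempfile.gettempdir(), 'demo_model.pkl')
save_model({'w1': [0.1, 0.2]}, path)
restored = load_model(path)
```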

To run the code just do:

THEANO_FLAGS=floatX=float32 python net.py

To run the code for GPU do:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python net.py

To submit the job to a cloud cluster do:

sbatch train.slurm
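The actual train.slurm lives in the gesture_reco/mul_vel_net folder; a typical script of this shape (the resource requests below are illustrative, not copied from the repository) looks like:

```shell
#!/bin/bash
#SBATCH --job-name=mul_vel_train    # job name shown by squeue
#SBATCH --gres=gpu:1                # request one GPU
#SBATCH --time=24:00:00             # wall-clock limit
#SBATCH --output=train_%j.log       # per-job log file

module load cuda/7.0.28
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python net.py
```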

Dependencies

The code requires the following dependencies for testing:

  • Python
  • Theano
  • numpy
  • Cuda for GPU
  • Python Image Library
  • OpenCV

Python

This code has been developed and tested with Python 2.7, which ships by default on most Linux systems.

Theano

This is a deep learning framework used for developing various architectures. To install, just do:

pip install theano

Cuda

To install on a Linux system (assuming NVIDIA's CUDA apt repository is configured), run:

sudo apt-get install cuda

This should let Theano find CUDA (and cuDNN, if installed) to speed up the networks.

PIL

To install, run:

pip install Pillow

OpenCV

To install the Python bindings on Debian/Ubuntu, run:

sudo apt-get install python-opencv

Running on Cluster

An easy way to install Python and all the dependencies on a cluster where one does not have root privileges is to download Anaconda (https://www.continuum.io/downloads) and reference Python from there. After installing Theano on the cluster, enable GPU support with:
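A sketch of that root-free setup (the Miniconda installer URL and install prefix are assumptions reflecting the 2016-era distribution, not instructions from the repository):

```shell
# download and install Miniconda into the home directory (no root needed)
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -b -p $HOME/miniconda

# reference python (and pip) from the Miniconda install
export PATH="$HOME/miniconda/bin:$PATH"
pip install theano
```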

module load cuda/7.0.28

This loads the latest cuDNN-compatible CUDA version available on the cluster as of August 2016.

To run jobs on the cluster do:

sbatch <job_name>.slurm

The slurm file used for training in this project is given in the gesture_reco/mul_vel_net folder.
