Gesture recognition using deep learning - RedHenLab/Gesture GitHub Wiki
This repository contains the code written as part of GSoC 2016. The aim of the project was gesture recognition using deep learning. The two main parts of the project are segmentation and gesture recognition.
The final aim of the project is to segment each person in a video individually and perform gesture recognition for every person in it.
The code for the two parts lives in the `video_seg` and `gesture_reco` directories respectively.
The aim of this section is to read a raw input video or image and segment out every person in it individually. To assign class labels to the image, two networks were implemented: a Deconvolution Network and a Fully Convolutional Network (FCN). The FCN was ultimately chosen as it has less overhead. The main files in the `fcn_basic` folder are:
- `layers.py` contains the implementation of the layers required for the full structure of the network.
- `fcn.py` contains the structure of the network, as described in http://dl.caffe.berkeleyvision.org/fcn8s-heavy-pascal.caffemodel, using the layers implemented in `layers.py`.
- `video_support.py` contains the functionality required to load and save videos.
- `pipeline.py` contains the code for detecting and clustering faces.
- Detailed help for the files can be found in the docs.
- Sample segmentation output can be found in the `sample_out` directory.
- The `weights` folder contains the trained weights required to run the code.
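The FCN architecture produces coarse, downsampled class scores and fuses them with finer-resolution skip-layer scores by upsampling and elementwise addition. A minimal pure-Python sketch of that fusion step (illustrative only; the real network in `layers.py` uses learned deconvolution filters rather than nearest-neighbour upsampling):

```python
def upsample2x(score_map):
    """Nearest-neighbour 2x upsampling of a 2D score map
    (a stand-in for the FCN's learned deconvolution)."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

def fuse(coarse, skip):
    """Upsample the coarse scores and add the skip-layer scores."""
    up = upsample2x(coarse)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, skip)]

coarse = [[1, 2], [3, 4]]            # low-resolution class scores
skip = [[1] * 4 for _ in range(4)]   # finer-resolution skip scores
fused = fuse(coarse, skip)
print(fused[0])  # -> [2, 2, 3, 3]
```

In the real FCN-8s, this fuse-and-upsample step is applied repeatedly until the score map matches the input resolution, giving one class label per pixel.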
The module supports both image and video processing. The path to the input file is passed in the call at the end of `fcn.py`:

```python
processImage("<file_name>")
processVideo("<file_name>")
```
Some weight files need to be downloaded from https://drive.google.com/drive/folders/0Bzb-U-Y7f243eWdyN0VhNC1TSDA?usp=sharing. Download the files from the `fcn` folder there and place them in the `weights` folder of this directory.
To run the code, just do:

```shell
THEANO_FLAGS=floatX=float32 python fcn.py
```
For image processing, the module displays the output at the end of execution. For video processing, the video for each person is saved as `out_<person_no>.mp4` in the same directory.
The aim of this section is to perform gesture recognition on a sequence of frames. The module has been developed to take sequences from the Cohn-Kanade dataset (http://www.pitt.edu/~emotion/ck-spread.htm). The chosen architecture is the multi-velocity network described in https://arxiv.org/pdf/1603.06829v1.pdf. The main files in the folder are:
- `layers.py` contains the implementation of the layers required for the full structure of the network.
- `net.py` contains the structure of the network using the layers implemented in `layers.py`.
- `video_support.py` contains the functionality required to load and save videos.
- `ck_support.py` contains the code for feeding data from the CK+ dataset into the network.
- `train.slurm` is used for training the model on a cloud cluster.
- Detailed help for the files can be found in the docs.
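The multi-velocity design feeds the same frame sequence to parallel streams operating at different temporal rates. A minimal sketch of that subsampling idea, assuming the streams differ only in sampling rate (the function name is illustrative, not from the repo):

```python
def velocity_streams(frames, rates=(1, 2, 4)):
    """Subsample one frame sequence at several temporal rates,
    producing one input stream per 'velocity'."""
    return {rate: frames[::rate] for rate in rates}

frames = list(range(16))          # stand-in for 16 video frames
streams = velocity_streams(frames)
print([len(streams[r]) for r in (1, 2, 4)])  # -> [16, 8, 4]
```

Each stream sees the same gesture evolving at a different apparent speed, which is what lets the network pick up both fast and slow motion cues.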
Running the network requires pickle and Theano. For the code to be GPU compatible, the `floatX` flag must be set to `float32`. To run in testing mode, set `testing=True` in the main block; for training, set it to `False`. When training, the code saves the learned weights as a pickle file, whose path can be changed via:

```python
net.saveModel("<pickle_file>")
```
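`net.saveModel` presumably serializes the learned parameters with pickle. The same save/restore pattern can be sketched as follows (the `SimpleNet` class here is a hypothetical stand-in, not the repo's `net` object):

```python
import pickle

class SimpleNet(object):
    def __init__(self, weights):
        self.weights = weights

    def saveModel(self, path):
        # Serialize the learned parameters to a pickle file.
        with open(path, "wb") as f:
            pickle.dump(self.weights, f)

    def loadModel(self, path):
        # Restore parameters saved by a previous training run.
        with open(path, "rb") as f:
            self.weights = pickle.load(f)

net = SimpleNet(weights=[0.5, -1.2, 3.0])
net.saveModel("model.pkl")

restored = SimpleNet(weights=None)
restored.loadModel("model.pkl")
print(restored.weights)  # -> [0.5, -1.2, 3.0]
```

Loading the pickled weights back in is how a trained model is reused in testing mode without retraining.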
To run the code, just do:

```shell
THEANO_FLAGS=floatX=float32 python net.py
```
To run the code on a GPU, do:

```shell
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python net.py
```
To submit the job to a cloud cluster, do:

```shell
sbatch train.slurm
```
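A minimal SLURM script along the lines of `train.slurm` might look like this; the job name, resource requests, and log path below are illustrative assumptions, and the actual file is in the `gesture_reco/mul_vel_net` folder:

```shell
#!/bin/bash
#SBATCH --job-name=train_net      # job name shown by squeue
#SBATCH --gres=gpu:1              # request one GPU
#SBATCH --time=12:00:00           # wall-clock time limit
#SBATCH --output=train_%j.log     # stdout/stderr log file

# Load CUDA and launch training on the GPU.
module load cuda/7.0.28
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python net.py
```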
The code requires the following dependencies for testing:
- Python
- Theano
- numpy
- CUDA (for GPU support)
- Python Imaging Library (Pillow)
- OpenCV
The code has been developed and tested with Python 2.7, which comes by default on most Linux systems.

Theano is the deep learning framework used to develop the architectures. To install it, just do:

```shell
pip install theano
```

To install CUDA on a Linux system, just do:

```shell
sudo apt-get install cuda
```

This should let Theano find cuDNN to speed up the networks.

To install the Python Imaging Library, run:

```shell
pip install Pillow
```

To install OpenCV, run:

```shell
sudo apt-get install python-opencv
```
An easy way to install Python and all the dependencies on a cluster where one does not have root privileges is to download Anaconda (https://www.continuum.io/downloads) and use its Python. After installing Theano on the cluster, enable GPU support with:

```shell
module load cuda/7.0.28
```

This provides the latest cuDNN version available as of August 2016.

To run jobs on the cluster, do:

```shell
sbatch <job_name>.slurm
```

The SLURM file used for training in this project is in the `gesture_reco/mul_vel_net` folder.