# 2016-08-07: Studying LRCN Activity Recognition (HenglinShi/LSTM_LIP_READING GitHub Wiki)

References:

Project source: https://people.eecs.berkeley.edu/~lisa_anne/LRCN_video

# Network Architecture

The Input Layer produces three blobs: `samples`, `clip_markers`, and `labels`. The `samples` blob runs through a CaffeNet-style CNN, while `clip_markers` and `labels` are only reshaped:

- `samples`: Convolution → ReLU → Pooling → LRN → Convolution → ReLU → Pooling → LRN → Convolution → ReLU → Convolution → ReLU → Convolution → ReLU → Pooling → InnerProduct → ReLU → Dropout → Reshape
- `clip_markers`: Reshape
- `labels`: Reshape

The reshaped features and clip markers then feed the recurrent part:

- LSTM → Dropout → InnerProduct

Finally, the output is compared against the reshaped labels by Softmax and Loss.

# Input

## Samples

A 4-D blob of shape 384 * 3 * 227 * 227, where 384 is the number of frames in a batch, 3 is the number of channels, and 227 * 227 is the image size.

However, the contents of the input look strange: all values appear to be integers. (To check: this may be because they are flow images.)
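As a quick shape sanity check, here is a NumPy sketch. The 24-clips-of-16-frames split is taken from the reshape section below, and 227 is the usual CaffeNet crop size:

```python
import numpy as np

time_steps, num_seqs = 16, 24          # from the reshape layers below
batch_size = time_steps * num_seqs     # 384 frames per batch

# Stand-in for the input blob: 384 frames of 3 x 227 x 227
# (227 is the standard CaffeNet crop size).
samples = np.zeros((batch_size, 3, 227, 227), dtype=np.float32)

print(samples.shape)  # (384, 3, 227, 227)
```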

## Labels

A vector of length 384, one label per frame.

## Clip_markers

A vector of length 384, one marker per frame.

# Before the Reshape

## Labels and clip_markers

Both pass through unchanged until their Reshape layers.

## Samples

`net.blobs['fc6'].data.shape` output: 384 * 4096, i.e. one 4096-dimensional fc6 feature per frame.

# After the Reshape

## Labels

- `net.blobs['reshape-label'].data.shape`
- Output: 16 * 24
- dim 1: number of time steps, so each sequence contains 16 time steps
- dim 2: number of sequences, so 24 sequences are fed to the LSTM in each batch
- All sequences are preprocessed to the same length of 16 time steps. Each column holds the frames of one sequence, so all labels within a column are the same.
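To see the column structure concretely, here is a small NumPy sketch. The per-clip labels below are made up; the point is that a time-major flat vector (flat index = t * num_seqs + n) reshaped to 16 * 24 puts each clip in one column:

```python
import numpy as np

time_steps, num_seqs = 16, 24

# Hypothetical per-clip labels, one per sequence.
clip_labels = np.arange(num_seqs)

# Time-major packing: frame t of clip n sits at flat index t*num_seqs + n,
# so the flat label vector is the clip labels tiled time_steps times.
flat_labels = np.tile(clip_labels, time_steps)          # shape (384,)
reshaped = flat_labels.reshape(time_steps, num_seqs)    # shape (16, 24)

# Every column holds a single clip, so its label is constant.
assert all((reshaped[:, n] == clip_labels[n]).all() for n in range(num_seqs))
```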

## Clip_markers

- `net.blobs['reshape-cm'].data.shape`
- Output: 16 * 24
- dim 1: number of time steps, so each sequence contains 16 time steps
- dim 2: number of sequences, so 24 sequences are fed to the LSTM in each batch
- Same structure as above
- The first row is all zeros, marking the start of each sequence; all other entries are 1.
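The marker layout can be sketched the same way (NumPy; assumes the time-major packing described above, i.e. flat index = t * num_seqs + n):

```python
import numpy as np

time_steps, num_seqs = 16, 24

# With time-major packing, the t = 0 frames of all 24 clips occupy the
# first 24 entries of the flat 384-vector, so marking sequence starts
# means zeroing exactly that prefix.
flat_cm = np.ones(time_steps * num_seqs, dtype=np.float32)
flat_cm[:num_seqs] = 0

cm = flat_cm.reshape(time_steps, num_seqs)  # (16, 24)
assert (cm[0] == 0).all()   # first row: start of every sequence
assert (cm[1:] == 1).all()  # remaining rows are all 1
```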

## Samples

`net.blobs['fc6-reshape'].data.shape` output: 16 * 24 * 4096
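Shape-wise, the feature reshape is just a view change (NumPy sketch; 4096 is the fc6 width from above):

```python
import numpy as np

time_steps, num_seqs, feat_dim = 16, 24, 4096

# Per-frame fc6 features, one 4096-d row per frame.
fc6 = np.zeros((time_steps * num_seqs, feat_dim), dtype=np.float32)

# Reshape to (time, sequence, feature) for the LSTM.
fc6_reshaped = fc6.reshape(time_steps, num_seqs, feat_dim)

print(fc6_reshaped.shape)  # (16, 24, 4096)
```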

# Conclusion

As a result, the inputs of our experiment and of this project are the same.