2016 08 07: Studying LRCN Activity Recognition
References:
Project source: https://people.eecs.berkeley.edu/~lisa_anne/LRCN_video
#Network Infrastructure
Input Layer | samples | clip_markers | labels |
---|---|---|---|
Convolution | | | |
ReLU | | | |
Pooling | | | |
LRN | | | |
Convolution | | | |
ReLU | | | |
Pooling | | | |
LRN | | | |
Convolution | Reshape | Reshape | |
ReLU | | | |
Convolution | | | |
ReLU | | | |
Convolution | | | |
ReLU | | | |
Pooling | | | |
InnerProduct | | | |
ReLU | | | |
Dropout | | | |
ReLU | | | |
Convolution | | | |
ReLU | | | |
Convolution | | | |
ReLU | | | |
Pooling | | | |
Reshape | | | |
---------------- | -------------------- | ------------------- | -------------------- |
LSTM | | | |
Dropout | | | |
InnerProduct | | | |
---------------- | -------------------- | ------------------- | -------------------- |
Softmax and Loss | | | |
#Input:
##Samples
A 4-d array of size 384 * 3 * 227 * 227, where 3 is the number of channels and 227 * 227 is the image size.
However, the input values look strange: they all appear to be integers. (To check: this may be because they are flow images.)
##Labels
A vector of length 384.
##Clip_markers
A vector of length 384.
#Before reshape
##Labels and clip_markers
Both are kept the same as in the input.
##Samples
- net.blobs['fc6'].data.shape
- Output: 384 * 4096
#After Reshape
##Labels
- net.blobs['reshape-label'].data.shape
- Output: 16 * 24
- dim-1: number of time steps, so each sequence contains 16 time steps
- dim-2: number of sequences, so 24 sequences are fed to the LSTM in each batch
- All sequences are preprocessed to the same length, which is 16 time steps. Each column holds the frames of one sequence, so the labels within a column are the same.
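A small NumPy sketch of the label layout described above. It uses no Caffe at all: the per-clip class ids are made up, and the flat vector is assumed to be time-major (all 24 clips at step 0, then all 24 at step 1, and so on), which is what a row-major reshape from 384 to 16 * 24 implies.

```python
import numpy as np

T, N = 16, 24  # time steps per sequence, sequences per batch

# Hypothetical class ids, one per clip (not taken from the real data).
clip_labels = np.arange(N) % 5

# Flat label vector of length 384 in the assumed time-major order.
flat_labels = np.tile(clip_labels, T)

# Row-major reshape, standing in for the 'reshape-label' blob.
reshaped = flat_labels.reshape(T, N)

# Every row repeats the per-clip labels, so each column is constant.
columns_constant = bool((reshaped == reshaped[0]).all())
print(reshaped.shape, columns_constant)  # (16, 24) True
```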
##Clip_markers
- net.blobs['reshape-cm'].data.shape
- Output: 16 * 24
- dim-1: number of time steps, so each sequence contains 16 time steps
- dim-2: number of sequences, so 24 sequences are fed to the LSTM in each batch
- Same structure as the labels above
- The first row is all zeros, marking the start of each sequence; all other rows are 1
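To see why the zero row matters, here is a toy recurrence in NumPy. It is not the real LSTM. It only mimics, as far as I understand Caffe's recurrent layers, the way the previous state is multiplied by the marker, so that a 0 discards whatever state was left over. The function name and the "stale" state value are made up for the illustration.

```python
import numpy as np

T, N = 16, 24

# Clip markers in the reshaped layout: row 0 is 0, all other rows are 1.
clip_markers = np.ones((T, N), dtype=np.float32)
clip_markers[0, :] = 0

def run_toy_recurrence(inputs, cont, initial_state):
    """Toy accumulator: state_t = cont_t * state_{t-1} + input_t."""
    state = initial_state
    for x_t, cont_t in zip(inputs, cont):
        state = cont_t * state + x_t
    return state

inputs = np.ones((T, N), dtype=np.float32)
stale = np.full(N, 100.0, dtype=np.float32)  # pretend leftover from an earlier batch

final = run_toy_recurrence(inputs, clip_markers, stale)
print(final[:3])  # [16. 16. 16.]
```

The stale value of 100 never shows up in the result, because the marker at the first time step zeroes it before the 16 inputs are accumulated.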
##Samples
- net.blobs['fc6-reshape'].data.shape
- Output: 16 * 24 * 4096
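The same reshape on the features can be sketched with NumPy, using random data as a stand-in for the fc6 blob. The assumption is again that the reshape is row-major, so the feature at time step t of sequence n comes from flat row t * 24 + n.

```python
import numpy as np

T, N, D = 16, 24, 4096

# Stand-in for net.blobs['fc6'].data: one 4096-d feature row per frame.
fc6 = np.random.rand(T * N, D).astype(np.float32)

# Stand-in for net.blobs['fc6-reshape'].data.
fc6_reshaped = fc6.reshape(T, N, D)

t, n = 5, 7  # arbitrary time step and sequence index
same = np.array_equal(fc6_reshaped[t, n], fc6[t * N + n])
print(fc6_reshaped.shape, same)  # (16, 24, 4096) True
```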
#As a result, the inputs in our experiment and in this project are the same
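A possible way to make this comparison repeatable is a small shape check like the one below. The helper name `check_lstm_inputs` is made up, and the arrays in the usage part are synthetic stand-ins, not blobs read from either network.

```python
import numpy as np

def check_lstm_inputs(features, clip_markers, labels, T=16, N=24, D=4096):
    """Return a list of mismatches against the reshaped layout (empty if none)."""
    problems = []
    if features.shape != (T, N, D):
        problems.append("features: expected %s, got %s" % ((T, N, D), features.shape))
    for name, blob in (("clip_markers", clip_markers), ("labels", labels)):
        if blob.shape != (T, N):
            problems.append("%s: expected %s, got %s" % (name, (T, N), blob.shape))
    if clip_markers.shape == (T, N):
        # Row 0 starts every sequence; later rows continue it.
        if np.any(clip_markers[0] != 0) or np.any(clip_markers[1:] != 1):
            problems.append("clip_markers: first row must be 0, the rest 1")
    if labels.shape == (T, N) and np.any(labels != labels[0]):
        problems.append("labels: values change within a sequence")
    return problems

# Usage with synthetic stand-ins for the blobs of our own experiment.
features = np.zeros((16, 24, 4096), dtype=np.float32)
clip_markers = np.ones((16, 24), dtype=np.float32)
clip_markers[0] = 0
labels = np.tile(np.arange(24), (16, 1))
print(check_lstm_inputs(features, clip_markers, labels))  # []
```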