
2016-08-01

  1. One problem with the data size.

Currently the data has been split into two parts: 80% training data (about 42 persons × 30 sequences per person) and 20% testing data (about 10 persons × 30 sequences per person).
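For reference, a minimal sketch of such a person-level split, assuming the data is loaded as a dict mapping each person ID to that person's sequences (a hypothetical structure, not the repo's actual loader):

```python
import random

def split_by_person(sequences_by_person, train_ratio=0.8, seed=0):
    # Shuffle the person IDs so the split is random but reproducible.
    persons = sorted(sequences_by_person)
    random.Random(seed).shuffle(persons)
    # e.g. 52 persons -> 42 train, 10 test.
    n_train = int(len(persons) * train_ratio)
    train_persons, test_persons = persons[:n_train], persons[n_train:]
    # Splitting by person keeps all sequences of one speaker on one side,
    # so the test set contains only unseen speakers.
    train = [s for p in train_persons for s in sequences_by_person[p]]
    test = [s for p in test_persons for s in sequences_by_person[p]]
    return train, test
```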

However, a problem arises:

  • Assuming batch_size is 80, only 4 sequences are fed to the network per iteration (batch_size here counts frames, so at roughly 20 frames per sequence, 80 frames correspond to 4 sequences).

  • But we only have 42 × 30 ≈ 1260 sequences.

  • So at most we can run about 300 iterations per epoch (1260 / 4 ≈ 315); see the arithmetic sketch below.
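As a sanity check, here is that arithmetic as a minimal Python sketch. It assumes batch_size counts frames and each sequence is about 20 frames long; both are inferred from the numbers above, not stated explicitly in the repo:

```python
batch_size = 80                 # frames per batch (assumed unit)
frames_per_sequence = 20        # assumed sequence length, inferred from 80 / 4
sequences_per_batch = batch_size // frames_per_sequence        # -> 4
total_sequences = 42 * 30                                      # -> 1260
iterations_per_epoch = total_sequences // sequences_per_batch  # -> 315
print(sequences_per_batch, total_sequences, iterations_per_epoch)  # 4 1260 315
```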

  2. The loss does not converge

(Figure: losses by number of iterations)

The problem could be that the training samples were not shuffled enough: first we fed the network a bunch of data with label 1, then label 2, and so on.
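If that is the cause, a global shuffle of the training set before each epoch should help. Below is a minimal sketch, assuming the samples and labels live in parallel NumPy arrays (hypothetical names; the repo's actual data pipeline may differ):

```python
import numpy as np

def shuffle_together(samples, labels, seed=0):
    # One permutation applied to both arrays keeps each sample aligned with
    # its label while destroying the label-1-then-label-2 ordering.
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(samples))
    return samples[perm], labels[perm]
```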