Batch normalization vs. layer normalization - SoojungHong/MachineLearning GitHub Wiki