[implicit zero-padding of Avgpool] Error in inception v3 conversion from Pytorch to Keras - microsoft/MMdnn GitHub Wiki

Error in inception v3 conversion from Pytorch to Keras [:warning: Avgpooling]

Source: Pytorch

Destination: Keras

How we found this problem

When we tested the CNTK parser and Keras emitter, we found a large error in the final results. We then tracked the conversion of every layer of our model. The conv, bn and Maxpool layers were correct: their converted versions gave almost the same output values as the original model. We went on to inspect the output after the avgpool layer and found it greatly different, especially in the entries near the edges. Luckily, when I divided the top-left element of the converted model's output by the same element of the original model's output, the ratio was the same no matter what input I gave, namely 4/9.

Why the avgpool conversion outputs differently

The original avgpool in Pytorch is average pooling with implicit padding. That is, when the pooling kernel extends past the edge, only the pixels inside the image are counted in the divisor. In our conversion to Keras, however, we first emit the padding as an extra padding layer and then do the average pooling, so the padded pixels outside the image edges are counted as well. In short, the divisor around the edges is different, which can be seen more clearly in the illustration below. It shows the first average pooling layer in the model, which takes a 35x35 input with a kernel size of 3 and a padding size of 1, and compares the top-left element of the output. At that corner, the 3x3 window covers 4 image pixels and 5 padded zeros, so the original divides the sum by 4 while the converted model divides it by 9, which explains the 4/9 ratio.

image/avgpool.png
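The difference in the divisor can be reproduced without either framework. The following is a minimal pure-Python sketch, not the MMdnn code: the helper `avg_pool2d` and its `count_pad` flag are hypothetical names introduced here for illustration. `count_pad=False` mimics the behavior described above for the original model (in PyTorch this corresponds to `count_include_pad=False`), and `count_pad=True` mimics padding first and pooling afterwards.

```python
def avg_pool2d(image, kernel=3, stride=1, pad=1, count_pad=True):
    """Average-pool a 2D list of numbers with zero padding.

    count_pad=True  divides by kernel*kernel (explicit pad, then pool).
    count_pad=False divides by the number of in-image pixels only.
    (Hypothetical helper, written only for this illustration.)
    """
    height, width = len(image), len(image[0])
    out_h = (height + 2 * pad - kernel) // stride + 1
    out_w = (width + 2 * pad - kernel) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total, valid = 0.0, 0
            for di in range(kernel):
                for dj in range(kernel):
                    r = i * stride - pad + di
                    c = j * stride - pad + dj
                    # Padded positions contribute zero to the sum,
                    # so only in-image pixels are accumulated.
                    if 0 <= r < height and 0 <= c < width:
                        total += image[r][c]
                        valid += 1
            divisor = kernel * kernel if count_pad else valid
            row.append(total / divisor)
        output.append(row)
    return output


# 35x35 input with strictly positive values, kernel 3, padding 1.
image = [[float(r * 35 + c + 1) for c in range(35)] for r in range(35)]

original = avg_pool2d(image, count_pad=False)  # padding excluded from the divisor
converted = avg_pool2d(image, count_pad=True)  # pad first, then pool

corner_ratio = converted[0][0] / original[0][0]      # corner window: 4 of 9 pixels in the image
edge_ratio = converted[0][17] / original[0][17]      # edge window: 6 of 9 pixels in the image
inner_ratio = converted[17][17] / original[17][17]   # interior window: all 9 pixels in the image
```

Under these assumptions the corner ratio is 4/9 for any non-zero input, edge entries away from the corners differ by 6/9, and interior entries agree exactly.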

Therefore, to solve this problem, we have to be more careful when converting the padding. The difficulty is that not all frameworks use the same format for pooling or convolution layers, which is why we originally emitted the padding first and then did the pooling.
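One possible workaround, sketched here only as an assumption and not as what MMdnn does, is to keep the pad-then-pool structure and rescale each output entry by `kernel*kernel / valid_count`, where `valid_count` is the number of in-image pixels under the window. The helpers `valid_counts` and `pad_then_pool` are hypothetical names written in pure Python for illustration.

```python
def valid_counts(height, width, kernel=3, stride=1, pad=1):
    """Number of in-image pixels under each pooling window."""
    out_h = (height + 2 * pad - kernel) // stride + 1
    out_w = (width + 2 * pad - kernel) // stride + 1

    def span(start, size):
        # Overlap of [start, start + kernel) with [0, size).
        return max(0, min(start + kernel, size) - max(start, 0))

    return [[span(i * stride - pad, height) * span(j * stride - pad, width)
             for j in range(out_w)] for i in range(out_h)]


def pad_then_pool(image, kernel=3, stride=1, pad=1):
    """Explicit zero padding followed by a plain average pooling."""
    height, width = len(image), len(image[0])
    padded = [[0.0] * (width + 2 * pad) for _ in range(height + 2 * pad)]
    for r in range(height):
        for c in range(width):
            padded[r + pad][c + pad] = image[r][c]
    out_h = (height + 2 * pad - kernel) // stride + 1
    out_w = (width + 2 * pad - kernel) // stride + 1
    return [[sum(padded[i * stride + di][j * stride + dj]
                 for di in range(kernel) for dj in range(kernel)) / (kernel * kernel)
             for j in range(out_w)] for i in range(out_h)]


# Small 5x5 example with kernel 3 and padding 1.
image = [[float(r * 5 + c + 1) for c in range(5)] for r in range(5)]
counts = valid_counts(5, 5)
pooled = pad_then_pool(image)

# Rescale each entry from a divisor of kernel*kernel (9) to the in-image count.
corrected = [[pooled[i][j] * 9 / counts[i][j] for j in range(5)]
             for i in range(5)]
```

The counts depend only on the input size, kernel, stride and padding, so the correction is a constant mask that could be computed once at conversion time. The sketch assumes the padding is smaller than the kernel, so that every window covers at least one image pixel.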

The issue is also discussed in Mxnet issue #10194