Part of Speech Tagging Results - mohsensalari/cs571 GitHub Wiki
AdaGrad vs Perceptron
As the chart below shows, the Adaptive Gradient (AdaGrad) algorithm outperforms the Perceptron. This gap persists when either of the two algorithms goes through more iterations.
The difference is even more pronounced when we complement AdaGrad with averaging (Fig 2).
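To make the comparison concrete, here is a minimal sketch of the two update rules for a multiclass linear tagger. This is not the project's actual code: the function names, the one-hot toy data, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def perceptron_update(w, x, gold, pred, lr=1.0):
    """Perceptron: a fixed-size update whenever the prediction is wrong."""
    if gold != pred:
        w[gold] += lr * x        # push the gold tag's score up
        w[pred] -= lr * x        # push the wrong tag's score down

def adagrad_update(w, g2, x, gold, pred, lr=0.1, eps=1e-8):
    """AdaGrad: per-weight step sizes shrink with the accumulated squared
    gradients, so rarely-updated features keep larger learning rates."""
    if gold != pred:
        grad = np.zeros_like(w)
        grad[gold] -= x          # gradient of a hinge-style mistake loss
        grad[pred] += x
        g2 += grad ** 2          # accumulate squared gradients per weight
        w -= lr * grad / (np.sqrt(g2) + eps)  # adaptive in-place step

# Hypothetical toy data: two "words" with one-hot features and gold tags.
X = np.eye(2)
Y = [0, 1]

w_p = np.zeros((2, 2))           # perceptron weights (tags x features)
w_a = np.zeros((2, 2))           # AdaGrad weights
g2 = np.zeros((2, 2))            # AdaGrad's accumulated squared gradients
for _ in range(5):
    for x, y in zip(X, Y):
        perceptron_update(w_p, x, y, int(np.argmax(w_p @ x)))
        adagrad_update(w_a, g2, x, y, int(np.argmax(w_a @ x)))
```

The key difference is visible in the last line of each function: the Perceptron takes the same-size step on every mistake, while AdaGrad divides each step by the history of that weight's gradients.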
Averaging the Adaptive Gradient Descent
Below you can find the results of AdaGrad with and without averaging. Averaging has virtually no effect on the first iteration, but its benefit grows as training proceeds: by the tenth iteration the gain from averaging is at its largest, and the results change little after that.
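A sketch of the averaging trick, assuming the standard averaged-training recipe (keep a running sum of the weight matrix after every example and return the mean as the final model). The function name and toy data are hypothetical, not the project's code.

```python
import numpy as np

def train_adagrad(X, Y, n_classes, epochs=10, lr=0.1, average=True, eps=1e-8):
    """AdaGrad training; with average=True the returned model is the mean
    of the weight matrix over every training step, which smooths out the
    noisy late updates that a single final weight matrix would keep."""
    w = np.zeros((n_classes, X.shape[1]))
    g2 = np.zeros_like(w)        # accumulated squared gradients
    w_sum = np.zeros_like(w)     # running sum: one weight snapshot per step
    steps = 0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = int(np.argmax(w @ x))
            if pred != y:
                grad = np.zeros_like(w)
                grad[y] -= x
                grad[pred] += x
                g2 += grad ** 2
                w -= lr * grad / (np.sqrt(g2) + eps)
            w_sum += w           # snapshot after processing this example
            steps += 1
    return w_sum / steps if average else w
```

In practice the running sum is usually accumulated lazily so averaging adds no per-example cost; the dense `w_sum += w` above is simply the easiest version to read.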