Part of Speech Tagging Results - mohsensalari/cs571 GitHub Wiki
AdaGrad vs Perceptron
As the chart below shows, the Adaptive Gradient (AdaGrad) algorithm outperforms the Perceptron. This gap persists when either of the two algorithms goes through more iterations.
The difference is even more pronounced when we complement AdaGrad with averaging (Fig 2).
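To make the comparison concrete, here is a minimal sketch of the two update rules for a multiclass linear tagger. This is not the project's actual code: the function names, the one-hot toy data, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def perceptron_update(w, x, gold, pred, lr=1.0):
    """Perceptron: a fixed-size update whenever the prediction is wrong."""
    if gold != pred:
        w[gold] += lr * x        # push the gold tag's score up
        w[pred] -= lr * x        # push the wrong tag's score down

def adagrad_update(w, g2, x, gold, pred, lr=0.1, eps=1e-8):
    """AdaGrad: per-weight step sizes shrink with the accumulated squared
    gradients, so rarely-updated features keep larger learning rates."""
    if gold != pred:
        grad = np.zeros_like(w)
        grad[gold] -= x          # gradient of a hinge-style mistake loss
        grad[pred] += x
        g2 += grad ** 2          # accumulate squared gradients per weight
        w -= lr * grad / (np.sqrt(g2) + eps)  # adaptive in-place step

# Hypothetical toy data: two "words" with one-hot features and gold tags.
X = np.eye(2)
Y = [0, 1]

w_p = np.zeros((2, 2))           # perceptron weights (tags x features)
w_a = np.zeros((2, 2))           # AdaGrad weights
g2 = np.zeros((2, 2))            # AdaGrad's accumulated squared gradients
for _ in range(5):
    for x, y in zip(X, Y):
        perceptron_update(w_p, x, y, int(np.argmax(w_p @ x)))
        adagrad_update(w_a, g2, x, y, int(np.argmax(w_a @ x)))
```

The key difference is visible in the last line of each function: the Perceptron takes the same-size step on every mistake, while AdaGrad divides each step by the history of that weight's gradients.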
Averaging the Adaptive Gradient Descent
Below you can find the results of AdaGrad with and without averaging. Averaging has virtually no effect on the first iteration, but its benefit grows as training proceeds: by the tenth iteration the gain from averaging is at its largest, and the results change little after that.
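A sketch of the averaging trick, assuming the standard averaged-training recipe (keep a running sum of the weight matrix after every example and return the mean as the final model). The function name and toy data are hypothetical, not the project's code.

```python
import numpy as np

def train_adagrad(X, Y, n_classes, epochs=10, lr=0.1, average=True, eps=1e-8):
    """AdaGrad training; with average=True the returned model is the mean
    of the weight matrix over every training step, which smooths out the
    noisy late updates that a single final weight matrix would keep."""
    w = np.zeros((n_classes, X.shape[1]))
    g2 = np.zeros_like(w)        # accumulated squared gradients
    w_sum = np.zeros_like(w)     # running sum: one weight snapshot per step
    steps = 0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = int(np.argmax(w @ x))
            if pred != y:
                grad = np.zeros_like(w)
                grad[y] -= x
                grad[pred] += x
                g2 += grad ** 2
                w -= lr * grad / (np.sqrt(g2) + eps)
            w_sum += w           # snapshot after processing this example
            steps += 1
    return w_sum / steps if average else w
```

In practice the running sum is usually accumulated lazily so averaging adds no per-example cost; the dense `w_sum += w` above is simply the easiest version to read.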