Questions Perceptron
Michael Collins: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, EMNLP 2002.
-
Suppose you have a tagset consisting of two tags, N (noun) and X (not noun), and a training sentence:
Luke/N I/X am/X your/X father/N
During training, the following best tag sequence is found for this sentence:
N N X N X
How would this result alter the values of α<sub>X,X,X</sub> and α<sub>N,father</sub>?
Supposing that the best tag sequence does not change, what would your answer be if "father/N" were replaced with "Luke/X"?
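
To check your reasoning empirically, here is a minimal sketch of the update step, assuming the trigram and tag/word features of Section 2.4; the tuple-based feature keys and the helper names are made up for this illustration:

```python
from collections import Counter

def features(words, tags):
    """Counts of trigram and tag/word features for one tagged sentence,
    with the tag sequence padded by two '*' boundary symbols."""
    padded = ["*", "*"] + list(tags)
    feats = Counter()
    for i, word in enumerate(words):
        feats[("trigram", padded[i], padded[i + 1], padded[i + 2])] += 1
        feats[("tagword", tags[i], word)] += 1
    return feats

def perceptron_update(alpha, words, gold, predicted):
    """One perceptron step: alpha += Phi(x, gold) - Phi(x, predicted)."""
    for feat, count in features(words, gold).items():
        alpha[feat] += count
    for feat, count in features(words, predicted).items():
        alpha[feat] -= count

alpha = Counter()
words = "Luke I am your father".split()
gold = ["N", "X", "X", "X", "N"]
predicted = ["N", "N", "X", "N", "X"]   # the best sequence found during training
perceptron_update(alpha, words, gold, predicted)
print(alpha[("trigram", "X", "X", "X")])  # change in the weight of the X,X,X trigram
print(alpha[("tagword", "N", "father")])  # change in the weight of the (N, father) pair
```

A single update changes exactly the weights of features that differ between the gold and the predicted sequence, so editing `words`, `gold`, and `predicted` covers the second sub-question too.

-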
Suppose this tagged sentence is the only entry in your training data:
a/DT boy/NN saw/VBD a/DT girl/NN with/IN a/DT nice/JJ hat/NN
How many features will the tagger from Section 2.4 have if its training is identical to the one from Section 2.1?
(For some reason, you want to use all 36 tags from the Penn Treebank tagset.)
Which tag sequence (z<sub>[1:n<sub>1</sub>]</sub>) will be selected for this sentence in the first iteration of the algorithm?
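
If you want to experiment with the counting, here is a small sketch. Since it is debatable whether unseen tag/word pairs and the '*' boundary symbols count as features, it computes the totals under two possible readings (the boundary symbols are ignored in the first one for simplicity):

```python
words = "a boy saw a girl with a nice hat".split()
gold = "DT NN VBD DT NN IN DT JJ NN".split()
num_tags = 36  # the full Penn Treebank tagset, per the question

# Reading 1: a feature for every possible tag trigram and every (tag, word)
# pair over the training vocabulary ('*' boundary symbols ignored here).
all_combinations = num_tags ** 3 + num_tags * len(set(words))
print("all combinations:", all_combinations)

# Reading 2: only events observed in the training data, i.e. those that get
# non-trivial estimates from the maximum-likelihood training of Section 2.1.
padded = ["*", "*"] + gold
seen_trigrams = {tuple(padded[i:i + 3]) for i in range(len(words))}
seen_tagword = set(zip(gold, words))
print("observed only:", len(seen_trigrams) + len(seen_tagword))
```

-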
What is the difference between maximum-entropy training and perceptron training in the experiments?
-
Do you think that this task can be parallelized? How do you think the performance of the tagger presented in the paper would change if you introduced parallelism?
-
Given a linearly separable data set, is there only one solution (hyperplane) that a perceptron model can find?
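
One way to explore this question: train a plain binary perceptron on the same tiny linearly separable data set in two different presentation orders and compare the resulting weight vectors (the data points below are made up for the illustration):

```python
import numpy as np

def train_perceptron(points, labels, epochs=100):
    """Plain binary perceptron; returns weights (last component is the bias)."""
    X = np.hstack([points, np.ones((len(points), 1))])  # append a bias feature
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(X, labels):
            if y * (w @ x) <= 0:  # misclassified (or exactly on the boundary)
                w += y * x
                mistakes += 1
        if mistakes == 0:         # a full pass without mistakes: converged
            break
    return w

points = np.array([[1.0, 3.0], [2.0, 1.0], [-1.0, 0.0], [0.0, -2.0]])
labels = np.array([1, 1, -1, -1])

w1 = train_perceptron(points, labels)
w2 = train_perceptron(points[::-1], labels[::-1])  # same data, reversed order
print(w1, w2)
```

Both returned vectors separate the data; whether they coincide is exactly what the question asks about.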
-
Given the sentence from Question 2, suppose the tag sequence
NN NN NN NN NN NN NN NN NN
was selected in the first iteration of the algorithm.
How will the algorithm proceed in the following iterations? How many iterations will it need to obtain the correct tags? How many feature weights with non-zero values will it end up with?
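
Here is a sketch of the whole training loop under the question's premise, again assuming the trigram and tag/word features of Section 2.4 and a brute-force Viterbi over tag-pair histories. A reduced tagset stands in for the full 36 tags to keep the output readable, and the all-NN prediction of the first iteration is forced by hand (with all-zero weights every sequence ties, so it is one valid tie-break); the tag names and feature keys are illustrative:

```python
from collections import Counter

def features(words, tags):
    """Trigram and tag/word feature counts, with '*' boundary padding."""
    padded = ["*", "*"] + list(tags)
    feats = Counter()
    for i, word in enumerate(words):
        feats[("trigram", padded[i], padded[i + 1], padded[i + 2])] += 1
        feats[("tagword", tags[i], word)] += 1
    return feats

def viterbi(words, tagset, alpha):
    """Best tag sequence under the current weights."""
    best = {("*", "*"): (0.0, [])}  # history (u, v) -> (score, sequence so far)
    for i, word in enumerate(words):
        new = {}
        for (u, v), (score, seq) in best.items():
            for t in tagset:
                s = score + alpha[("trigram", u, v, t)] + alpha[("tagword", t, word)]
                if (v, t) not in new or s > new[(v, t)][0]:
                    new[(v, t)] = (s, seq + [t])
        best = new
    return max(best.values())[1]

def train(words, gold, tagset, max_iter=20):
    alpha = Counter()
    for it in range(1, max_iter + 1):
        if it == 1:
            z = ["NN"] * len(words)  # the question's premise; with zero weights
                                     # every sequence ties, so this is one valid pick
        else:
            z = viterbi(words, tagset, alpha)
        if z == gold:
            print(f"iteration {it}: prediction matches the gold tags")
            break
        print(f"iteration {it}: predicted {' '.join(z)}")
        for feat, count in features(words, gold).items():
            alpha[feat] += count
        for feat, count in features(words, z).items():
            alpha[feat] -= count
    print(sum(1 for v in alpha.values() if v != 0), "non-zero feature weights")

words = "a boy saw a girl with a nice hat".split()
gold = "DT NN VBD DT NN IN DT JJ NN".split()
train(words, gold, tagset=["DT", "NN", "VBD", "IN", "JJ"])
```

Printing the prediction of each iteration and the final number of non-zero weights lets you verify your answers under these assumptions.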