Questions: Probing (Hewitt & Liang)

Background Info

  • The term MLP (multilayer perceptron) is ambiguous. In this paper (and many others), it refers loosely to any feedforward (fully-connected) neural network; here, the ReLU activation is used (Section 3.1). Elsewhere, MLP may refer to multiple layers of perceptrons (with the threshold activation function). A minimal sketch of such a probe follows this list.

  • Dropout is a regularization technique (i.e. a technique for reducing overfitting): when training neural networks, randomly chosen nodes are "dropped out", i.e. temporarily deleted from the network, and different nodes are dropped in each training step. In this paper, p is the drop probability, i.e. each node is dropped with probability p (and kept with probability 1-p). Other papers, including the original Srivastava et al. (2014), report p as the keep probability.
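
For illustration, here is a minimal sketch (not the authors' code) of such an MLP probe in PyTorch: one fully-connected hidden layer with a ReLU activation, followed by dropout, where p is the drop probability (PyTorch's nn.Dropout uses the same convention as the paper). All sizes and hyperparameter values below are placeholders, not values taken from the paper.

```python
import torch
import torch.nn as nn

class MLP1Probe(nn.Module):
    """One-hidden-layer feedforward probe: Linear -> ReLU -> Dropout -> Linear."""

    def __init__(self, rep_dim=1024, hidden_dim=1000, num_tags=45, p_drop=0.4):
        # rep_dim: size of the frozen contextual word representations (e.g. one ELMo layer)
        # num_tags: size of the output label set (e.g. PoS tags)
        # p_drop: probability of DROPPING a node (not the keep probability)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(rep_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(p=p_drop),
            nn.Linear(hidden_dim, num_tags),
        )

    def forward(self, reps):
        # reps: tensor of shape (num_tokens, rep_dim); returns per-token tag scores
        return self.net(reps)
```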

Questions

  1. What is the purpose of probes?

  2. What is the purpose of control tasks?

  3. What is selectivity? Can it be negative? Why?

  4. Last week, we discussed feature-based vs. fine-tuning approaches. Which approach is used in this paper? Could we also use the other approach for probes? Why?

  5. You have the pre-trained ELMo model and standard English PoS tagging training data (PennTB), but no other data. Your task is to design the best possible English PoS tagger (ideally one that also performs well on non-PennTB texts). How would you proceed? Would you use ELMo1 (the 1st layer), ELMo2 (the 2nd layer), or something else? Why?

Works Cited

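Hewitt, John, and Percy Liang. "Designing and Interpreting Probes with Control Tasks." Proceedings of EMNLP-IJCNLP (2019).
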
Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.