Questions: Probing (Hewitt & Liang)

Background Info

  • The term MLP (multilayer perceptron) is ambiguous. In this paper (and many others), it refers loosely to any feedforward (fully-connected) neural network; here, the ReLU activation is used (Section 3.1). Elsewhere, MLP may refer to multiple layers of perceptrons (with the threshold activation function). A minimal sketch of such a probe follows this list.

  • Dropout is a regularization technique (i.e. a technique for reducing overfitting): when training neural networks, randomly chosen nodes are "dropped out", i.e. temporarily deleted from the network, and different nodes are dropped in each training step. In this paper, p is the drop probability, i.e. each node is dropped with probability p (and kept with probability 1-p). Other papers, including the original Srivastava et al. (2014), report p as the keep probability.
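
For illustration, here is a minimal sketch (not the authors' code) of such an MLP probe in PyTorch: one fully-connected hidden layer with a ReLU activation, followed by dropout, where p is the drop probability (PyTorch's nn.Dropout uses the same convention as the paper). All sizes and hyperparameter values below are placeholders, not values taken from the paper.

```python
import torch
import torch.nn as nn

class MLP1Probe(nn.Module):
    """One-hidden-layer feedforward probe: Linear -> ReLU -> Dropout -> Linear."""

    def __init__(self, rep_dim=1024, hidden_dim=1000, num_tags=45, p_drop=0.4):
        # rep_dim: size of the frozen contextual word representations (e.g. one ELMo layer)
        # num_tags: size of the output label set (e.g. PoS tags)
        # p_drop: probability of DROPPING a node (not the keep probability)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(rep_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(p=p_drop),
            nn.Linear(hidden_dim, num_tags),
        )

    def forward(self, reps):
        # reps: tensor of shape (num_tokens, rep_dim); returns per-token tag scores
        return self.net(reps)
```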

Questions

  1. What is the purpose of probes?

  2. What is the purpose of control tasks?

  3. What is selectivity? Can it be negative? Why?

  4. Last week, we discussed feature-based vs. fine-tuning approaches. Which approach is used in this paper? Could we also use the other approach for probes? Why?

  5. You have the pre-trained ELMo model and standard English PoS tagging training data (PennTB), but no other data. Your task is to design the best possible English PoS tagger (ideally one that also performs well on non-PennTB texts). How would you proceed? Would you use ELMo1 (the 1st layer), ELMo2 (the 2nd layer), or something else? Why?

Works Cited

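Hewitt, John, and Percy Liang. "Designing and Interpreting Probes with Control Tasks." Proceedings of EMNLP-IJCNLP (2019).
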
Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.