Questions: Biaffine parser
Dozat and Manning: Deep biaffine attention for neural dependency parsing, 2016 (published at ICLR 2017).
- Your task is to guess dependency relation labels (e.g. UD-style nsubj, obj, obl, advcl, ...) in an unlabeled dependency tree, but you can see only one of the following four: the dependent node's word form, the dependent node's PoS tag, the governing node's word form, or the governing node's PoS tag. Which one would you choose and why? (The answer is not contained in the paper.)
- Can you please help me complete/fix the linear algebra recap below? (I was not able to find the answer in the paper, nor in my rust-eaten memory.)
- What do you like and dislike about the paper? Is there anything unclear?
- linear transformation is generally a mapping f between two vector spaces; in the NN context we are interested specifically in functions f mapping an n-dimensional vector (of real numbers) x to an m-dimensional vector y. Such a function is linear iff it can be written as f(x) = Wx, where W is an m×n matrix (and x is n×1).
- affine transformation, in the NN context, is a function f(x) = Wx + b, which adds (on top of a linear transformation) an m×1 bias term b.
- bilinear transformation, in the NN context, is a function f of two variables x1 and x2 (n1-dimensional and n2-dimensional, resp.), which can be written as ???. For any fixed x1, f(x1, x2) is linear in x2, and for any fixed x2, f(x1, x2) is linear in x1.
- I am not sure about this one: biaffine transformation, in the NN context, is a function f of two variables x1 and x2 (n1-dimensional and n2-dimensional, resp.), which can be written as f(x1, x2) = x1^T W1 x2 + (x1 ⨁ x2)^T W2 + b, where x1 ⨁ x2 is an n1×n2-dimensional matrix (x1,i · x2,j)_{i,j} and W1, W2 and b are parameters with dimensions n1×n2, n1×n2, ?, resp. (One possible reading is sketched below this recap.)
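For what it's worth, here is a minimal numpy sketch of the four definitions as I understand them. It assumes scalar outputs and reads ⨁ as vector concatenation, which is how the paper's label classifier uses it (so W2 would be (n1+n2)×1 and b a scalar); all variable names are illustrative, not taken from the paper or its code.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m = 4, 3, 2
x1 = rng.standard_normal(n1)  # e.g. the governor's representation
x2 = rng.standard_normal(n2)  # e.g. the dependent's representation

# linear: f(x) = W x, mapping R^n1 -> R^m
W = rng.standard_normal((m, n1))
f_linear = W @ x1

# affine: f(x) = W x + b, with an m-dimensional bias b
b = rng.standard_normal(m)
f_affine = W @ x1 + b

# bilinear (scalar output): f(x1, x2) = x1^T U x2;
# linear in x2 for fixed x1, and linear in x1 for fixed x2
U = rng.standard_normal((n1, n2))
f_bilinear = x1 @ U @ x2

# biaffine (scalar output), reading ⨁ as concatenation:
# f(x1, x2) = x1^T U x2 + w^T (x1 ⨁ x2) + b0,
# i.e. a bilinear term plus an affine term over the concatenation
w = rng.standard_normal(n1 + n2)
b0 = rng.standard_normal()
f_biaffine = x1 @ U @ x2 + w @ np.concatenate([x1, x2]) + b0

print(f_linear, f_affine, f_bilinear, f_biaffine)
```

Under this reading, the missing bilinear formula would be f(x1, x2) = x1^T U x2 with U an n1×n2 matrix, and the biaffine parameter dimensions would be n1×n2 for W1, (n1+n2)×1 for W2, and 1×1 (a scalar) for b. For c relation labels the paper stacks c such scorers, turning W1 into an n1×c×n2 tensor (if I read the paper correctly).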
- OpenReview discussion of the paper, with several interesting explanations by the first author.
- The original implementation and a more recent implementation.
- A follow-up publication about the CoNLL-2017 winning system.
- The first author's website with pre-trained models, slides and a poster for the two publications.
Martin