dependency_parsing - shigashiyama/nlp_survey GitHub Wiki

依存構造解析のサーベイ

英語・多言語依存構造解析

ニューラル手法：transition-based

(Chen 2014) A fast and accurate dependency parser using neural networks], EMNLP, 2014 PDF(http://aclweb.org/anthology/D/D14/D14-1082.pdf)

To be written

(Zhang 2016) Stack-propagation: Improved Representation Learning for Syntax, arXiv, 2016

To be written

(Dyer 2016) Recurrent Neural Network Grammars, NAACL, 2016

To be written

(Andor 2016) Glovally Normarized Transition-Based Neural Networks, ACL, 2016

To be written

(Kiperwasser 2016) Simple and accurate dependency parsing using bidirectional LSTM feature representations, TACL, 2016 PDF(https://arxiv.org/pdf/1603.04351.pdf)

概要：BiLSTM に基づく transition-based および graph-based 手法を提案．Transition-based モデルで，PTB-SD および CTB で SOTA 付近の精度を達成．学習済み word embedding の利用により，transition-based では精度向上しているが (PTB-SD の UAS：93.2 -> 93.9)，graph-based では低下しているが (93.1 -> 93.0) 理由は不明．
方針：transition-based では Shift，Left，Right からアクションを選択する arc-hybrid 法を使用している．graph-based では，arc-factored 法により弧のスコアを基に木のスコアを得る．modifier，head 対のスコアを計算した後，Einser 法により projective な (単語出現順に対して交差のない) 木を得る．弧のラベルは，弧を決定後に予測する．
モデル
- transition-based： features -> MLP (下図)
- graph-based： features -> MLP (arc) / MLP (label)
素性
- 共通：単語，品詞の embedding -> BiLSTM = 単語ベクトル
- transition-based：stack の位置0,1,2の単語，buffer の位置0の単語
- graph-based：head，modifier の単語
学習
- transition-based：margin-based hinge loss & dynamic oracle training & aggressive expoloration
- graph-based：margin-based hinge loss & loss augmented inference
事前学習：言語モデルタスクにより学習した word embedding を使用 (Dyer 2015 と同モデル)
外部資源：English Gigaword / Chinese Gigaword (word embedding の学習)
その他：単語の出現頻度に応じた確率で未知語トークンを割り当てる word dropout を適用
実装：https://github.com/elikip/bist-parser (PyCNN)

(Chen 2016) Bi-directional Attention with Agreement for Dependency Parsing, EMNLP, 2016

To be written

(Kuncoro 2017) What Do Recurrent Neural Network Grammars Learn About Syntax?, EACL, 2017

To be written

(Zhang 2017) Stack-based Multi-layer Attention for Transition-based Dependency Parsing, EMNLP, 2017

To be written

(Kohita 2017) Effective online reordering with arc-eager transitions, IWPT, 2017

To be written

ニューラル手法：graph-based

(Dozat 2017a) Deep Biaffine Attention for Neural Dependency Parsing, ICLR, 2017 PDF(https://arxiv.org/pdf/1611.01734.pdf)

概要：Deep Biaffine Attention モデル．PTB-SD で transition-based の SOTA に近い精度 (UAS 95.7) を達成．CoNLL-2017 でも同モデルで SOTA を達成 (Dozat 2017b)．
方針：各単語について最も head らしい単語を選択することで弧を予測した後，予測された (modifier,head) 対の弧に対してラベルを推定する graph-based 手法．テスト時は，MST アルゴリズムを適用することで弧の集合が木となることを保証する．
モデル：features -> BiLSTM -> MLP(dep) or MLP(head) -> biaffine(arc) / biaffine(label) (下図)
素性：単語，品詞の embedding
学習：言及なし
事前学習・外部資源：GloVe の学習済みモデルを word embedding の初期化に使用

(Hashimoto 2017) A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, EMNLP, 2017 PDF(https://arxiv.org/pdf/1611.01587.pdf)

概要：複数の構文的・意味的タスクを MTL により一つのモデルで解く Joint Many-Task モデルを提案．品詞付与，チャンキング，依存構造解析，文間類似度判定，文間含意認識に適用し，品詞付与以外の4タスクで SOTA．依存構造解析では，単独で解いた場合の UAS 93.35 に対し，MTL の場合の UAS 94.67 を報告している．
方針：各単語について最も head らしい単語を選択することで弧を予測した後，予測された (modifier,head) 対の弧に対してラベルを推定する graph-based 手法．解析結果が木でない場合は，first-order Eisner アルゴリズムを適用することで弧の集合が木となることを保証する．
モデル：features -> BiLSTM -> biaffine+softmax (arc) / linear(?)+softmax (label)
素性：単語 embedding (文字 n-gram embedding の平均で計算)；品詞ラベル，チャンキングラベルの embedding (前段のレイヤーで学習)；チャンキング layer の隠れ層のベクトル
学習：学習データの NLL + 他タスクのパラメータに由来する正則化項
事前学習：skip-gram で学習した文字 n-gram embedding を使用
外部資源：Wikipedia (文字 n-gram embedding の学習)
その他：他タスクのパラメータを loss に反映

(Zheng 2017) Incremental Graph-based Neural Dependency Parsing, EMNLP, 2017 PDF(http://www.aclweb.org/anthology/D17-1173)

概要：高階の素性を用いた graph-based 手法．1階->高階(更新1回)->高階(更新2回)->高階(更新回数上限なし) の各手法の精度 (UAS) は，PTB で 90.88->93.31->94.76->95.53，CTB で 82.97->87.35->88.78->89.42 を達成しており，CTB では SOTA．
方針：各単語について最も head らしい単語を選択することで弧を予測した後，予測された (modifier,head) 対の弧に対してラベルを推定する graph-based 手法．テスト時は，Chu-Liu/Edmons アルゴリズムを適用することで弧の集合が木となることを保証する．1階 (1st-order) の素性のみで弧を推定した後，推定された木構造から得られる高階 (higher-order) の素性を用いてスコアを更新する方法を提案している．
モデル：features -> MLP(left-arc) or MLP(right-arc) or softmax(label)
素性
- 1st-order：単語，品詞の embedding -> CNN；head と modifier の距離の離散値
- Higher-order：兄弟，祖父，叔父ノードなど (詳細不明)
学習：margin-based loss (arc) / cross entropy loss (label)
事前学習：word2vec の学習済みモデルを word embedding の初期化に使用
外部資源：English Wikipedia / Chinese Wikipedia (word embedding の学習)

(Wang 2016) Graph-based Dependency Parsing with Bidirectional LSTM, ACL, 2016 PDF(http://www.aclweb.org/anthology/P16-1218)

概要：LSTM に基づく 1st-order graph-based 手法で，より高次の素性エンジニアリングに基づく伝統的な手法と同等に近い精度を達成している．
方針：(modifier,head) 対に対応する素性を FF に入力することで，弧らしさと弧のラベルを同時に推定する graph-based 手法．
モデル：head, modifier に関する features -> FF
素性
- word rep: word, POS -> FF -> BiLSTM
- word rep -> LSTM-Minus -> segment rep (prefix,infix,suffix)
- distance embedding
学習：max-margin
事前学習：structured skipgram の学習済みモデルを word embdding の初期化に使用
その他：隠れ層の FF レイヤーに tanh-cube activation を適用
外部資源：Gigaword corpus (word embedding の学習)

日本語 (単語) 依存構造解析

(Tanaka 2015) Word-based Japanese typed dependency parsing with grammatical function analysis, IJCNLP, 2015

To be written

(Tanaka 2017) Hierarchical Word Structure-based Parsing: A Feasibility Study on UD-style Dependency Parsing in Japanese, IWPT, 2017

To be written

(Kawahara 2017) Automatically acquired lexical knowledge improves Japanese joint morphological and dependency analysis, IWPT, 2017

To be written

統計的手法：graph-based

(Flannery 2010) A Pointwise Approach to Training Dependency Parsers from Partially Annotated Corpora, JNLP, 2010 PDF(https://www.jstage.jst.go.jp/article/jnlp/19/3/19_167/_pdf)

概要：EDA の論文．単語ごとに独立に head を推定しており，部分アノテーションコーパスからの学習が可能．Malt Parser，1st-order/2nd-order MST Parser と比較しほぼ同等の精度 (英語表現辞典コーパスで UAS 96.83) を達成している．
方針：各単語について最も head らしい単語を選択することで弧を予測する graph-based 手法．日本語の文を対象とし，head は modifier より必ず後方という制約を入れることで，予測された弧の集合から木が構成されることを保証している (他の言語には Chu-Liu/Edmond 法などの MST アルゴリズムが適用可能)．
モデル：対数線形モデル
素性
- modifier の表層，品詞；head (候補) の表層，品詞
- modifier の左右3単語の表層，品詞；head (候補) の左右3単語の表層，品詞
- modifier と head (候補) の距離

中国語依存構造解析

ニューラル手法：transition-based

(Kurita 2017) Neural joint model for transition-based chinese syntactic analysis, ACL, 2017 PDF(http://www.aclweb.org/anthology/P17-1111)

概要：単語分割+品詞付与+依存構造解析の複合タスクを同時に解く transition-based 手法を提案．CTB5 の単語分割では SegTag モデルにより F値 98.60，品詞付与では SegTagDep モデルにより F値 94.83，依存構造解析では SegTag+Dep ("+" はパイプライン) モデルで F値 82.60 でいずれも SOTA．
方針：単語分割+品詞付与+依存構造解析を一気通貫またはパイプラインで解く transition-based 手法．単語分割，品詞付与に対しては Shift，Append，依存構造解析に対しては Shift，Append，Reduce-right，Reduce-left のアクションを選択することで単語依存構造木を構築する．
モデル：features -> MLP+softmax (下図)
素性
- 文字 or 単語ベクトル：文字 n-gram (n=1~4) ごとの BiLSTM の出力を連結
- buffer の位置0の文字，stack の位置0,1,2の単語，stack 位置0,1の単語の子供にあたる単語
学習：gold path の negative log likelifood + ビーム内の系列のスコア
事前学習:word2vec で学習した文字，単語の embedding を使用
外部資源：なし