Notes for MReD Paper

1 Introduction

  • A fully-annotated meta-review dataset that enables better use of domain knowledge for text generation.
    • In-depth understanding of the structure of meta-reviews in a peer-reviewing system, namely ICLR's OpenReview system.
  • A new task of controllable generation that focuses on controlling passage-level macro structure.
    • Controls not only the intent of a single generated sentence but also the structure of the whole generated passage.
  • Simple yet effective control methods independent of the model architecture.

2 Data

  • ICLR meta-reviews, 2018-2021, from OpenReview.
  • 7,089 meta-reviews (45,929 sentences in total), corresponding to 23,675 reviews.
  • Each sentence is labelled with one of 9 pre-defined intent categories: abstract, strength, weakness, suggestion, rebuttal process, rating summary, area chair (AC) disagreement, decision, and miscellaneous (misc). A sketch of one possible record layout follows this list.
    • abstract, strength, and weakness sentences are easily summarized from the reviewers’ comments.
    • Meta-reviews for papers with middle-range scores tend to be long; those for clearly high or low scores tend to be short.
    • Common structural patterns, e.g. abstract at the beginning, suggestion and decision at the end.
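
As a minimal sketch, one annotated meta-review can be thought of as a list of (sentence, label) pairs. The record layout below is an assumption for illustration, not the dataset's actual file format:

```python
# Hypothetical record layout for one annotated meta-review (an
# illustration, NOT the dataset's actual file format): each sentence
# is paired with one of the 9 pre-defined intent categories.
LABELS = [
    "abstract", "strength", "weakness", "suggestion",
    "rebuttal process", "rating summary", "ac disagreement",
    "decision", "misc",
]

meta_review = [
    ("This paper proposes a new method for ...", "abstract"),
    ("The experiments are thorough and well presented.", "strength"),
    ("However, the novelty over prior work is limited.", "weakness"),
    ("The final decision is to reject.", "decision"),
]

# The ordered labels of a meta-review double as a control sequence.
control_sequence = [label for _, label in meta_review]
assert all(label in LABELS for label in control_sequence)
```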

3 Task & Methods

  • Task definition of structure-controllable text generation: given the text input (i.e., reviews) and a control sequence of the output structure, a model should generate a meta-review that is derivable from the reviews and presents the required structure.
  • Explored how to re-organize the input reviews and the control sequence into a single input sequence for the encoder (see the input-construction sketch after this list).
    • Prepend the control sequence to the input text.
    • Linearize multiple review inputs into a single input:
      • rate-concat
      • rate-merge
      • longest-review (baseline)
    • Control methods of different granularity:
      • sent-ctrl: one intent label per target sentence
      • seg-ctrl: one label per segment of consecutive same-intent sentences
      • unctrl: no control sequence
  • Model: bart-large-cnn with PyTorch & HF Transformers.
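
A minimal sketch of the input construction under sent-ctrl, assuming the control sequence is prepended to the linearized reviews. The `" | "` label separator and the `"==>"` delimiter are illustrative assumptions, not necessarily the paper's exact special tokens:

```python
def build_model_input(reviews, control_labels):
    """Prepend a sent-ctrl control sequence to linearized reviews.

    The " | " label separator and the "==>" delimiter are assumptions
    for illustration; the paper's exact formatting may differ.
    """
    control_seq = " | ".join(control_labels)
    # Naive linearization: concatenate the reviews in the given order.
    # rate-concat would instead order the reviews by reviewer rating,
    # and rate-merge would merge reviews sharing the same rating.
    source = " ".join(reviews)
    return f"{control_seq} ==> {source}"

example = build_model_input(
    reviews=["Review 1 text ...", "Review 2 text ..."],
    control_labels=["abstract", "strength", "weakness", "decision"],
)
print(example)
```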

4 Experiments

  • Baselines

    • Extractive: MMR, LexRank, TextRank, each run in unctrl and sent-ctrl settings (an MMR sketch follows this list).
      • sent-ctrl: an LSTM-CRF tagger, trained on the labeled meta-reviews, predicts a label for each input review sentence.
    • Generic: generic sentences obtained from the training data, either from the meta-review references (i.e., target side) or from the input reviews (i.e., source side).
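
Below is a minimal re-implementation sketch of the MMR (Maximal Marginal Relevance) extractive baseline using TF-IDF similarity; the paper's exact implementation may differ. Under sent-ctrl, k is simply the number of labels in the control sequence:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_extract(sentences, k, lambda_=0.7):
    """Select k sentences by MMR: trade off relevance to the whole
    document against redundancy with already-selected sentences."""
    vecs = TfidfVectorizer().fit_transform(sentences)
    doc_vec = np.asarray(vecs.mean(axis=0))           # document centroid
    relevance = cosine_similarity(vecs, doc_vec).ravel()
    sim = cosine_similarity(vecs)                     # pairwise sentence similarity
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        if selected:
            redundancy = sim[:, selected].max(axis=1)
        else:
            redundancy = np.zeros(len(sentences))
        scores = lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(candidates, key=lambda i: scores[i])
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]

review_sentences = [
    "The paper studies controllable meta-review generation.",
    "Results on the benchmark are strong.",
    "The method section is hard to follow.",
    "Overall I lean towards acceptance.",
]
# Under sent-ctrl, k equals the number of labels in the control sequence.
control = ["abstract", "strength", "weakness", "decision"]
print(mmr_extract(review_sentences, k=len(control)))
```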
  • Setting

    • Filtered to 6,693 source-target pairs, randomly split into train, validation, and test by 8:1:1.
    • Outputs are evaluated against the reference with F1 scores of ROUGE-1, ROUGE-2, and ROUGE-L.
    • For the extractive baselines with sent-ctrl, the number of extracted sentences k is set equal to the number of labels in the control sequence; the same k is used for the generic baselines.
    • Load the pretrained bart-large-cnn model and fine-tune on MReD on a single V100 GPU (a configuration sketch follows this list):
      • batch size 1, gradient accumulation step 1, 3 epochs, seed 0
      • source truncation lengths of 1024, 2048, and 3072 tokens; target length 20 to 400
      • Adam optimizer (momentum terms 0.9 and 0.999), learning rate 5e-5, no warm-up steps, no weight decay
      • decoding: beam size 4, length penalty 2
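
The following is a hedged sketch of this configuration with Hugging Face's Seq2SeqTrainer. Dataset tokenization is elided (`train_ds`/`val_ds` are placeholders), and note that stock bart-large-cnn has only 1024 positional embeddings, so the 2048/3072 source lengths would require extending them:

```python
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

# Hyperparameters as listed above; output_dir is a placeholder.
args = Seq2SeqTrainingArguments(
    output_dir="mred-bart",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    learning_rate=5e-5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    warmup_steps=0,
    weight_decay=0.0,
    seed=0,
)

train_ds = val_ds = None  # placeholders for tokenized MReD splits
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()

# Decoding with beam size 4, length penalty 2, target length 20-400:
# outputs = model.generate(input_ids, num_beams=4, length_penalty=2.0,
#                          min_length=20, max_length=400)

# ROUGE F1 evaluation against the reference (rouge-score package):
# from rouge_score import rouge_scorer
# scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
#                                   use_stemmer=True)
# scores = scorer.score(reference, generated)  # each entry has .fmeasure
```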
  • Results

    • All controlled methods outperform their unctrl settings.
    • For bart-large-cnn, sent-ctrl better than seg-ctrl.
    • bart-large-cnn far outperforms the extractive and generic baselines: meta-review writing differs from the input reviews, and the transformer model is capable of capturing content-specific information.
    • rate-concat > merge > rate-merge
    • For each generated sentence, the corresponding control token receives the highest attention weights; the model can correctly extract relevant information from the source sentences; different control sequences generate varied outputs (see the attention-inspection sketch below).
    • Human evaluation focused on fluency, content relevance, structure similarity, and decision correctness.
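
As a rough illustration of how the attention claim could be probed, the sketch below generates with cross-attentions returned and sums each decoding step's attention over the control-token positions. The input string, the control-token span, and the tensor-shape assumptions are illustrative and can vary across transformers versions:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

text = "abstract | strength | decision ==> Review 1 text ... Review 2 text ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

out = model.generate(
    **inputs,
    num_beams=1,                 # greedy keeps the attention shapes simple
    max_length=60,
    output_attentions=True,
    return_dict_in_generate=True,
)

# out.cross_attentions: one entry per generated token; each entry is a
# tuple over decoder layers of (batch, heads, 1, source_len) tensors.
last_layer = [step[-1] for step in out.cross_attentions]
attn = torch.cat([a.mean(dim=1) for a in last_layer], dim=1)  # (1, tgt, src)

ctrl_span = range(0, 8)  # hypothetical positions of the control tokens
print(attn[0, :, list(ctrl_span)].sum(dim=-1))  # per-step mass on control tokens
```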