Methodologies from Previous Work
MetaGen (fine-tuned UniLM)
Goal: generate an assistive meta-review and predict the paper's acceptance decision
Step 1: extractive draft generation
- Preprocessing: combine sentences using coreference and connectors, cluster on tf-isf sentence vectors, and attach reviewer tags.
- Build a sentence graph with cosine-similarity edges and score sentences by random walk with restart, with added weight for meta-review-specific terms and review updates, plus aspect scores (a scoring sketch follows below)
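A minimal sketch of the random-walk-with-restart scoring, assuming a row-normalized cosine-similarity transition matrix; folding the term/update/aspect weights into the restart distribution is our assumption, not necessarily how MetaGen combines them:

```python
import numpy as np

def rwr_scores(sim, restart, alpha=0.15, tol=1e-8, max_iter=1000):
    """Random walk with restart over the sentence graph.

    sim: (n, n) cosine-similarity matrix between sentence vectors.
    restart: (n,) restart weights; folding the meta-review-term,
             review-update, and aspect weights in here is an assumption.
    """
    sim = sim.copy()
    np.fill_diagonal(sim, 0.0)                   # no self-loops
    trans = sim / np.maximum(sim.sum(axis=1, keepdims=True), 1e-12)
    restart = restart / restart.sum()

    p = restart.copy()
    for _ in range(max_iter):
        p_next = alpha * restart + (1 - alpha) * trans.T @ p
        if np.abs(p_next - p).sum() < tol:       # converged to stationary scores
            break
        p = p_next
    return p                                     # rank sentences by p for the draft
```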
Step 2: acceptance decision prediction using UniLM; the predicted label is added to the draft to condition generation
Step 3: abstractive meta-review generation using UniLM fine-tuned for seq2seq, trained on meta-reviews with the decision sentences filtered out; UniLM is also fine-tuned separately for decision prediction
Auto Eval: accuracy; R1/R2/RL F1
AutoMeta: sentence-level extractive summaries (even with combined sentences) are not ideal
Decision-Aware Multi-Encoder (trained Transformer)
Component 1: encoder-decoder
- three encoders, multi-head attention, residual connections, feed-forward
- one decoder whose cross-attention attends to each encoder's key-value pairs in parallel, followed by normalization and feed-forward layers
- input sequences r^1_i, r^2_i, r^3_i are mapped to encoder representations z_i, each consisting of hidden states h_i and key-value pairs kp_i, from which the decoder produces the output sequence y_i (see the sketch below)
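A minimal PyTorch sketch of the multi-encoder layout (two layers per encoder and decoder, as in Architecture 1 below). Concatenating the encoder memories approximates the paper's parallel cross-attention to each encoder's key-value pairs; the dimensions and the omitted positional encodings are simplifications:

```python
import torch
import torch.nn as nn

class MultiEncoderMRG(nn.Module):
    """Three review encoders feeding one decoder (sketch).

    Positional encodings are omitted for brevity; concatenating encoder
    memories stands in for the paper's parallel cross-attention.
    """

    def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        def make_encoder():
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
                num_layers=2)                      # two layers, as in Architecture 1

        self.encoders = nn.ModuleList([make_encoder() for _ in range(3)])
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, reviews, target):
        # reviews: three (batch, src_len) token tensors; target: (batch, tgt_len)
        memories = [enc(self.embed(r)) for enc, r in zip(self.encoders, reviews)]
        memory = torch.cat(memories, dim=1)        # decoder sees all three encoders
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(target.size(1))
        dec = self.decoder(self.embed(target), memory, tgt_mask=tgt_mask)
        return self.lm_head(dec), memory           # token logits + memory for the decision head
```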
Component 2: decision awareness
- average-pool each encoder's hidden states, concatenate them into h_hat, and pass it through a fully connected layer; used as decision context in every decoder layer
Component 3: decision prediction
- hidden state h_hat passed through ReLU, then fed to a new linear layer for the decision prediction y_hat (A/R)
Component 4: loss function
- weighted sum of the cross-entropy losses for decision prediction and generation (see the sketch below)
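A sketch of the prediction side of components 2-4: the pooled h_hat, the decision head, and the joint loss. Wiring h_hat into each decoder layer as context is not shown, and `DecisionHead` and the weight `lam` are our placeholders:

```python
import torch.nn as nn
import torch.nn.functional as F

class DecisionHead(nn.Module):
    """Pool the encoder memory into h_hat and predict Accept/Reject."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.fc = nn.Linear(d_model, d_model)   # fully connected layer over h_hat
        self.out = nn.Linear(d_model, 2)        # A / R logits

    def forward(self, memory):                  # memory: (batch, seq, d_model)
        h_hat = memory.mean(dim=1)              # average pooling
        return self.out(F.relu(self.fc(h_hat)))

def joint_loss(gen_logits, gen_targets, dec_logits, dec_targets, lam=0.5):
    """Weighted sum of decision and generation cross-entropy (lam is assumed)."""
    gen = F.cross_entropy(gen_logits.flatten(0, 1), gen_targets.flatten())
    dec = F.cross_entropy(dec_logits, dec_targets)
    return lam * dec + (1 - lam) * gen
```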
Architecture 1: Simple Meta-Review Generator
- three encoders each with two encoder layers and a decoder of two decoder layers
Architecture 2: MRG with Decision at last
- the decoder's last hidden states serve both tasks: a linear layer for generation, and a separate linear layer with dropout and ReLU for decision prediction
Architecture 3: Decision-aware MRG
- decision predicted from the encoders; the decision vector encoded from the encoder hidden states is carried into the decoder layers to provide decision context to the generator module
Auto Eval: accuracy; R1/R2/R3/BERTScore/S3/BLEU
AutoMeta: training a new Transformer from scratch (pre-trained with permutations of the input reviews?) is over-engineering
OPINIONDIGEST Framework (trained Transformer)
Opinion set of a review r: O_r = {(o_i, pol_i, a_i)}
(opinion phrases, polarity, aspect categories)
Step 1: opinion extraction using a pre-trained tagging model
Step 2: opinion selection
- opinion merging: for each opinion o in O_e, iterate through the existing clusters and greedily add o to the first cluster whose average opinion-phrase word embedding is similar enough, cos(v_o, v_C) >= theta, otherwise start a new cluster; each cluster C_i is represented by Repr(C_i), the member closest to its centroid (see the sketch after this list)
- opinion ranking: keep the top-k largest clusters
- opinion filtering: by aspect category or sentiment polarity
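A sketch of the greedy merging step as read from the description above; `embed` and the threshold `theta` are placeholders for the actual phrase-embedding model and tuning:

```python
import numpy as np

def merge_opinions(opinions, embed, theta=0.8):
    """Greedy opinion merging (our reading of the step above): each opinion
    joins the first cluster whose centroid embedding is similar enough,
    otherwise it starts a new cluster. `embed` maps an opinion phrase to a
    vector (e.g., averaged word embeddings); theta is an assumed threshold."""
    vecs = [embed(o) for o in opinions]
    clusters, centroids = [], []            # member indices / running means

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    for i, v in enumerate(vecs):
        for k, mu in enumerate(centroids):
            if cos(v, mu) >= theta:
                clusters[k].append(i)
                centroids[k] = np.mean([vecs[j] for j in clusters[k]], axis=0)
                break
        else:                               # no similar cluster found
            clusters.append([i])
            centroids.append(v)

    # Repr(C_i): the member closest to the cluster centroid
    reps = [max(members, key=lambda j: cos(vecs[j], mu))
            for members, mu in zip(clusters, centroids)]
    return clusters, reps
```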
Step 3: summary generation
- review reconstruction: textualization of the extracted opinion set -> the review
- summarization: using trained Transformer
Auto Eval: R1/R2/RL
AutoMeta: ranking and filtering not needed; textualization can be handled by seq2seq Transformers; keep opinion extraction (plus evidence?) and opinion merging
ProCluster: Proposition-Level Clustering (fine-tuned BART)
Clustering over propositions obtained by supervised open information extraction (OpenIE)
Step 1: proposition extraction as predicate + arguments via OpenIE
Step 2: filtering with a salience model built on a fine-tuned Cross-Document Language Model (CDLM)
Step 3: clustering propositions using SuperPAL, a binary classifier based on paraphrastic similarity (a clustering sketch follows these steps)
Step 4: ranking, with only the largest clusters kept
Step 5: fusing the propositions in each cluster into a summary sentence
- training data derived from the reference summaries by using SuperPAL to find the aligned summary proposition for each cluster
- fine-tuning BART generation model with cluster propositions as input
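A sketch of how SuperPAL-style clustering could be wired up, assuming the classifier yields a pairwise paraphrase score and that clusters are connected components of the above-threshold graph (the grouping rule is our assumption; `pair_score` and `threshold` are placeholders):

```python
import networkx as nx

def cluster_propositions(props, pair_score, threshold=0.5):
    """Pairwise clustering sketch: score every proposition pair with a
    binary paraphrase classifier (stand-in for SuperPAL) and take connected
    components of the above-threshold graph."""
    g = nx.Graph()
    g.add_nodes_from(range(len(props)))
    for i in range(len(props)):
        for j in range(i + 1, len(props)):
            if pair_score(props[i], props[j]) >= threshold:
                g.add_edge(i, j)
    # largest clusters first, ready for the ranking step
    return sorted(nx.connected_components(g), key=len, reverse=True)
```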
Auto Eval: R1/R2/RSU4 F1
AutoMeta: ranking and filtering not needed; keep proposition extraction (plus evidence?) and coarser-grained clustering
DecSum (greedy extraction, with fine-tuned Longformer predictor)
Leveraging a supervised decision model for extractive decision-focused summarization
Problem formulation: select X_tilde from X to support decision y, given a training set {(X_i, y_i)}
Desideratum 1: decision faithfulness
L_F(X_tilde, X, f) = log|f(X_tilde) - f(X)|
Desideratum 2: decision representativeness
L_R(X_tilde, X, f) = log W(Y_hat_X_tilde, Y_hat_X), where W is the Wasserstein distance between the sets of sentence-level predictions Y_hat = {f(x)}
Desideratum 3: textual non-redundancy
L_D(X_tilde) = sum over x in X_tilde of max_{x' in X_tilde, x' != x} cos(s(x), s(x')), with s a sentence encoder
Algorithm: iterative greedy selection with beam search (size 4) to minimize the combined loss, exposing the design space as a white box (see the sketch below)
Decision function: regression model f using Longformer
Model-based explanations: importance score from Integrated Gradients and Attention
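A sketch of the greedy beam-search selection; `loss` is assumed to map a tuple of selected sentence indices to the weighted sum of L_F, L_R, and L_D computed with the decision model f, and the summary length k and the weights are placeholders:

```python
def decsum_greedy(n_sentences, loss, k=5, beam_width=4):
    """Iterative greedy selection with beam search (DecSum-style sketch).

    loss(sel) should return the weighted combination of L_F, L_R, and L_D
    for a non-empty tuple of sentence indices; this exact loop is our
    assumption, not the paper's reference implementation."""
    beams = [()]                                         # partial selections
    for _ in range(k):
        cands = {tuple(sorted(sel + (i,)))
                 for sel in beams
                 for i in range(n_sentences) if i not in sel}
        if not cands:                                    # fewer than k sentences
            break
        beams = sorted(cands, key=loss)[:beam_width]     # keep the 4 best partials
    return beams[0]                                      # lowest-loss selection
```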
Auto Eval: (f(X_tilde) - f(X))^2; W(Y_hat_X_tilde, Y_hat_X); SUM-QE (BERT-based) automatic summary evaluation on five aspects: grammaticality, non-redundancy, referential clarity, focus, structure & coherence
AutoMeta: can use f(X_tilde) - f(X) as an evaluation metric for decision faithfulness
ACESUM: Aspect-Controllable Opinion Summarization
Eval: sentence filtering based on maximum cosine similarity between BERT encodings of the tokens in review sentences and aspect seed words (see the sketch below)
AutoMeta: extract aspects through seed words?
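A sketch of the seed-based sentence filtering, assuming per-token BERT encodings and a hypothetical similarity threshold:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_tokens(text):
    """Contextual BERT encodings for the tokens of `text` ([CLS]/[SEP] dropped)."""
    out = bert(**tok(text, return_tensors="pt", truncation=True))
    return out.last_hidden_state[0, 1:-1]

def aspect_score(sentence, seeds):
    """Max cosine similarity between any sentence token and any seed token."""
    sent = F.normalize(encode_tokens(sentence), dim=-1)
    seed = F.normalize(torch.cat([encode_tokens(s) for s in seeds]), dim=-1)
    return (sent @ seed.T).max().item()

def filter_sentences(sentences, seeds, threshold=0.6):
    """Keep sentences whose best token match to an aspect seed clears the
    (assumed) threshold."""
    return [s for s in sentences if aspect_score(s, seeds) >= threshold]
```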