Results for alternative prompts and conditions

1. Length condition

(the default max_length of the pipeline is 142)

baseline condition: min_length = 40 for the summary of each individual review; min_length = 90 for the summary of the meta-review

condition 1: min_length = 100

condition 2: min_length = 150, with a dynamic max_length (set per batch to the token length of the shortest input, so that no summary can be longer than its source text)
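
A minimal sketch of how condition 2's dynamic max_length could work, assuming a HuggingFace summarization pipeline; the model checkpoint, the token counting, and the clamp on min_length are assumptions, not the repo's actual code:

```python
from transformers import AutoTokenizer, pipeline

# Checkpoint is an assumption; the repo may use a different model.
MODEL_NAME = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
summarizer = pipeline("summarization", model=MODEL_NAME, tokenizer=tokenizer)

def summarize_batch(texts, min_length=150):
    # Token length of each input in the batch.
    input_lengths = [len(tokenizer.encode(t, truncation=True)) for t in texts]
    # Condition 2: cap max_length at the shortest input in the batch,
    # so no summary can come out longer than its source text.
    dynamic_max = min(input_lengths)
    # Clamp (assumption): if the shortest input is under min_length,
    # asking for min_length > max_length would fail, so lower it.
    effective_min = min(min_length, max(dynamic_max - 1, 1))
    return summarizer(texts, min_length=effective_min,
                      max_length=dynamic_max, truncation=True)
```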

| Condition | Mean ROUGE score | Mean BERTScore (F1) |
| --- | --- | --- |
| baseline | 0.1639564005212592 | 0.7698425740906687 |
| condition 1 | 0.1713935948854972 | 0.7753120155045481 |
| condition 2 | 0.17092215904345745 | 0.7790058598373876 |
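
For reference, a minimal sketch of how these corpus means could be computed with HuggingFace's `evaluate` package; which ROUGE variant the scores above correspond to, and the BERTScore configuration, are assumptions:

```python
import evaluate  # pip install evaluate rouge_score bert_score

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

def mean_scores(predictions, references):
    # rouge aggregates F-measures over the corpus by default; which
    # variant (rouge1/rouge2/rougeL) the wiki reports is an assumption.
    r = rouge.compute(predictions=predictions, references=references)
    # bertscore returns per-example scores, so average the F1s ourselves.
    b = bertscore.compute(predictions=predictions, references=references,
                          lang="en")
    return r["rouge1"], sum(b["f1"]) / len(b["f1"])
```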


2. Whether to add a prompt when summarizing each individual review

individual prompt: "Below is a review for a paper. Summarize the main content, strengths, weaknesses of the paper, and reviewer's decision on acceptance or rejection"

other settings (as in condition 2 above): min_length = 150; max_length dynamic
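
A sketch of the prompted condition; the separator between prompt and review, and the `reviews` variable, are hypothetical:

```python
INDIVIDUAL_PROMPT = (
    "Below is a review for a paper. Summarize the main content, strengths, "
    "weaknesses of the paper, and reviewer's decision on acceptance or "
    "rejection"
)

def add_individual_prompt(review_text):
    # Prepend the instruction to the raw review; the separator is an
    # assumption.
    return INDIVIDUAL_PROMPT + "\n\n" + review_text

# 'reviews' is a hypothetical list of raw review strings;
# summarize_batch is the dynamic-length sketch from section 1.
summaries = summarize_batch([add_individual_prompt(r) for r in reviews])
```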

| Condition | Mean ROUGE score | Mean BERTScore (F1) |
| --- | --- | --- |
| baseline (no individual prompt) | 0.17092215904345745 | 0.77546644449234 |
| with individual prompt | 0.16695488455840027 | 0.7738951587677002 |


3. Different prompts for the meta-review summary

baseline_prompt = ''' Below are multiple summaries of a paper's reviews. '''

prompt_v1 = ''' Below are multiple summaries of different reviews on the same paper. Please summarize the paper reviews and decide on whether the paper is accepted or rejected. '''

prompt_v2 = ''' Below are multiple summaries of different reviews on the same paper. Summarize the main content, strengths, weaknesses of the paper based on the reviews, and decide on whether the paper is accepted or rejected.'''

other settings (as in condition 2 above): min_length = 150; max_length dynamic
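
A sketch of how the prompt variants could be swapped in when building the meta-review input; the dictionary, the joining scheme, and the usage line are assumptions:

```python
META_PROMPTS = {
    "baseline": "Below are multiple summaries of a paper's reviews.",
    "v1": ("Below are multiple summaries of different reviews on the same "
           "paper. Please summarize the paper reviews and decide on whether "
           "the paper is accepted or rejected."),
    "v2": ("Below are multiple summaries of different reviews on the same "
           "paper. Summarize the main content, strengths, weaknesses of the "
           "paper based on the reviews, and decide on whether the paper is "
           "accepted or rejected."),
}

def build_meta_input(review_summaries, variant):
    # Prepend the chosen prompt to the concatenated per-review summaries;
    # the joining scheme is an assumption.
    return META_PROMPTS[variant] + "\n\n" + "\n".join(review_summaries)

# Hypothetical usage; summarize_batch is the sketch from section 1.
meta_review = summarize_batch([build_meta_input(per_review_summaries, "v1")])
```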

| Prompt | Mean ROUGE score | Mean BERTScore (F1) |
| --- | --- | --- |
| baseline_prompt | 0.17092215904345745 | 0.77546644449234 |
| prompt_v1 | 0.17130494216009776 | 0.7748177874088288 |
| prompt_v2 | 0.17116903143263282 | 0.7721376180648803 |