# Results for alternative prompts and conditions
## 1. Length condition

(The pipeline's default `max_length` is 142.)

- Baseline condition: `min_length = 40` for the summary of each individual review; `min_length = 90` for the summary of the meta-review
- Condition 1: `min_length = 100`
- Condition 2: `min_length = 150`, with a dynamic `max_length`, set per batch to the length of the shortest input so that no output can be longer than its input (sketched below)
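Condition 2's dynamic `max_length` could be implemented roughly as follows. This is a minimal sketch assuming a Hugging Face `transformers` summarization pipeline; `summarize_batch` and `texts` are illustrative names, not the project's actual code.

```python
from transformers import pipeline

summarizer = pipeline("summarization")  # the pipeline's default max_length is 142
tokenizer = summarizer.tokenizer

def summarize_batch(texts, min_length=150):
    # Dynamic max_length (condition 2): cap generation at the token length
    # of the shortest input in the batch, so no summary outgrows its source.
    shortest_input = min(len(tokenizer.encode(t)) for t in texts)
    return summarizer(
        texts,
        min_length=min(min_length, shortest_input),  # keep min_length <= max_length
        max_length=shortest_input,
    )
```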
| Condition | Mean ROUGE score | Mean BERTScore (F1) |
| --- | --- | --- |
| baseline | 0.1639564005212592 | 0.7698425740906687 |
| condition 1 | 0.1713935948854972 | 0.7753120155045481 |
| condition 2 | 0.17092215904345745 | 0.7790058598373876 |
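For reference, the mean scores above could be computed along these lines, assuming the Hugging Face `evaluate` library (the wiki does not state which ROUGE variant the single mean figure refers to); `predictions` and `references` are illustrative names.

```python
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

def mean_scores(predictions, references):
    # ROUGE scores come back already aggregated over all examples.
    rouge_result = rouge.compute(predictions=predictions, references=references)
    # BERTScore comes back per example, so average the F1 list ourselves.
    bert = bertscore.compute(predictions=predictions, references=references, lang="en")
    mean_f1 = sum(bert["f1"]) / len(bert["f1"])
    return rouge_result, mean_f1
```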
## 2. Whether to add a prompt when summarizing each individual review

Individual prompt: "Below is a review for a paper. Summarize the main content, strengths, weaknesses of the paper, and the reviewer's decision on acceptance or rejection."

Other conditions: `min_length = 150`; `max_length` = dynamic
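The prompted condition presumably prepends this instruction to each review before summarization. A minimal sketch under that assumption, reusing the hypothetical `summarize_batch` from Section 1; `reviews` is an illustrative name.

```python
INDIVIDUAL_PROMPT = (
    "Below is a review for a paper. Summarize the main content, strengths, "
    "weaknesses of the paper, and the reviewer's decision on acceptance or rejection."
)

def summarize_reviews_with_prompt(reviews):
    # Prepend the instruction to every review, then summarize as in Section 1.
    prompted = [f"{INDIVIDUAL_PROMPT}\n\n{review}" for review in reviews]
    return summarize_batch(prompted, min_length=150)
```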
| Condition | Mean ROUGE score | Mean BERTScore (F1) |
| --- | --- | --- |
| baseline (no prompt) | 0.17092215904345745 | 0.77546644449234 |
| with individual prompt | 0.16695488455840027 | 0.7738951587677002 |
## 3. Different prompts

```python
baseline_prompt = ''' Below are multiple summaries of a paper's reviews. '''
prompt_v1 = ''' Below are multiple summaries of different reviews on the same paper. Please summarize the paper reviews and decide on whether the paper is accepted or rejected. '''
prompt_v2 = ''' Below are multiple summaries of different reviews on the same paper. Summarize the main content, strengths, weaknesses of the paper based on the reviews, and decide on whether the paper is accepted or rejected. '''
```

Other conditions: `min_length = 150`; `max_length` = dynamic
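Each prompt is presumably prepended to the concatenated per-review summaries before the final meta-review pass. A minimal sketch under that assumption, again reusing the hypothetical `summarize_batch`; `summaries` is an illustrative name.

```python
def summarize_meta_review(summaries, prompt=prompt_v2):
    # Join the per-review summaries under the chosen prompt and run one
    # more summarization pass to produce the meta-review.
    joined = prompt.strip() + "\n\n" + "\n\n".join(summaries)
    return summarize_batch([joined], min_length=150)[0]["summary_text"]
```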
| Condition | Mean ROUGE score | Mean BERTScore (F1) |
| --- | --- | --- |
| baseline (`baseline_prompt`) | 0.17092215904345745 | 0.77546644449234 |
| condition 1 (`prompt_v1`) | 0.17130494216009776 | 0.7748177874088288 |
| condition 2 (`prompt_v2`) | 0.17116903143263282 | 0.7721376180648803 |