LLM as judge - chunhualiao/public-docs GitHub Wiki

LLM-as-judge

  • better to have multiple judges
  • judges need calibration:
      • against human raters
      • against a cross-judge ensemble
  • judge should not be an evaluated model (conflict of interest)
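
The points above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the judge and model names, the score scale, and the `tolerance` parameter are all hypothetical, and the median is just one possible way to combine judges.

```python
from statistics import mean, median


def eligible_judges(judges, evaluated_models):
    """Drop any judge that is also one of the models being evaluated."""
    evaluated = set(evaluated_models)
    return [j for j in judges if j not in evaluated]


def agreement_with_humans(judge_scores, human_scores, tolerance=1):
    """Fraction of items where the judge is within `tolerance` of the human rating."""
    if len(judge_scores) != len(human_scores) or not human_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(abs(j - h) <= tolerance for j, h in zip(judge_scores, human_scores))
    return hits / len(human_scores)


def deviation_from_ensemble(scores):
    """Mean absolute gap between each judge and the per-item ensemble median.

    `scores` maps judge name -> list of scores, one per item, same order.
    """
    judges = list(scores)
    n_items = len(scores[judges[0]])
    medians = [median(scores[j][i] for j in judges) for i in range(n_items)]
    return {
        j: mean(abs(s - m) for s, m in zip(scores[j], medians))
        for j in judges
    }


# Hypothetical data: three candidate judges, two models under evaluation.
judges = eligible_judges(["judge_a", "judge_b", "model_x"], ["model_x", "model_y"])

# Calibration against human raters on a small labeled set (assumed 1-5 scale).
human_agreement = agreement_with_humans([4, 3, 5, 2], [4, 5, 5, 1])

# Calibration across judges: who drifts furthest from the ensemble?
drift = deviation_from_ensemble({
    "judge_a": [4, 3, 5],
    "judge_b": [4, 5, 5],
    "judge_c": [2, 3, 1],
})
```

A judge with low human agreement or high drift from the ensemble is a candidate for re-prompting, down-weighting, or removal. Where to set those thresholds is a judgment call that the notes above do not specify.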