LLM as judge - chunhualiao/public-docs GitHub Wiki

LLM-as-judge

Better to use multiple judges than a single one.
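With several judges, their individual outputs still have to be combined into one result. The sketch below shows one simple way to do that. It assumes each judge returns either a numeric score or a categorical verdict. The judge names and scores are hypothetical, and median and majority vote are illustrative choices, not a prescribed method.

```python
from collections import Counter
from statistics import median


def aggregate_scores(scores: dict[str, float]) -> float:
    """Combine numeric scores from several judges (judge name -> score).

    The median is used so that a single outlier judge cannot dominate.
    """
    if not scores:
        raise ValueError("need at least one judge score")
    return median(scores.values())


def majority_verdict(verdicts: dict[str, str]) -> str:
    """Combine categorical verdicts (e.g. "A", "B", "tie") by majority vote.

    Counter.most_common breaks count ties by first-seen order.
    """
    if not verdicts:
        raise ValueError("need at least one judge verdict")
    return Counter(verdicts.values()).most_common(1)[0][0]


# Hypothetical judges and their outputs for one response.
scores = {"judge_a": 7.0, "judge_b": 8.0, "judge_c": 2.0}
verdicts = {"judge_a": "A", "judge_b": "A", "judge_c": "B"}
print(aggregate_scores(scores))
print(majority_verdict(verdicts))
```

The median keeps one outlier judge (here `judge_c`) from pulling the combined score down, which a plain mean would not do.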
Judges need calibration: against human raters, and across judges in a cross-judge ensemble.
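One way to check calibration is to measure chance-corrected agreement on a shared set of items. Each judge is compared against human labels, and each pair of judges is compared against each other. The sketch below uses Cohen's kappa for this. Kappa is an assumed choice of metric that these notes do not specify, and the labels and judge names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(a)
    # Fraction of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, given each rater's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels on a shared calibration set of 6 items.
human = ["good", "bad", "good", "good", "bad", "good"]
judges = {
    "judge_a": ["good", "bad", "good", "good", "bad", "good"],
    "judge_b": ["good", "good", "good", "bad", "bad", "good"],
}

# 1) Calibration of each judge against the human raters.
for name, labels in judges.items():
    print(name, "vs human:", round(cohen_kappa(labels, human), 2))

# 2) Cross-judge agreement within the ensemble.
for (n1, l1), (n2, l2) in combinations(judges.items(), 2):
    print(n1, "vs", n2, ":", round(cohen_kappa(l1, l2), 2))
```

A judge with low kappa against humans is a candidate for recalibration or removal. Low agreement between judges suggests the ensemble members disagree systematically. For real use, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.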
The judge should not be one of the models being evaluated (conflict of interest: a model may rate its own outputs favorably).
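The conflict of interest can be avoided when the judge panel is assembled, by filtering the models under evaluation out of the judge pool. Below is a minimal sketch. The model names and the `pick_judges` helper are hypothetical.

```python
def pick_judges(judge_pool: list[str], evaluated: set[str], k: int) -> list[str]:
    """Hypothetical helper: return up to k judges, skipping any evaluated model."""
    eligible = [m for m in judge_pool if m not in evaluated]
    if not eligible:
        raise ValueError("every candidate judge is also being evaluated")
    return eligible[:k]


# Hypothetical model names.
pool = ["model_x", "model_y", "model_z", "model_w"]
evaluated = {"model_y"}
print(pick_judges(pool, evaluated, k=2))
```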