about-evaluation.md

September 16, 2025

Knowledge

General

Automatic evaluation

Two cool overviews on the challenges of automatic evaluation!

LLM as a judge

Cool summaries and experience reports:

Software

Evaluation suites

  • lm_eval, by EleutherAI (also known as "the Harness"). The powerhouse of LLM evaluations, allowing you to evaluate LLMs from many providers on a range of benchmarks, in a stable and reproducible way.
  • lighteval, by Hugging Face (disclaimer: I'm one of the authors). A light LLM evaluation suite, focused on customization and recent benchmarks.
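
To give a feel for what these suites automate, here is a minimal, self-contained sketch of the core loop: run a model over a fixed set of prompts and score the outputs with a deterministic metric. This is not the API of lm_eval or lighteval. The benchmark, the `stub_model` function, and the `exact_match_accuracy` helper are all hypothetical names made up for illustration.

```python
from typing import Callable

# Hypothetical two-item benchmark; real suites load far larger datasets.
BENCHMARK = [
    {"prompt": "2 + 2 =", "answer": "4"},
    {"prompt": "Capital of France?", "answer": "Paris"},
]


def normalize(text: str) -> str:
    """Lowercase and strip whitespace so trivial formatting differences are not counted as errors."""
    return text.strip().lower()


def exact_match_accuracy(model: Callable[[str], str], examples: list[dict]) -> float:
    """Return the fraction of examples whose normalized output equals the normalized reference."""
    if not examples:
        raise ValueError("benchmark is empty")
    correct = sum(
        normalize(model(ex["prompt"])) == normalize(ex["answer"]) for ex in examples
    )
    return correct / len(examples)


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): right on the first item, wrong on the second."""
    return {"2 + 2 =": " 4 "}.get(prompt, "London")


score = exact_match_accuracy(stub_model, BENCHMARK)
print(f"exact match: {score:.2f}")  # prints "exact match: 0.50"
```

Keeping the prompts, the normalization, and the metric fixed is what makes a score comparable across models. The suites above handle that for you, along with model loading, batching, and the task definitions themselves.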

Leaderboards

Tutorials