BIG-Bench Extra Hard (BBEH) Leaderboard

May 6, 2025

The table below shows the BBEH leaderboard, sorted by harmonic mean (the BBEH (Harmonic Mean) column). To contribute, email mehrankazemi@google.com with your model name, its scores, and a link to the paper to cite in the "Contributed by" column.

| Model | Contributed by | BBEH (Harmonic Mean) | BBEH (Micro Avg) | BBEH Mini (Micro Avg) |
|---|---|---|---|---|
| o3-mini (high) | Original paper | 44.8 | 54.2 | 56.7 |
| Gemini 2.0 Flash | Original paper | 9.8 | 23.9 | 27.0 |
| Gemini 2.0 Flash-Lite | Original paper | 8.0 | 19.7 | 22.2 |
| DeepSeek R1 | Original paper | 6.8 | 34.9 | 37.2 |
| GPT4o | Original paper | 6.0 | 22.3 | 23.5 |
| Distill R1 Qwen 32b | Original paper | 5.2 | 19.2 | 15.4 |
| Gemma3 27b | Gemma3 + Original paper | 4.9 | 18.8 | 17.4 |
| Gemma3 12b | Gemma3 + Original paper | 4.5 | 16.3 | 14.3 |
| Gemma2 27b IT | Original paper | 4.0 | 14.8 | 15.0 |
| Llama 3.1 8b Instruct | Original paper | 3.6 | 10.6 | 11.5 |
| Gemma3 4b | Gemma3 + Original paper | 3.4 | 11.0 | 13.3 |
| Random | Original paper | 2.4 | 8.4 | 8.4 |
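The harmonic-mean and micro-average columns can diverge sharply. The minimal Python sketch below shows why. It assumes that the harmonic-mean score is taken over per-task accuracies in percent, and that the micro average is the total number of correct answers divided by the total number of examples across all tasks. The `bbeh_aggregates` function, the `results` dictionary format, and the task names are hypothetical. The official evaluation code may differ, for example in how it treats a task with 0% accuracy.

```python
from statistics import harmonic_mean


def bbeh_aggregates(results):
    """Return (harmonic_mean, micro_avg), both in percent.

    `results` is a hypothetical format: task name -> (num_correct, num_examples).
    """
    # Per-task accuracy in percent; the harmonic mean is taken over these values.
    per_task_acc = [100.0 * correct / total for correct, total in results.values()]
    # Micro average: pool every example across tasks, so larger tasks weigh more.
    total_correct = sum(correct for correct, _ in results.values())
    total_examples = sum(total for _, total in results.values())
    micro_avg = 100.0 * total_correct / total_examples
    # statistics.harmonic_mean returns 0 if any task accuracy is exactly 0.
    return harmonic_mean(per_task_acc), micro_avg


# Hypothetical two-task example: one strong task, one weak task.
results = {"task_a": (90, 100), "task_b": (10, 100)}
hmean, micro = bbeh_aggregates(results)
print(f"harmonic mean: {hmean:.1f}, micro avg: {micro:.1f}")
# prints "harmonic mean: 18.0, micro avg: 50.0"
```

A single weak task pulls the harmonic mean down much further than it pulls down the micro average. This is consistent with how several rows in the table score far lower in the BBEH (Harmonic Mean) column than in the BBEH (Micro Avg) column.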