README.md

June 16, 2026 · View on GitHub

Shows the BLADE logo.

IOH-BLADE: Benchmarking LLM-driven Automated Design and Evolution of Iterative Optimization Heuristics

⭐ If you like this, please give the repo a star – it helps!

PyPI version Maintenance Python 3.11+ CodeCov

Tip

See also the Documentation.

Table of Contents

🔥 News

  • 2025.03 ✨✨ BLADE v0.0.1 released!

Introduction

BLADE (Benchmark suite for LLM-driven Automated Design and Evolution) provides a standardized benchmark suite for evaluating automatic algorithm design algorithms, particularly those generating metaheuristics by large language models (LLMs). It focuses on continuous black-box optimization and integrates a diverse set of problems and methods, facilitating fair and comprehensive benchmarking.

Features

  • Comprehensive Benchmark Suite: Covers various classes of black-box optimization problems.
  • LLM-Driven Evaluation: Supports algorithm evolution and design using large language models.
  • Built-In Baselines: Includes state-of-the-art metaheuristics for comparison.
  • Automatic Logging & Visualization: Integrated with IOHprofiler for performance tracking.

Feature Coverage Map

To make feature discovery easier, this table maps the main BLADE features to the implementation location and corresponding documentation entry.

FeatureImplementationDocumentation
Experiment orchestrationiohblade/experiment.pydocs/experiment.rst
Problems and benchmark wrappingiohblade/problem.py, iohblade/benchmarks/docs/problem.rst, docs/benchmarks.rst
Search methods (LLaMEA, LHNS, MCTS-AHD, EoH, ReEvo, Random Search, FunSearch helper)iohblade/methods/docs/method.rst, docs/methods.rst
LLM provider integrationiohblade/llm.pydocs/llm.rst
Loggers (local, MLflow, Trackio)iohblade/loggers/docs/loggers.rst
Baselinesiohblade/baselines/docs/baselines.rst
Plotting and analysis helpersiohblade/plots.py, iohblade/behaviour_metrics.pydocs/plots.rst, docs/behaviour_metrics.rst
Streamlit result browseriohblade/webapp.pydocs/webapp.rst
Utility helpers and exceptionsiohblade/utils.pydocs/utils.rst

Included Benchmark Function Sets

BLADE incorporates several benchmark function sets to provide a comprehensive evaluation environment:

NameShort DescriptionNumber of FunctionsMultiple Instances
BBOB (Black-Box Optimization Benchmarking)A suite of 24 noiseless functions designed for benchmarking continuous optimization algorithms. Reference24Yes
SBOX-COSTA set of 24 boundary-constrained functions focusing on strict box-constraint optimization scenarios. Reference24Yes
MA-BBOB (Many-Affine BBOB)An extension of the BBOB suite, generating functions through affine combinations and shifts. ReferenceGenerator-BasedYes
GECCO MA-BBOB Competition InstancesA collection of 1,000 pre-defined instances from the GECCO MA-BBOB competition, evaluating algorithm performance on diverse affine-combined functions. Reference1,000Yes
HLP (High-Level Properties)Generated benchmarks guided by high-level property combinations (e.g., separable, multimodality).Generator-BasedYes

In addition, several real-world applications are included.

Real World Benchmarks

NameDescription
Analysis
Auto-Correlation 1Minimise max(g)/I2\max(g) / I^2 for non-negative signals under fixed discretisation of [1/4,1/4][-1/4, 1/4].
Auto-Correlation 2Maximise L_2^{2}/(L1L) / (L_1 · L_\infty) for non-negative signals using discrete auto-convolution.
Auto-Correlation 3Minimise max(g)/I2\max(\|g\|) / I^2 for real-valued signals with non-zero integral.
AutoML
AutoML PipelinesGenerate and evaluate machine learning pipelines using scikit-learn.
Combinatorics
Erdős Minimum-Overlap ProblemMinimise the suprenum overlap integral between complementary measurable functions.
Euclidean Steiner Tree ProblemMinimise MST(points + Steiner points) / MST(points) ratio by adding optimal Steiner nodes.
Graph Colouring ProblemMinimise the number of colours needed for a colouring nodes of a graph, s.t. no adjacent nodes share same colour.
Fourier
Fourier Uncertainty InequalityMinimise uncertainty bound for functions of form P(x)eπx2P(x)e^{-πx²} under Hermite constraints.
Geometry
Heilbronn (Unit Triangle)Maximise the area of smallest triangle formed by 11 points in a unit-area triangle.
Heilbronn (Unit Convex Region)Maximise the area of smallest triangle formed by 13–14 points in a unit-area convex region.
Kissing Number (11D)Maximise number of integer vectors satisfying high-dimensional kissing constraints.
Min/Max Distance RatioMinimise squared ratio of maximum to minimum pairwise distances (2D/3D variants).
Spherical CodeMaximise the minimum pairwise angle among 30 points on a unit sphere.
Kernel Tuner
Kernel Tuning BenchmarkEvaluate metaheuristics for hardware kernel optimisation under constraints.
Logistics
Travelling Salesman ProblemMinimise total tour distance visiting each 2D point exactly once.
Vehicle Routing ProblemMinimise total travel distance for capacitated vehicles serving weighted customers.
Matrix Multiplication via Tensor Decomposition
Tensor CP FactorisationFind smallest CP rank enabling exact matrix multiplication under quantised factors.
Number Theory
Sums vs DifferencesMaximise c(U) measuring imbalance between sumsets and difference sets.
Packing
Circle PackingMaximise total packed circle area inside a circular container without overlap.
Hexagonal PackingMinimise area of smallest enclosing hexagon containing disjoint regular hexagons.
Rectangle PackingPack disjoint circles inside a fixed-perimeter rectangle under containment constraints.
Unit Square PackingPack disjoint circles inside a unit square while satisfying non-overlap constraints.

These benchmarks are provided with ready to run instances in run_benchmarks/, while the reusable benchmark definitions are organized under iohblade/benchmarks by domain (analysis, combinatorics, geometry, matrix multiplication, number theory, packing, and Fourier). Each domain folder includes a short README that summarizes the task and instances.

Included Search Methods

The suite contains the state-of-the-art LLM-assisted search algorithms:

AlgorithmDescriptionLink
LLaMEALarge Langugage Model Evolutionary Algorithmcode paper
EoHEvolution of Heuristicscode paper
FunSearchGoogle's GA-like algorithmcode paper
ReEvoLarge Language Models as Hyper-Heuristics with Reflective Evolutioncode paper
LLM-Driven Heuristics Neighbourhood SearchLLM-Driven Neighborhood Search for Efficient Heuristic Designcode paper
Monte Carlo Tree SearchMonte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Designcode paper

Note, FunSearch is currently not yet integrated.

Supported LLM APIs

BLADE supports integration with various LLM APIs to facilitate automated design of algorithms:

LLM ProviderDescriptionIntegration Notes
GeminiGoogle's multimodal LLM designed to process text, images, audio, and more. ReferenceAccessible via the Gemini API, compatible with OpenAI libraries. Reference
OpenAIDeveloper of GPT series models, including GPT-4, widely used for natural language understanding and generation. ReferenceIntegration through OpenAI's REST API and client libraries.
OllamaA platform offering access to various LLMs, enabling local and cloud-based model deployment. ReferenceIntegration details can be found in their official documentation.
ClaudeAnthropic's Claude models for safe and capable language generation. ReferenceAccessed via the Anthropic API.
DeepSeekDeveloper of the DeepSeek family of models for code and chat. ReferenceAccess via OpenAI compatible API at https://api.deepseek.com.
LMStudioA hardware specialised platform for Apple Silicon, helping you run massive LLMs locally on M1 or better Macs. ReferenceProvides a local OpenAI-compatible server, allowing integration via standard OpenAI client libraries pointed at a localhost endpoint.
MLX_LM (Beta)A hardware specialised python package for Apple Silicon, helping you run massive LLMs locally on M1 or better Macs. ReferenceUsed via Python APIs and CLI tools, often integrated into local inference workflows and Apple MLX pipelines.

Evaluating against Human Designed baselines

An important part of BLADE is the final evaluation of generated algorithms against state-of-the-art human designed algorithms. In the iohblade.baselines part of the package, several well known SOTA black-box optimizers are imolemented to compare against. Including but not limited to CMA-ES and DE variants.

For the final validation BLADE uses IOHprofiler, providing detailed tracking and visualization of performance metrics.

🎁 Installation

It is the easiest to use BLADE from the pypi package (iohblade).

  pip install iohblade

Important

The Python version must be larger or equal to Python 3.11. You need an OpenAI/Gemini/Ollama/Claude/DeepSeek API key for using LLM models.

You can also install the package from source using uv (0.7.19). make sure you have uv installed.

  1. Clone the repository:

    git clone https://github.com/XAI-liacs/BLADE.git
    cd BLADE
    
  2. Install the required dependencies via uv:

    uv sync
    
  3. (Optional) Install additional packages:

    uv sync --group kerneltuner --group dev --group docs
    

    This will install additional dependencies for development and building documentation. The (experimental) auto-kernel application is also under a separate group for now.

  4. (Optional) Intall Support for MLX optimised LLMs:

    uv sync --group dev --group apple-silicon --prerelease=allow
    

    Select all the groups required, and append it with --group apple-silicon --prerelease=allow, to install libraries that enable MLX Optimised LLMs support through mlx-lm and LMStudio.

💻 Quick Start

  1. Set up an API key for your preferred provider:

    • Obtain an API key from OpenAI, Claude, Gemini, or another LLM provider.
    • Set the API key in your environment variables:
      export OPENAI_API_KEY='your_api_key_here'
      
  2. Running an Experiment

    To run a benchmarking experiment using BLADE:

    import os
    
    from iohblade.experiment import Experiment
    from iohblade.llm import Ollama_LLM
    from iohblade.methods import LLaMEA, RandomSearch
    from iohblade.benchmarks import BBOB_SBOX
    from iohblade.loggers import ExperimentLogger
    
    llm = Ollama_LLM("qwen2.5-coder:14b") #qwen2.5-coder:14b, deepseek-coder-v2:16b
    budget = 50 #short budget for testing
    
    RS = RandomSearch(llm, budget=budget) #Random Search baseline
    LLaMEA_method = LLaMEA(llm, budget=budget, name="LLaMEA", n_parents=4, n_offspring=12, elitism=False) #LLamEA with 4,12 strategy
    methods = [RS, LLaMEA_method]
    
    problems = []
    # include all SBOX_COST functions with 5 instances for training and 10 for final validation as the benchmark problem.
    training_instances = [(f, i) for f in range(1,25) for i in range(1, 6)]
    test_instances = [(f, i) for f in range(1,25) for i in range(5, 16)]
    problems.append(BBOB_SBOX(training_instances=training_instances, test_instances=test_instances, dims=[5], budget_factor=2000, name=f"SBOX_COST"))
    # Set up the experiment object with 5 independent runs per method/problem. (in this case 1 problem)
    logger = ExperimentLogger("results/SBOX")
    experiment = Experiment(methods=methods, problems=problems, runs=5, show_stdout=True, exp_logger=logger) #normal run
    experiment() #run the experiment, all data is logged in the folder results/SBOX/
    

Trackio logging

To mirror results to a Trackio dashboard, install the optional dependency and use TrackioExperimentLogger:

uv sync --group trackio
from iohblade.loggers import TrackioExperimentLogger

logger = TrackioExperimentLogger("my-project")
experiment = Experiment(methods=methods, problems=problems, runs=5, exp_logger=logger)

🌐 Webapp

After running experiments you can browse them using the built-in Streamlit app:

uv run iohblade-webapp

The app lists available experiments from the results directory, displays their progress, and shows convergence plots.


💻 Examples

See the files in the examples folder for examples on experiments and visualisations.


🤖 Contributing

Contributions to BLADE are welcome! Here are a few ways you can help:

  • Report Bugs: Use GitHub Issues to report bugs.
  • Feature Requests: Suggest new features or improvements.
  • Pull Requests: Submit PRs for bug fixes or feature additions.

Please refer to CONTRIBUTING.md for more details on contributing guidelines.

🪪 License

Distributed under the MIT License. See LICENSE for more information.

✨ Citation

If you use BLADE in your research, please cite the following work:

@inproceedings{vanstein2025blade,
  author    = {Niki van Stein and Anna V. Kononova and Haoran Yin and Thomas B{\"a}ck},
  title     = {BLADE: Benchmark suite for LLM-driven Automated Design and Evolution of iterative optimisation heuristics},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
  series    = {GECCO '25 Companion'},
  year      = {2025},
  pages     = {2336--2344},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  doi       = {10.1145/3712255.3734347},
  url       = {https://doi.org/10.1145/3712255.3734347}
}

The repository also provides a CITATION.cff file for use with GitHub's citation feature.


Happy Benchmarking with IOH-BLADE! 🚀