release_history.md

April 2, 2025

2.1.0

DeepSeek-R1 distilled model support using vLLM.
Evaluate DeepSeek performance with LongBench, OpenOrca, Dolly and ConvFinQA datasets.
Replace conda with uv for faster installs.

2.0.27

End-to-end Ollama support.

2.0.26

Bug fix for missing HuggingFace token file.
Config file enhancements.

2.0.25

Fix bug with an alternate VariantName for SageMaker BYOE.

2.0.24

ARM benchmarking support (AWS Graviton 4).
Relax IAM permission requirements for Amazon SageMaker bring your own endpoint.

2.0.23

Bug fixes for Amazon SageMaker BYOE.
Additional config files.

2.0.22

Benchmarks for the Amazon Nova family of models.
Benchmarks for multi-modal models: Llama3.2-11B, Claude 3 Sonnet and Claude 3.5 Sonnet using the ScienceQA dataset.

2.0.21

Dynamically get EC2 pricing from Boto3 API.
Update pricing information and model id for Amazon Bedrock models.

2.0.20

Add hf_tokenizer_model_id parameter to automatically download tokenizers from Hugging Face.

2.0.19

Config files for Llama3.1-1b on AMD/Intel CPU instance types.
Bug fixes for token counting for vLLM.

2.0.18

Delete SageMaker endpoint as soon as the run finishes.

2.0.17

Add support for embedding models through SageMaker JumpStart.
Add support for Llama 3.2 11b Vision Instruct benchmarking through FMBench.
Fix DJL inference when deploying DJL on EC2 (424 inference bug).

2.0.16

Update to torch 2.4 for compatibility with SageMaker Notebooks.

2.0.15

Support for Ollama.
Fix bugs with token counting.

2.0.14

Llama3.1-70b config files and more.
Support for fmbench-orchestrator.

2.0.13

Update pricing.yml and add additional config files.

2.0.11

Llama3.2-1b and Llama3.2-3b support on EC2 g5.
Llama3-8b on EC2 g6e instances.

2.0.9

Triton-djl support for AWS Chips.
Tokenizer files are now downloaded directly from Hugging Face (unless provided manually as before).

2.0.8

Support Triton-TensorRT for GPU instances and Triton-vllm for AWS Chips.
Misc. bug fixes.

2.0.6

Run multiple model copies with the DJL serving container and an Nginx load balancer on Amazon EC2.
Config files for Llama3.1-8b on g5, p4de and p5 Amazon EC2 instance types.
Better analytics for creating internal leaderboards.

2.0.5

Support for Intel CPU based instances such as c5.18xlarge and m5.16xlarge.

2.0.4

Support for AMD CPU based instances such as m7a.

2.0.3

Support for an EFS directory for benchmarking on EC2.

2.0.2

Code cleanup, minor bug fixes and report improvements.

2.0.0

🚨 Model evaluations done by a Panel of LLM Evaluators[1] 🚨

v1.0.52

Compile for AWS Chips (Trainium, Inferentia) and deploy to SageMaker directly through FMBench.
Llama3.1-8b and Llama3.1-70b config files for AWS Chips (Trainium, Inferentia).
Misc. bug fixes.

v1.0.51

FMBench has a website now. Rework the README file to make it lightweight.
Llama3.1 config files for Bedrock.

v1.0.50

Llama3-8b on Amazon EC2 inf2.48xlarge config file.
Update to new version of DJL LMI (0.28.0).

v1.0.49

Streaming support for Amazon SageMaker and Amazon Bedrock.
Per-token latency metrics such as time to first token (TTFT) and mean time per output token (TPOT).
Misc. bug fixes.
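
For readers unfamiliar with the per-token latency metrics named in this release, here is a minimal sketch of how TTFT and TPOT are commonly derived from the arrival times of a streamed response. The function name `per_token_latency` and its arguments are hypothetical, and this is not necessarily how FMBench computes these values internally.

```python
from typing import Optional, Sequence, Tuple


def per_token_latency(
    request_start: float, token_times: Sequence[float]
) -> Tuple[float, Optional[float]]:
    """Return (ttft, tpot) in seconds for one streamed response.

    request_start: time at which the request was sent.
    token_times:   arrival time of each output token, in order.
    (Both names are illustrative assumptions, not FMBench identifiers.)
    """
    if not token_times:
        raise ValueError("no tokens received")

    # Time to first token: delay between sending the request and the first token.
    ttft = token_times[0] - request_start

    # With a single token there are no inter-token gaps to average.
    if len(token_times) == 1:
        return ttft, None

    # Mean time per output token: average gap between tokens after the first.
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot
```

Excluding the first token from TPOT keeps prompt-processing time (captured by TTFT) separate from steady-state generation speed.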

v1.0.48

Faster result file download at the end of a test run.
Phi-3-mini-4k-instruct configuration file.
Tokenizer and misc. bug fixes.

v1.0.47

Run FMBench as a Docker container.
Bug fixes for GovCloud support.
Updated README for EKS cluster creation.

v1.0.46

Native model deployment support for EC2 and EKS (i.e. you can now deploy and benchmark models on EC2 and EKS).
FMBench is now available in GovCloud.
Update to latest version of several packages.

v1.0.45

Analytics for results across multiple runs.
Llama3-70b config files for g5.48xlarge instances.

v1.0.44

Endpoint metrics (CPU/GPU utilization, memory utilization, model latency) and invocation metrics (including errors) for SageMaker Endpoints.
Llama3-8b config files for g6 instances.

v1.0.42

Config file for running Llama3-8b on all instance types except p5.
Fix bug with business summary chart.
Fix bug with deploying model using a DJL DeepSpeed container in the no S3 dependency mode.

v1.0.40

Make it easy to run on Amazon EC2 in the mode without any dependency on Amazon S3.

v1.0.39

Add an internal FMBench website.

v1.0.38

Support for running FMBench on Amazon EC2 without any dependency on Amazon S3.
Llama3-8b-Instruct config file for ml.p5.48xlarge.

v1.0.37

g5/p4d/inf2/trn1 specific config files for Llama3-8b-Instruct.
p4d config file for both vLLM and lmi-dist.

v1.0.36

Fix bug at higher concurrency levels (20 and above).
Support for instance count > 1.

v1.0.35

Support for Open-Orca dataset and corresponding prompts for Llama3, Llama2 and Mistral.

v1.0.34

Don't delete endpoints for the bring your own endpoint case.
Fix bug with business summary chart.

v1.0.32

Report enhancements: New business summary chart, config file embedded in the report, version numbering and others.
Additional config files: Meta Llama3 on Inf2, Mistral instruct with lmi-dist on p4d and p5 instances.
