release_history.md
April 2, 2025
2.1.0
- Deepseek-R1 distilled model support using `vllm`.
- Evaluate Deepseek performance with `LongBench`, `OpenOrca`, `Dolly` and `ConvFinQA` datasets.
- Replace `conda` with `uv` for faster installs.
2.0.27
- Ollama end to end support
2.0.26
- Bug fix for missing HuggingFace token file.
- Config file enhancements
2.0.25
- Fix bug with an alternate VariantName for SageMaker BYOE.
2.0.24
- ARM benchmarking support (AWS Graviton 4).
- Relax IAM permission requirements for Amazon SageMaker bring your own endpoint.
2.0.23
- Bug fixes for Amazon SageMaker BYOE.
- Additional config files.
2.0.22
- Benchmarks for the Amazon Nova family of models.
- Benchmarks for multi-modal models: Llama3.2-11B, Claude 3 Sonnet and Claude 3.5 Sonnet using the ScienceQA dataset.
2.0.21
- Dynamically get EC2 pricing from Boto3 API.
- Update pricing information and model id for Amazon Bedrock models.
2.0.20
- Add `hf_tokenizer_model_id` parameter to automatically download tokenizers from Hugging Face.
2.0.19
- Config files for `Llama3.1-1b` on AMD/Intel CPU instance types.
- Bug fixes for token counting for vLLM.
2.0.18
- Delete SageMaker endpoint as soon as the run finishes.
2.0.17
- Add support for embedding models through SageMaker JumpStart
- Add support for Llama 3.2 11b Vision Instruct benchmarking through FMBench
- Fix DJL inference while deploying DJL on EC2 (424 inference bug)
2.0.16
- Update to torch 2.4 for compatibility with SageMaker Notebooks.
2.0.15
2.0.14
- `Llama3.1-70b` config files and more.
- Support for `fmbench-orchestrator`.
2.0.13
- Update `pricing.yml`, additional config files.
2.0.11
- `Llama3.2-1b` and `Llama3.2-3b` support on EC2 g5.
- `Llama3-8b` on EC2 `g6e` instances.
2.0.9
- Triton-djl support for AWS Chips.
- Tokenizer files are now downloaded directly from Hugging Face (unless provided manually as before)
2.0.8
- Support Triton-TensorRT for GPU instances and Triton-vllm for AWS Chips.
- Misc. bug fixes.
2.0.6
- Run multiple model copies with the DJL serving container and an Nginx load balancer on Amazon EC2.
- Config files for `Llama3.1-8b` on `g5`, `p4de` and `p5` Amazon EC2 instance types.
- Better analytics for creating internal leaderboards.
2.0.5
- Support for Intel CPU based instances such as `c5.18xlarge` and `m5.16xlarge`.
2.0.4
- Support for AMD CPU based instances such as `m7a`.
2.0.3
- Support for an EFS directory for benchmarking on EC2.
2.0.2
- Code cleanup, minor bug fixes and report improvements.
2.0.0
- 🚨 Model evaluations done by a Panel of LLM Evaluators[1] 🚨
v1.0.52
- Compile for AWS Chips (Trainium, Inferentia) and deploy to SageMaker directly through `FMBench`.
- `Llama3.1-8b` and `Llama3.1-70b` config files for AWS Chips (Trainium, Inferentia).
- Misc. bug fixes.
v1.0.51
- `FMBench` has a website now. Rework the README file to make it lightweight.
- `Llama3.1` config files for Bedrock.
v1.0.50
- `Llama3-8b` on Amazon EC2 `inf2.48xlarge` config file.
- Update to new version of DJL LMI (0.28.0).
v1.0.49
- Streaming support for Amazon SageMaker and Amazon Bedrock.
- Per-token latency metrics such as time to first token (TTFT) and mean time per-output token (TPOT).
- Misc. bug fixes.
v1.0.48
- Faster result file download at the end of a test run.
- `Phi-3-mini-4k-instruct` configuration file.
- Tokenizer and misc. bug fixes.
v1.0.47
- Run `FMBench` as a Docker container.
- Bug fixes for GovCloud support.
- Updated README for EKS cluster creation.
v1.0.46
- Native model deployment support for EC2 and EKS (i.e. you can now deploy and benchmark models on EC2 and EKS).
- FMBench is now available in GovCloud.
- Update to latest version of several packages.
v1.0.45
- Analytics for results across multiple runs.
- `Llama3-70b` config files for `g5.48xlarge` instances.
v1.0.44
- Endpoint metrics (CPU/GPU utilization, memory utilization, model latency) and invocation metrics (including errors) for SageMaker Endpoints.
- `Llama3-8b` config files for `g6` instances.
v1.0.42
- Config file for running `Llama3-8b` on all instance types except `p5`.
- Fix bug with business summary chart.
- Fix bug with deploying model using a DJL DeepSpeed container in the no S3 dependency mode.
v1.0.40
- Make it easy to run on Amazon EC2 in the no Amazon S3 dependency mode.
v1.0.39
- Add an internal `FMBench` website.
v1.0.38
- Support for running `FMBench` on Amazon EC2 without any dependency on Amazon S3.
- `Llama3-8b-Instruct` config file for `ml.p5.48xlarge`.
v1.0.37
- `g5`/`p4d`/`inf2`/`trn1` specific config files for `Llama3-8b-Instruct`.
- `p4d` config file for both `vllm` and `lmi-dist`.
v1.0.36
- Fix bug at higher concurrency levels (20 and above).
- Support for instance count > 1.
v1.0.35
- Support for Open-Orca dataset and corresponding prompts for Llama3, Llama2 and Mistral.
v1.0.34
- Don't delete endpoints for the bring your own endpoint case.
- Fix bug with business summary chart.
v1.0.32
- Report enhancements: New business summary chart, config file embedded in the report, version numbering and others.
- Additional config files: Meta Llama3 on Inf2, Mistral instruct with `lmi-dist` on `p4d` and `p5` instances.