Azure VM Noise Dataset 2024
February 24, 2025
-- revision 1, 20250202
Introduction
This directory contains a set of files representing a collection of benchmarks run on Microsoft Azure Virtual Machine offerings (D8s_v5 and B8ms) over a period of around 483 days from 2023-05-28 to 2024-09-23.
We used SSDv2 Disks as the "remote disk" and Premium SSDs as the "local disk" in the tests.
This dataset is described and analyzed in the EuroSys 2025 paper TUNA: Tuning Unstable and Noisy Cloud Applications.
A set of benchmarks was used to cover the main components of the VM, with the exception of the network: Cache, CPU, Disk, Memory, OS.
Additionally, two end-to-end applications were benchmarked: PostgreSQL and Redis.
Main characteristics
- Dataset size: 277 MB
- Number of files: 1520 files
- Duration: 483 days
- Average Benchmark Suite Duration: 130 minutes
- Long Lived VM Characteristics:
  - Metrics Collected: 3,661,602
  - Total VMs: 12
- Short Lived VM Characteristics:
  - Metrics Collected: 3,375,618
  - Total VMs: 43,617
  - Time between VM instantiations: 40 minutes
Using the Data
License
The data is made available and licensed under a CC-BY Attribution License. By downloading or using it, you agree to the terms of this license.
Attribution
If you use this data for a publication or project, please cite the accompanying paper:
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, and Shivaram Venkataraman. 2025.
TUNA: Tuning Unstable and Noisy Cloud Applications.
In Proceedings of the Twentieth European Conference on Computer Systems (EuroSys '25).
Association for Computing Machinery, New York, NY, USA
@inproceedings {TUNA,
author = {Johannes Freischuetz and Konstantinos Kanellis and Brian Kroth and Shivaram Venkataraman},
title = {TUNA: Tuning Unstable and Noisy Cloud Applications},
booktitle = {EuroSys '25: Proceedings of the Twentieth European Conference on Computer Systems},
publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, year={2025}, month = {mar}
}
We provide the traces as they are, but are willing to help researchers understand and use them. So, please let us know of any issues or questions by sending email to our mailing list.
Downloading
You can download the dataset here: vm-noise-data.
Schema and Description
Layout
- 1 file per measurement unit, partitioned using a Hive-style table partitioning layout:

  test_suite=<test_suite>/test_name=<test_name>/vm_lifespan=(short|long)/vm_region=(eastus|westus2)/vm_sku=(B8ms|D8s_v5)/unit=<unit>.csv

  Where test_suite and test_name can be taken from the table in the benchmarks section below.
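The partition values can be recovered from a file path by splitting each component on `=`. The snippet below is a minimal sketch of that idea. The helper name `parse_partition_path` and the concrete values `redis`, `get`, and `requests_per_second` in the example path are hypothetical placeholders, not names taken from the dataset.

```python
from pathlib import PurePosixPath


def parse_partition_path(path: str) -> dict[str, str]:
    """Split a Hive-style path into {partition_key: value}."""
    fields: dict[str, str] = {}
    for part in PurePosixPath(path).parts:
        if "=" not in part:
            continue  # skip leading directories that are not partitions
        key, _, value = part.partition("=")
        fields[key] = value
    # The last component is the file 'unit=<unit>.csv'; drop the extension.
    if fields.get("unit", "").endswith(".csv"):
        fields["unit"] = fields["unit"][: -len(".csv")]
    return fields


# Hypothetical path: the suite, test, and unit names are placeholders.
example = (
    "test_suite=redis/test_name=get/vm_lifespan=short/"
    "vm_region=eastus/vm_sku=D8s_v5/unit=requests_per_second.csv"
)
info = parse_partition_path(example)
print(info)
```

Libraries that understand Hive partitioning can usually infer these columns directly; the manual parse is shown only to make the layout explicit.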
Schema
| value | runtime | starttime | VM_id |
|---|---|---|---|
| measured value | duration of test (in seconds) | starting datetime of test | VM id (unique within dimension) |
Example
| value | runtime | starttime | VM_id |
|---|---|---|---|
| 5095.0 | 33.26 | 2023-06-23 16:06:09.190 | 0 |
| 5095.0 | 33.23 | 2023-06-23 17:52:53.550 | 0 |
| 5098.0 | 87.77 | 2023-09-05 12:13:20.380 | 1 |
| 5098.0 | 88.12 | 2023-09-05 22:26:45.210 | 1 |
| 5096.0 | 33.23 | 2024-05-20 00:43:05.260 | 2 |
| 5101.0 | 33.22 | 2024-05-20 01:44:24.070 | 2 |
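As a rough illustration of working with this schema, the sketch below parses the example rows above using only the Python standard library and averages `value` per `VM_id`. It assumes each CSV file has a header row with these four column names and that `starttime` uses the format shown in the example; both are assumptions, so check them against a real file.

```python
import csv
import io
import statistics
from collections import defaultdict
from datetime import datetime

# The example rows from the table above, written as CSV text.
# Assumption: real files carry a header row with these column names.
SAMPLE = """value,runtime,starttime,VM_id
5095.0,33.26,2023-06-23 16:06:09.190,0
5095.0,33.23,2023-06-23 17:52:53.550,0
5098.0,87.77,2023-09-05 12:13:20.380,1
5098.0,88.12,2023-09-05 22:26:45.210,1
5096.0,33.23,2024-05-20 00:43:05.260,2
5101.0,33.22,2024-05-20 01:44:24.070,2
"""


def load_rows(handle):
    """Read schema rows and convert each column to a Python type."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "value": float(rec["value"]),
            "runtime": float(rec["runtime"]),  # seconds
            "starttime": datetime.strptime(
                rec["starttime"], "%Y-%m-%d %H:%M:%S.%f"
            ),
            "VM_id": int(rec["VM_id"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))

# Group measured values by VM and average them.
values_by_vm = defaultdict(list)
for row in rows:
    values_by_vm[row["VM_id"]].append(row["value"])
mean_per_vm = {vm: statistics.mean(vals) for vm, vals in values_by_vm.items()}
print(mean_per_vm)
```

To read a real file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="")`.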
Sample Code
Some sample code for using this data in a notebook can be found in vm-noise-data/sample.ipynb
Description
This benchmarking data was collected from 2023-05-28 to 2024-09-23 for a set of VMs and organized using the Hive partitioning layout.
In total, 92 metrics were collected from 40 benchmarks.
These metrics were collected from VMs that vary along 3 dimensions:
- VM lifespan

  For VM lifespan, we categorized VMs into two classes: short and long.
  Long-running VMs were provisioned once and ran for the entire duration of the study.
  Short-running VMs ran each benchmark only once before being reallocated.
  The purpose of this dimension is to influence which backend host the VM was assigned to, in order to increase the number of samples across different backend hosts. While we omit host information, short-lived VMs were mostly placed on distinct hosts, and long-lived VMs had almost no migrations.
- VM SKU
- VM region
There were three VMs allocated for each combination of VM dimensions.
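The combinations of the three dimensions can be enumerated directly. The sketch below uses the partition values from the layout section and the three-VMs-per-combination figure stated above; the variable names are illustrative. The resulting count of long-lived VMs is consistent with the "Total VMs: 12" figure listed under the main characteristics.

```python
from itertools import product

# Dimension values as they appear in the partition layout.
LIFESPANS = ("short", "long")
REGIONS = ("eastus", "westus2")
SKUS = ("B8ms", "D8s_v5")
VMS_PER_COMBINATION = 3  # stated in the description above

combinations = list(product(LIFESPANS, REGIONS, SKUS))
long_lived = [combo for combo in combinations if combo[0] == "long"]
long_lived_vms = len(long_lived) * VMS_PER_COMBINATION

print(len(combinations))  # 8 combinations of lifespan, region, and SKU
print(long_lived_vms)     # 12 long-lived VMs
```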
Note: There are some periods of missing data caused by crashes on our management nodes.
Benchmarks
The benchmarks used came from the following suites:
| Suite | Benchmarks | Description |
|---|---|---|
| Flexible IO Tester | (Random Read), Random Write, Sequential Read, Sequential Write | Test the throughput (in MiB/s and IOPS) and the latency of various disk operations |
| Intel Memory Latency Checker | Idle Latency; Max Bandwidth and Peak Injection Bandwidth for: all reads, 1:1 read-write ratio, 2:1 read-write ratio, 3:1 read-write ratio, stream-triad like | Test throughput of various memory operations |
| OS Bench | Create Files, Create Processes, Create Threads, Launch Programs, Memory Allocations | Measure latency for various OS related operations |
| perf-bench | Epoll Wait, Memcpy, Memset, Syscall Basic | Measure other OS related operations |
| PostgreSQL | All combinations of the following from pgbench: Scaling Factor: 25 / 2500; Client: 1 / 25; Mode: Read Only / Read Write | Measure various workload combinations using pgbench |
| Redis | redis-benchmark tests for the following: GET, LPOP, LPUSH, SADD, SET | Benchmark various Redis operations using redis-benchmark |
| stress-ng | CPU_Cache, CPU Stress, Matrix Math, Memory Copy | Benchmark the CPU, plus one memory benchmark |
| Sysbench | CPU, RAM Memory | Benchmark CPU and memory |
During each iteration, the full set of benchmarks was run in a random order, with a random splay (delay) between benchmarks.
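One way to picture such an iteration is the sketch below, which shuffles a list of benchmark names and draws a random delay before each one. It is not the harness used to collect the data: the function name `plan_iteration`, the `max_splay_s` bound of 60 seconds, and the uniform distribution are all assumptions made for illustration.

```python
import random


def plan_iteration(benchmarks, max_splay_s, rng):
    """Return [(benchmark, delay_before_start_s)] in a random order.

    Assumption: the splay is drawn uniformly from [0, max_splay_s].
    The source does not state the actual distribution or bound.
    """
    order = list(benchmarks)
    rng.shuffle(order)  # random order for this iteration
    return [(name, rng.uniform(0.0, max_splay_s)) for name in order]


# Suite names from the table above, standing in for individual benchmarks.
suites = ["Flexible IO Tester", "OS Bench", "perf-bench", "Redis", "Sysbench"]
rng = random.Random(0)  # seeded only so the sketch is reproducible
plan = plan_iteration(suites, max_splay_s=60.0, rng=rng)

for name, delay in plan:
    print(f"wait {delay:5.1f}s, then run {name}")
```

Randomizing both the order and the gaps keeps any one benchmark from always running at the same offset within an iteration or always following the same predecessor.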
See Also
- https://aka.ms/mlos/tuna-eurosys-artifacts - The artifacts for the EuroSys 2025 paper 'TUNA: Tuning Unstable and Noisy Cloud Applications'