Azure VM Noise Dataset 2024

February 24, 2025

-- revision 1, 20250202

Introduction

This directory contains a set of files representing a collection of benchmarks run on Microsoft Azure Virtual Machine offerings (D8s_v5 and B8ms) over a period of around 483 days, from 2023-05-28 to 2024-09-23. We used SSDv2 disks as the "remote disk" and Premium SSDs as the "local disk" in the tests. This dataset is described and analyzed in the EuroSys 2025 paper TUNA: Tuning Unstable and Noisy Cloud Applications. The benchmarks were chosen to cover the main components of the VM, with the exception of the network: Cache, CPU, Disk, Memory, and OS. Additionally, two end-to-end applications were benchmarked: PostgreSQL and Redis.

Main characteristics

Dataset size: 277 MB
Number of files: 1520 files
Duration: 483 days
Average Benchmark Suite Duration: 130 minutes
Long Lived VM Characteristics:
- Metrics Collected: 3,661,602
- Total VMs: 12
Short Lived VM Characteristics:
- Metrics Collected: 3,375,618
- Total VMs: 43,617
- Time between VM instantiations: 40 minutes

Using the Data

License

The data is made available and licensed under a CC-BY Attribution License. By downloading or using it, you agree to the terms of this license.

Attribution

If you use this data for a publication or project, please cite the accompanying paper:

Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, and Shivaram Venkataraman. 2025.
TUNA: Tuning Unstable and Noisy Cloud Applications.
In Proceedings of the Twentieth European Conference on Computer Systems (EuroSys '25).
Association for Computing Machinery, New York, NY, USA

@inproceedings {TUNA,
  author = {Johannes Freischuetz and Konstantinos Kanellis and Brian Kroth and Shivaram Venkataraman},
  title = {TUNA: Tuning Unstable and Noisy Cloud Applications},
  booktitle = {EuroSys '25: Proceedings of the Twentieth European Conference on Computer Systems},
  publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, year={2025}, month = {mar}
}

We provide the traces as they are, but are willing to help researchers understand and use them. So, please let us know of any issues or questions by sending email to our mailing list.

Downloading

You can download the dataset here: vm-noise-data.

Schema and Description

Layout

There is one file per measurement unit, partitioned using a Hive-style table partitioning layout:
```
test_suite=<test_suite>/test_name=<test_name>/vm_lifespan=(short|long)/vm_region=(eastus|westus2)/vm_sku=(B8ms|D8s_v5)/unit=<unit>.csv
```
Where test_suite and test_name can be taken from the table in the benchmarks section below.
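As a minimal sketch of working with this layout, the helper below splits a relative file path into its partition keys. The function name `parse_partition_path` and the sample path values (`fio`, `random_read`, `iops`) are illustrative assumptions, not names taken from the dataset.

```python
from pathlib import PurePosixPath


def parse_partition_path(path: str) -> dict[str, str]:
    """Split a Hive-style relative path into a {key: value} dict.

    Hypothetical helper: assumes every path component has the form
    key=value and that the final component ends in ".csv".
    """
    parts = PurePosixPath(path).parts
    fields = {}
    for i, part in enumerate(parts):
        # Strip the file extension from the last component only.
        if i == len(parts) - 1 and part.endswith(".csv"):
            part = part[: -len(".csv")]
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"not a key=value component: {part!r}")
        fields[key] = value
    return fields


# Illustrative path; the suite, test, and unit names are made up.
example = (
    "test_suite=fio/test_name=random_read/vm_lifespan=short/"
    "vm_region=eastus/vm_sku=D8s_v5/unit=iops.csv"
)
info = parse_partition_path(example)
print(info["vm_sku"], info["unit"])
```

Libraries that understand Hive partitioning can usually recover these keys automatically; the manual version is shown only to make the naming scheme explicit.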

Schema

| value | runtime | starttime | VM_id |
| --- | --- | --- | --- |
| measured value | duration of test (in seconds) | starting datetime of test | VM id (unique within dimension) |

Example

| value | runtime | starttime | VM_id |
| --- | --- | --- | --- |
| 5095.0 | 33.26 | 2023-06-23 16:06:09.190 | 0 |
| 5095.0 | 33.23 | 2023-06-23 17:52:53.550 | 0 |
| 5098.0 | 87.77 | 2023-09-05 12:13:20.380 | 1 |
| 5098.0 | 88.12 | 2023-09-05 22:26:45.210 | 1 |
| 5096.0 | 33.23 | 2024-05-20 00:43:05.260 | 2 |
| 5101.0 | 33.22 | 2024-05-20 01:44:24.070 | 2 |
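
The example rows can be loaded with Python's standard library alone. The sketch below embeds them as comma-separated text (the delimiter is an assumption based on the `.csv` extension; adjust it if the files differ) and groups the measured values by `VM_id`. The helper name `load_rows` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# The example rows, written as comma-separated text (assumed delimiter).
SAMPLE = """value,runtime,starttime,VM_id
5095.0,33.26,2023-06-23 16:06:09.190,0
5095.0,33.23,2023-06-23 17:52:53.550,0
5098.0,87.77,2023-09-05 12:13:20.380,1
5098.0,88.12,2023-09-05 22:26:45.210,1
5096.0,33.23,2024-05-20 00:43:05.260,2
5101.0,33.22,2024-05-20 01:44:24.070,2
"""


def load_rows(text_stream):
    """Parse rows into typed dicts (hypothetical helper)."""
    rows = []
    for rec in csv.DictReader(text_stream):
        rows.append({
            "value": float(rec["value"]),
            "runtime": float(rec["runtime"]),  # seconds
            "starttime": datetime.strptime(rec["starttime"], "%Y-%m-%d %H:%M:%S.%f"),
            "VM_id": int(rec["VM_id"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))

# Group measured values by VM and compute the per-VM mean.
by_vm = defaultdict(list)
for row in rows:
    by_vm[row["VM_id"]].append(row["value"])

vm_means = {vm: mean(values) for vm, values in by_vm.items()}
print(vm_means)
```

To read a real file, pass an open file handle to `load_rows` in place of the `io.StringIO` wrapper.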

Sample Code

Some sample code for using this data in a notebook can be found in vm-noise-data/sample.ipynb

Description

This benchmarking data was collected from 2023-05-28 to 2024-09-23 for a set of VMs and is organized using the Hive partitioning layout.

In total, 92 metrics were collected from 40 benchmarks.

These metrics were collected from VMs along three dimensions:

VM lifespan

For VM lifespan, we categorized VMs into two classes: short and long.

Long running VMs were provisioned once and ran for the entire duration of the study.

Short running VMs ran each benchmark only once before being reallocated.

The purpose of this dimension is to influence which backend host the VM was assigned to, in order to increase the number of samples across different backend hosts. While we omit host information, short lived VMs were mostly placed on distinct hosts, and long lived VMs had almost no migrations.

VM SKU

For VM SKUs, we chose D8s_v5 VMs and B8ms VMs.

VM region

For VM regions, we chose westus2 and eastus.

There were three VMs allocated for each combination of VM dimensions.
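
To make the dimension combinations concrete, the sketch below enumerates them with `itertools.product`, using the lifespan, region, and SKU values that appear in the partition layout. Reading "three VMs per combination" as applying to the long lived VMs is an assumption; it is consistent with the 12 long lived VMs listed under the main characteristics.

```python
from itertools import product

LIFESPANS = ("short", "long")
REGIONS = ("eastus", "westus2")
SKUS = ("B8ms", "D8s_v5")

# Every (lifespan, region, sku) combination appearing in the partition layout.
combinations = list(product(LIFESPANS, REGIONS, SKUS))

# Assumption: three VMs per (region, sku) pair for the long lived class.
VMS_PER_COMBINATION = 3
long_lived = [combo for combo in combinations if combo[0] == "long"]
long_lived_vm_count = len(long_lived) * VMS_PER_COMBINATION

print(len(combinations), long_lived_vm_count)
```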

Note: There are some periods of missing data caused by crashes on our management nodes.

Benchmarks

The benchmarks used came from the following suites:

| Suite | Benchmarks | Description |
| --- | --- | --- |
| Flexible IO Tester | Random Read, Random Write, Sequential Read, Sequential Write | Test the throughput (in MiB/s and IOPS) and the latency of various disk operations |
| Intel Memory Latency Checker | Idle Latency; Max Bandwidth and Peak Injection Bandwidth: all reads, 1:1 read-write ratio, 2:1 read-write ratio, 3:1 read-write ratio, stream-triad like | Test throughput of various memory operations |
| OS Bench | Create Files, Create Processes, Create Threads, Launch Programs, Memory Allocations | Measure latency for various OS-related operations |
| perf-bench | Epoll Wait, Memcpy, Memset, Syscall Basic | Measure other OS-related operations |
| PostgreSQL | All combinations of the following, from pgbench: Scaling Factor: 25 / 2500; Client: 1 / 25; Mode: Read Only / Read Write | Measure various workload combinations using pgbench |
| Redis | redis-benchmark tests for the following: GET, LPOP, LPUSH, SADD, SET | Benchmark various Redis operations using redis-benchmark |
| stress-ng | CPU_Cache, CPU Stress, Matrix Math, Memory Copy | Benchmark the CPU, plus one memory benchmark |
| Sysbench | CPU, RAM Memory | Benchmark the CPU and memory |

During each iteration, the full set of benchmarks was run in a random order, with a random splay (delay) between benchmarks.
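
The iteration procedure described above can be sketched as follows. This is an illustration only, not the harness used to collect the data: `run_iteration`, the `max_splay_s` bound, the uniform splay distribution, and the placeholder benchmark names are all hypothetical.

```python
import random
import time


def run_iteration(benchmarks, run_one, max_splay_s=60.0, rng=None, sleep=time.sleep):
    """Run every benchmark once, in random order, with a random pause between runs.

    Hypothetical sketch: `run_one` is a caller-supplied callable and
    `max_splay_s` is an assumed upper bound on the uniformly sampled pause.
    """
    rng = rng or random.Random()
    order = list(benchmarks)
    rng.shuffle(order)  # random order for this iteration
    results = {}
    for i, name in enumerate(order):
        results[name] = run_one(name)
        if i < len(order) - 1:
            # Random splay before the next benchmark starts.
            sleep(rng.uniform(0.0, max_splay_s))
    return order, results


# Usage with stand-ins: a fake runner and a no-op sleep so the example finishes instantly.
names = ["bench_a", "bench_b", "bench_c"]
order, results = run_iteration(
    names,
    run_one=lambda name: len(name),
    rng=random.Random(0),
    sleep=lambda seconds: None,
)
print(order)
```

Injecting `rng` and `sleep` keeps the sketch deterministic and fast to test; a real harness would leave both at their defaults.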