Azure VM Noise Dataset 2024

February 24, 2025

-- revision 1, 20250202

Introduction

This directory contains a collection of benchmark results gathered on Microsoft Azure Virtual Machine offerings (D8s_v5 and B8ms) over a period of around 483 days, from 2023-05-28 to 2024-09-23. The tests used SSDv2 disks as the "remote disk" and Premium SSDs as the "local disk". This is the dataset described and analyzed in the EuroSys 2025 paper TUNA: Tuning Unstable and Noisy Cloud Applications. The benchmarks were chosen to cover the main components of the VM, with the exception of the network: Cache, CPU, Disk, Memory, OS. Additionally, two end-to-end applications were benchmarked: PostgreSQL and Redis.

Main characteristics

  • Dataset size: 277 MB
  • Number of files: 1520 files
  • Duration: 483 days
  • Average Benchmark Suite Duration: 130 minutes
  • Long Lived VM Characteristics:
    • Metrics Collected: 3,661,602
    • Total VMs: 12
  • Short Lived VM Characteristics:
    • Metrics Collected: 3,375,618
    • Total VMs: 43617
    • Time between VM instantiations: 40 minutes

Using the Data

License

The data is made available and licensed under a CC-BY Attribution License. By downloading or using it, you agree to the terms of this license.

Attribution

If you use this data for a publication or project, please cite the accompanying paper:

Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, and Shivaram Venkataraman. 2025.
TUNA: Tuning Unstable and Noisy Cloud Applications.
In Proceedings of the Twentieth European Conference on Computer Systems (EuroSys '25).
Association for Computing Machinery, New York, NY, USA

@inproceedings{TUNA,
  author = {Johannes Freischuetz and Konstantinos Kanellis and Brian Kroth and Shivaram Venkataraman},
  title = {TUNA: Tuning Unstable and Noisy Cloud Applications},
  booktitle = {EuroSys '25: Proceedings of the Twentieth European Conference on Computer Systems},
  publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, year = {2025}, month = {mar}
}

We provide the traces as they are, but are willing to help researchers understand and use them. So, please let us know of any issues or questions by sending email to our mailing list.

Downloading

You can download the dataset here: vm-noise-data.

Schema and Description

Layout

  • 1 file per measurement unit, partitioned using Hive style table partitioning layout:

    test_suite=<test_suite>/test_name=<test_name>/vm_lifespan=(short|long)/vm_region=(eastus|westus2)/vm_sku=(B8ms|D8s_v5)/unit=<unit>.csv
    

    Where test_suite and test_name can be taken from the table in the benchmarks section below.
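    As an illustration of the layout, the following minimal Python sketch splits one such path into its partition keys. The helper name parse_partition_path and the suite, test, and unit values in the example path are hypothetical placeholders, not names taken from the dataset.

```python
from pathlib import PurePosixPath


def parse_partition_path(path: str) -> dict[str, str]:
    """Split a Hive-style partitioned path into its key=value dimensions."""
    dims = {}
    for part in PurePosixPath(path).parts:
        # The last component carries the file extension (unit=<unit>.csv).
        if part.endswith(".csv"):
            part = part[: -len(".csv")]
        key, sep, value = part.partition("=")
        if sep:  # ignore any leading directories that are not key=value
            dims[key] = value
    return dims


# Hypothetical path: suite, test, and unit names are placeholders.
example = (
    "test_suite=example_suite/test_name=example_test/vm_lifespan=short/"
    "vm_region=eastus/vm_sku=D8s_v5/unit=example_unit.csv"
)
dims = parse_partition_path(example)
print(dims)
```

    Keeping the dimensions in a plain dictionary makes it easy to filter files by region, SKU, or lifespan before reading any of them.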

Schema

| value | runtime | starttime | VM_id |
| --- | --- | --- | --- |
| measured value | duration of test (in seconds) | starting datetime of test | VM id (unique within dimension) |

Example

| value | runtime | starttime | VM_id |
| --- | --- | --- | --- |
| 5095.0 | 33.26 | 2023-06-23 16:06:09.190 | 0 |
| 5095.0 | 33.23 | 2023-06-23 17:52:53.550 | 0 |
| 5098.0 | 87.77 | 2023-09-05 12:13:20.380 | 1 |
| 5098.0 | 88.12 | 2023-09-05 22:26:45.210 | 1 |
| 5096.0 | 33.23 | 2024-05-20 00:43:05.260 | 2 |
| 5101.0 | 33.22 | 2024-05-20 01:44:24.070 | 2 |
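
A minimal sketch of reading rows with this schema using only the Python standard library. It assumes the files are comma-separated with a header row matching the schema names, and that starttime carries fractional seconds; neither assumption is confirmed here, so check a real file first. The inline sample reuses the example rows under our reading of the column boundaries, and load_rows is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime
from statistics import mean

# Inline stand-in for one unit=<unit>.csv file (assumed comma-separated with a header).
SAMPLE = """value,runtime,starttime,VM_id
5095.0,33.26,2023-06-23 16:06:09.190,0
5095.0,33.23,2023-06-23 17:52:53.550,0
5098.0,87.77,2023-09-05 12:13:20.380,1
5098.0,88.12,2023-09-05 22:26:45.210,1
5096.0,33.23,2024-05-20 00:43:05.260,2
5101.0,33.22,2024-05-20 01:44:24.070,2
"""


def load_rows(handle):
    """Parse rows into typed dictionaries following the schema."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "value": float(rec["value"]),
            "runtime": float(rec["runtime"]),
            # Assumed timestamp format, inferred from the example rows.
            "starttime": datetime.strptime(rec["starttime"], "%Y-%m-%d %H:%M:%S.%f"),
            "VM_id": rec["VM_id"],
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))

# Group measured values by VM and compute the per-VM mean.
by_vm = {}
for row in rows:
    by_vm.setdefault(row["VM_id"], []).append(row["value"])
vm_means = {vm: mean(vals) for vm, vals in by_vm.items()}
print(vm_means)
```

For a real file, replace io.StringIO(SAMPLE) with an open file handle, for example open(path, newline="").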

Sample Code

Some sample code for using this data in a notebook can be found in vm-noise-data/sample.ipynb

Description

This benchmarking data was collected from 2023-05-28 to 2024-09-23 for a set of VMs and organized using the Hive partitioning layout.

In total, 92 metrics were collected from 40 benchmarks.

These metrics were collected from VMs in 3 dimensions:

  1. VM lifespan

    For VM lifespan, we categorized VMs into two classes: short and long.

    Long running VMs were provisioned once and ran for the entire duration of the study.

    Short running VMs ran each benchmark only once before being reallocated.

    The purpose of this dimension is to influence which backend host the VM was assigned to in order to increase samples across different backend hosts. While we omit host information, short lived VMs were mostly placed on distinct hosts, and long lived VMs had almost no migrations.

  2. VM SKU

    For VM SKUs, we chose D8s_v5 VMs and B8ms VMs.

  3. VM region

    For VM regions, we chose westus2 and eastus.

There were three VMs allocated for each combination of VM dimensions.
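
The three dimensions can be enumerated with a short Python sketch. The dimension values come from the layout described earlier; the helper name partition_suffixes is hypothetical. As a sanity check under our reading, three VMs for each of the four region and SKU combinations gives 12 long lived VMs, which is consistent with the total listed under the main characteristics.

```python
from itertools import product

LIFESPANS = ("short", "long")
REGIONS = ("eastus", "westus2")
SKUS = ("B8ms", "D8s_v5")


def partition_suffixes():
    """Return the dimension part of the path for every combination."""
    return [
        f"vm_lifespan={lifespan}/vm_region={region}/vm_sku={sku}"
        for lifespan, region, sku in product(LIFESPANS, REGIONS, SKUS)
    ]


combos = partition_suffixes()
long_combos = [c for c in combos if c.startswith("vm_lifespan=long/")]

print(len(combos))           # number of dimension combinations
print(len(long_combos) * 3)  # long lived VMs at three VMs per combination
```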

Note: There are some periods of missing data caused by crashes on our management nodes.

Benchmarks

The benchmarks used came from the following suites:

| Suite | Benchmarks | Description |
| --- | --- | --- |
| Flexible IO Tester | Random Read, Random Write, Sequential Read, Sequential Write | Test the throughput in MiB/s and IOPS, and the latency of various disk operations |
| Intel Memory Latency Checker | Idle Latency Max Bandwidth and Peak Injection Bandwidth: all reads, 1:1 read-write ratio, 2:1 read-write ratio, 3:1 read-write ratio, stream-triad like | Test throughput of various memory operations |
| OS Bench | Create Files, Create Processes, Create Threads, Launch Programs, Memory Allocations | Measure latency for various OS related operations |
| perf-bench | Epoll Wait, Memcpy, Memset, Syscall Basic | Measure other OS related operations |
| PostgreSQL | All combinations of the following: Scaling Factor: 25 / 2500; Client: 1 / 25; Mode: Read Only / Read Write from pgbench | Measure various workload combinations using pgbench |
| Redis | redis-benchmark tests for the following: GET, LPOP, LPUSH, SADD, SET | Benchmark various redis operations using redis-benchmark |
| stress-ng | CPU_Cache, CPU Stress, Matrix Math, Memory Copy | Benchmark the CPU, and one memory benchmark |
| Sysbench | CPU, RAM Memory | Benchmark CPU and memory |

During each iteration, the full set of benchmarks was run in a random order with a random splay between each benchmark.
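
The scheduling idea can be sketched in Python as follows. This is not the harness used for the study: the function name plan_iteration, the uniform splay distribution, the 60 second cap, and the benchmark labels are all assumptions made for illustration.

```python
import random


def plan_iteration(benchmarks, max_splay_s, rng):
    """Shuffle the benchmarks and attach a random delay (splay) to each one."""
    order = list(benchmarks)
    rng.shuffle(order)
    # Assumed: splay drawn uniformly from [0, max_splay_s] seconds.
    return [(name, rng.uniform(0.0, max_splay_s)) for name in order]


# Hypothetical labels and cap; a seeded generator keeps the plan reproducible.
suite = ["fio", "mlc", "osbench", "perf-bench"]
rng = random.Random(0)
plan = plan_iteration(suite, 60.0, rng)

for name, splay in plan:
    print(f"wait {splay:.1f}s, then run {name}")
```

Randomizing both the order and the gaps helps avoid one benchmark always running right after the same predecessor or at the same offset within an iteration.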

See Also