1 Introduction

July 18, 2023

Understanding the microarchitectural resource characteristics of datacenter jobs has become increasingly critical for guaranteeing job performance while improving resource utilization. We provide a new open trace, AMTrace (Alibaba Microarchitecture Trace), profiled from 8,577 high-end physical hosts in Alibaba's datacenter using a hardware/software co-design monitoring method. AMTrace provides the microarchitectural metrics of 9.8 × 10^5 Linux containers at "per-container-per-logic-CPU" granularity, offering a new perspective for analyzing the microarchitectural resource characteristics of datacenter jobs.

2 System Architecture

Figure 1: System overview (sys-overview)

Most latency-critical (LC) jobs in Alibaba are Java-based e-commerce trading applications. They are containerized and scheduled by Kubernetes. Batch jobs are data-processing jobs, such as Map-Reduce jobs, Spark jobs, and machine-learning training jobs. They are scheduled by Fuxi. To improve resource utilization, Alibaba uses a colocation architecture (shown in Figure 1) that colocates LC jobs and Batch jobs on the same host and adopts several resource management technologies to mitigate resource contention.

To reduce CPU contention, we divide all logic CPUs into three groups: Batch CPUSet Pool, LC CPUSet Pool, and LC CPUShare Pool. The sizes of the three pools change dynamically with job scheduling. Jobs are scheduled to the corresponding groups according to their priority. CPUShare Batch jobs, the largest resource consumers, are allowed to use the resources in both the LC CPUSet Pool and the LC CPUShare Pool, but their resource priority is the lowest.
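The pool layout can be pictured as a small lookup table. The following is only an illustrative sketch: the names `Pool`, `ELIGIBLE_POOLS`, and `eligible_pools` are hypothetical, and only the CPUShare Batch row is stated in the text above. The other three rows are assumptions inferred from the pool names and may not match the real scheduler.

```python
from enum import Enum


class Pool(Enum):
    BATCH_CPUSET = "Batch CPUSet Pool"
    LC_CPUSET = "LC CPUSet Pool"
    LC_CPUSHARE = "LC CPUShare Pool"


# (job class, cpu mode) -> pools the job may run on.
# Only the ("Batch", "CPUShare") row comes from the text; the rest are
# assumptions inferred from the pool names.
ELIGIBLE_POOLS = {
    ("Batch", "CPUShare"): {Pool.LC_CPUSET, Pool.LC_CPUSHARE},  # stated in the text
    ("Batch", "CPUSet"): {Pool.BATCH_CPUSET},                   # assumption
    ("LC", "CPUSet"): {Pool.LC_CPUSET},                         # assumption
    ("LC", "CPUShare"): {Pool.LC_CPUSHARE},                     # assumption
}


def eligible_pools(job_class, cpu_mode):
    """Return the set of pools a job of this class and mode may use."""
    return ELIGIBLE_POOLS[(job_class, cpu_mode)]


batch_share_pools = eligible_pools("Batch", "CPUShare")
```

The table does not model priority or the dynamic resizing of the pools; it only records which pools a job may touch.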

Other adopted resource isolation technologies, including CAT, BVT scheduling, and noise clean, are described in our paper.

3 Tables

core_pmu_metrics

| Columns | Description | Type | Example Entry |
| --- | --- | --- | --- |
| ts | Timestamp, the number of seconds from the start | Long | 46309 |
| node_id | ID of the node | String | e7a66b9189940e9d6102 |
| container_id | ID of the container | String | 213aff2b3dbec1f7c212 |
| cpu | Logic CPU ID | Int | 23 |
| core_id | Core ID | Int | 23 |
| socket_id | Socket ID | Int | 0 |
| instructions | Number of instructions counted on the logic CPU | Long | 694470820 |
| cycles | Number of cycles counted on the logic CPU | Long | 1391935375 |
| ref_cycles | Number of reference cycles counted on the logic CPU | Long | 1284610800 |
| llc_misses | Number of LLC misses counted on the logic CPU | Long | 6667696 |
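The raw counters are usually combined into ratios before analysis. The sketch below derives three common ones from a single row: IPC (instructions per cycle), LLC MPKI (LLC misses per thousand instructions), and an effective core frequency estimated as `ref_freq × cycles / ref_cycles`. The function name `derived_core_metrics` is hypothetical, and the reference frequency of 2.5 GHz is borrowed from the host_meta example entry purely for illustration, since the example rows in this README come from different nodes.

```python
def derived_core_metrics(instructions, cycles, ref_cycles, llc_misses,
                         ref_freq_ghz=None):
    """Derive per-row ratios from core_pmu_metrics counters."""
    nan = float("nan")
    metrics = {
        # Instructions retired per core cycle.
        "ipc": instructions / cycles if cycles else nan,
        # LLC misses per 1000 instructions.
        "llc_mpki": 1000.0 * llc_misses / instructions if instructions else nan,
    }
    if ref_freq_ghz is not None and ref_cycles:
        # Core cycles advance at the actual clock and reference cycles at the
        # fixed reference clock, so their ratio scales the reference frequency.
        metrics["effective_freq_ghz"] = ref_freq_ghz * cycles / ref_cycles
    return metrics


# Counter values taken from the example entries in the table above.
example = derived_core_metrics(
    instructions=694470820,
    cycles=1391935375,
    ref_cycles=1284610800,
    llc_misses=6667696,
    ref_freq_ghz=2.5,  # illustrative; look up the real value in host_meta
)
```

For the example row this gives an IPC of roughly 0.5 and an LLC MPKI of roughly 9.6.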

uncore_pmu_metrics

| Columns | Description | Type | Example Entry |
| --- | --- | --- | --- |
| ts | Timestamp, the number of seconds from the start | Long | 2787 |
| node_id | ID of the node | String | 68451b8967ad23a10681 |
| socket_id | socket id | Int | 0 |
| channel_id | ID of the memory channel | String | 4 |
| read_bw | read bandwidth (MiB/s) | Double | 2540.7185 |
| write_bw | write bandwidth (MiB/s) | Double | 1493.8240 |
| latency | memory read latency (ns) | Double | 16.7503 |
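Because each row describes one memory channel, socket-level bandwidth has to be summed across channels. A minimal sketch follows, assuming each row has already been parsed into a dict keyed by the column names above. The helper name `socket_bandwidth` is hypothetical, and the second sample row is made up to show the aggregation.

```python
from collections import defaultdict


def socket_bandwidth(rows):
    """Sum per-channel read/write bandwidth (MiB/s) per (ts, node_id, socket_id)."""
    totals = defaultdict(lambda: {"read_bw": 0.0, "write_bw": 0.0})
    for r in rows:
        key = (r["ts"], r["node_id"], r["socket_id"])
        totals[key]["read_bw"] += float(r["read_bw"])
        totals[key]["write_bw"] += float(r["write_bw"])
    return dict(totals)


sample_rows = [
    # Example entry from the table above.
    {"ts": 2787, "node_id": "68451b8967ad23a10681", "socket_id": 0,
     "channel_id": "4", "read_bw": 2540.7185, "write_bw": 1493.8240},
    # Made-up row for a second channel on the same socket.
    {"ts": 2787, "node_id": "68451b8967ad23a10681", "socket_id": 0,
     "channel_id": "5", "read_bw": 1000.0, "write_bw": 500.0},
]

per_socket = socket_bandwidth(sample_rows)
```

Latency is left out on purpose: averaging it across channels would need a weighting choice (for example by read bandwidth) that this README does not specify.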

container_meta

| Columns | Description | Type | Example Entry |
| --- | --- | --- | --- |
| node_id | ID of the node | String | dd070834956a25c0c531 |
| container_id | ID of the container | String | a40de288d9455ba121db |
| pod_id | ID of the Pod | String | bf74c065cd443d178474 |
| cpu_mode | CPU allocation mode | String | CPUShare |
| app_name | Name of the application; Batch jobs are all "Batch" | String | Batch |
| deploy_group | Deployment group; one application may have multiple deployment groups | String | Batch |
| container_type | Container type | String | Batch-ops |
| container_cpu_spec | Deprecated. Ratio of logic CPUs requested by the container to the number of logic CPUs of the node (0-1) | Double | 0.04 |
| container_mem_spec | Deprecated. Ratio of memory requested by the container to the memory of the node (0-1) | Double | 0.003 |
| pod_cpu_spec | Ratio of logic CPUs requested by the Pod to the number of logic CPUs of the node (0-1) | Double | 0.04 |
| pod_mem_spec | Ratio of memory requested by the Pod to the memory of the node (0-1) | Double | 0.003 |

cpu_mode is one of the following enumeration strings:

  • CPUSet: The job is pinned to several logic CPUs and does not share them with other LC jobs.
  • CPUShare: The job is scheduled across dozens of logic CPUs and shares them with other LC jobs.

container_type is one of the following enumeration strings:

  • Batch-Set: Batch job running in CPUSet mode.
  • Batch-Share: Batch job running in CPUShare mode.
  • Batch-ops: The operations container of Batch jobs.
  • biz: The business container of LC jobs.
  • other: Some sidecar containers.
  • sys: System containers.
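A typical use of container_meta is to label core_pmu_metrics rows with their container type. The sketch below joins the two tables and computes an aggregate IPC per container_type. It assumes that `(node_id, container_id)` identifies a container and that rows are already parsed into dicts. The function name `ipc_by_container_type` and all sample values are hypothetical.

```python
from collections import defaultdict


def ipc_by_container_type(core_rows, meta_rows):
    """Aggregate IPC (total instructions / total cycles) per container_type."""
    # Assumed join key: a container is identified by (node_id, container_id).
    type_of = {(m["node_id"], m["container_id"]): m["container_type"]
               for m in meta_rows}
    instructions = defaultdict(int)
    cycles = defaultdict(int)
    for r in core_rows:
        ctype = type_of.get((r["node_id"], r["container_id"]))
        if ctype is None:
            continue  # no metadata for this container; skip the row
        instructions[ctype] += r["instructions"]
        cycles[ctype] += r["cycles"]
    return {t: instructions[t] / cycles[t] for t in cycles if cycles[t]}


# Made-up sample data, only to show the shape of the inputs.
meta = [
    {"node_id": "n1", "container_id": "c1", "container_type": "biz"},
    {"node_id": "n1", "container_id": "c2", "container_type": "Batch-Share"},
]
core = [
    {"node_id": "n1", "container_id": "c1", "instructions": 100, "cycles": 200},
    {"node_id": "n1", "container_id": "c1", "instructions": 300, "cycles": 200},
    {"node_id": "n1", "container_id": "c2", "instructions": 50, "cycles": 200},
    {"node_id": "n1", "container_id": "c9", "instructions": 1, "cycles": 1},
]

ipc = ipc_by_container_type(core, meta)
```

Summing counters before dividing weights each row by its cycle count. Averaging per-row IPC values instead would give short, lightly loaded samples the same weight as busy ones.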

host_meta

| Columns | Description | Type | Example Entry |
| --- | --- | --- | --- |
| node_id | ID of the node | String | 3201ea36ad240dc51d6c |
| cpu_num | The number of logic CPUs of the node | Int | 96 |
| machine_model | The model of the node; one model is one kind of specification | String | machine-1 |
| cpu_model | The model of the CPU | String | skylake |
| ref_freq_Ghz | The reference frequency of the CPU (GHz) | Double | 2.5 |
| dimms_per_channel | The number of DIMMs per memory channel | Int | 2 |
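Since the `*_cpu_spec` columns in container_meta are ratios relative to the node, host_meta is needed to turn them back into absolute counts. A minimal sketch, assuming both tables are parsed into dicts and joined on `node_id`. The function name `requested_logic_cpus` and the node ID `node-a` are hypothetical, while the numeric values are the example entries from the tables.

```python
def requested_logic_cpus(container_row, host_by_node):
    """Convert pod_cpu_spec (a 0-1 ratio) into a number of logic CPUs."""
    host = host_by_node[container_row["node_id"]]
    return container_row["pod_cpu_spec"] * host["cpu_num"]


# Hypothetical pairing: the example entries in this README come from
# different nodes, so "node-a" is used here only for illustration.
hosts = {"node-a": {"cpu_num": 96, "ref_freq_Ghz": 2.5}}
container = {"node_id": "node-a", "pod_cpu_spec": 0.04}

cpus = requested_logic_cpus(container, hosts)
```

With a ratio of 0.04 on a 96-CPU node, the Pod requests about 3.84 logic CPUs.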

4 Data

Considering that the raw trace is very large (about 8,500 hosts, more than 2 TB), this version provides data sampled from 1,000 hosts of the raw trace. The sampled trace has the same distribution as the raw trace, and its size is 300 GB.

Before downloading, please make sure that your disk has more than 300 GB of available space.

Then run get_data.sh to download the data and use gzip -d file_name to decompress the files.
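If disk space is tight, the compressed files can also be streamed without decompressing them to disk first. The sketch below assumes the files are gzip-compressed CSV, which this README does not confirm. The helper name `iter_rows` is hypothetical, and whether the files carry a header row is also an assumption, so the column names can be passed explicitly.

```python
import csv
import gzip


def iter_rows(path, fieldnames=None):
    """Yield rows of a gzip-compressed CSV file as dicts, one at a time.

    If fieldnames is None, the first line is treated as the header.
    Otherwise every line is treated as data and mapped onto fieldnames.
    """
    with gzip.open(path, mode="rt", newline="") as f:
        yield from csv.DictReader(f, fieldnames=fieldnames)
```

All values arrive as strings, so numeric columns such as `ts` or `instructions` still need to be converted with `int` or `float` before use.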

In a future version, we will find a lower-cost way to store and distribute the whole dataset.

5 Analysis Scripts

Coming soon.