PresentMon Capture Application

June 6, 2025

The PresentMon Capture Application is both a trace capture tool and a realtime performance overlay for games and other graphics-intensive applications. It uses the PresentMon Service to collect performance data, a custom Direct3D 11 renderer to display the realtime performance overlay, and a CEF-based UI to configure overlay and trace capture functionality.

Architecture

Usage

To run the PresentMon Capture Application, run PresentMon.exe.

Only severe errors are logged by default in the Release configuration. To log all errors in Release:

--p2c-verbose

Logs and cache files are written to %AppData%\PresentMon2Capture by default. You can change this to the working directory of the application (convenient when launching a Debug build from an IDE):

--p2c-files-working

Enable experimental support for tearing presents (required for Variable Refresh Rate):

--p2c-allow-tearing

In Debug configuration, the application will halt with a modal error dialog whenever a resource is requested from a non-local (network) URL. This flag disables that behavior:

--p2c-no-net-fail
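As a rough illustration of how the flags above might be combined on one command line, the sketch below assembles an argument list from them. The helper `build_command` and its option names are hypothetical and not part of PresentMon. It assumes the flags can be passed together and that PresentMon.exe is reachable from the working directory. It only builds the list and does not launch anything.

```python
# Hypothetical helper: maps readable option names to the flags documented above.
FLAGS = {
    "verbose": "--p2c-verbose",
    "files_working": "--p2c-files-working",
    "allow_tearing": "--p2c-allow-tearing",
    "no_net_fail": "--p2c-no-net-fail",
}


def build_command(exe="PresentMon.exe", **options):
    """Return an argument list in the form subprocess.run() accepts."""
    unknown = set(options) - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    # Emit flags in a stable order regardless of keyword order.
    return [exe] + [flag for name, flag in FLAGS.items() if options.get(name)]


# Example: verbose logging with files written to the working directory.
print(build_command(verbose=True, files_working=True))
```

The resulting list could be handed to `subprocess.run` on a machine where the application is installed.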

Metric and CSV Column Definitions

| Metric | CSV Column | Description | Compatible Query Types |
| --- | --- | --- | --- |
| Application | Application | Name of the executable of the process being targeted | DS |
| | ProcessID | ID of the process being targeted | |
| Swap Chain Address | SwapChainAddress | Address of the swap chain used to present, useful as a unique identifier | F |
| GPU Vendor | | Vendor name of the GPU | DS |
| GPU Name | | Device name of the GPU | DS |
| CPU Vendor | | Vendor name of the CPU | DS |
| CPU Name | | Device name of the CPU | DS |
| CPU Start Time | CPUStartTime | Time elapsed since the start of ETW event tracing | F |
| Frame Time | FrameTime | The total amount of time in between frames on the CPU | DF |
| CPU Busy | CPUBusy | How long the CPU was generating the frame in milliseconds | DF |
| CPU Wait | CPUWait | How long the CPU spent waiting before it could start generating the frame in milliseconds | DF |
| Displayed FPS | | Rate of frame change measurable at display | D |
| Presented FPS | | Rate of application calls to a Present() function | D |
| GPU Time | GPUTime | Total amount of time between when GPU started frame and when it finished in milliseconds. The GPU may not have been fully busy during this time | DF |
| GPU Busy | GPUBusy | How long the GPU spent working on this frame | DF |
| GPU Wait | GPUWait | How long the GPU spent waiting while working on this frame | DF |
| Dropped Frames | | Indicates if the frame was not displayed | DF |
| Displayed Time | DisplayedTime | How long this frame was displayed on screen | DF |
| Animation Error | MsAnimationError | The difference between the previous frame's CPU delta and display delta | DF |
| Animation Time | AnimationTime | The difference between the previous frame's CPU delta and display delta | F |
| Sync Interval | SyncInterval | The application's requested interval between presents measured in vertical sync/vblank events | DF |
| Present Flags | PresentFlags | Flags used to configure the present operation | DF |
| Present Mode | PresentMode | Method used to present the frame | DF |
| Present Runtime | PresentRuntime | The graphics runtime used for the present operation (DXGI, D3D9, etc.) | DF |
| Allows Tearing | AllowsTearing | Indicates if the frame allows tearing | DF |
| Frame Type | FrameType | Whether the frame was rendered by the application or interpolated by a driver/SDK. | DF |
| GPU Latency | GPULatency | How long it took until GPU work for this frame started | DF |
| Display Latency | DisplayLatency | Time between frame submission and scan out to display | DF |
| Click To Photon Latency | MsClickToPhotonLatency | Time between mouse click input and display | DF |
| GPU Sustained Power Limit | | Sustained power limit of the GPU | DS |
| GPU Power | GPUPower | Power consumed by the graphics adapter | DF |
| GPU Voltage | GPUVoltage | Voltage consumed by the graphics adapter | DF |
| GPU Frequency | GPUFrequency | Clock speed of the GPU cores | DF |
| GPU Temperature | GPUTemperature | Temperature of the GPU | DF |
| GPU Fan Speed | GPUFanSpeed | Rate at which a GPU cooler fan is rotating | DF |
| GPU Utilization | GPUUtilization | Amount of GPU processing capacity being used | DF |
| 3D/Compute Utilization | 3D/ComputeUtilization | Amount of 3D/Compute processing capacity being used | DF |
| Media Utilization | MediaUtilization | Amount of media processing capacity being used | DF |
| GPU Power Limited | GPUPowerLimited | GPU frequency is being limited because GPU is exceeding maximum power limits | DF |
| GPU Temperature Limited | GPUTemperatureLimited | GPU frequency is being limited because GPU is exceeding maximum temperature limits | DF |
| GPU Current Limited | GPUCurrentLimited | GPU frequency is being limited because GPU is exceeding maximum current limits | DF |
| GPU Voltage Limited | GPUVoltageLimited | GPU frequency is being limited because GPU is exceeding maximum voltage limits | DF |
| GPU Utilization Limited | GPUUtilizationLimited | GPU frequency is being limited due to low GPU utilization | DF |
| GPU Memory Power | GPUMemoryPower | Power consumed by the GPU memory | DF |
| GPU Memory Voltage | GPUMemoryVoltage | Voltage consumed by the GPU memory | DF |
| GPU Memory Frequency | GPUMemoryFrequency | Clock speed of the GPU memory | DF |
| GPU Memory Effective Frequency | GPUMemoryEffectiveFrequency | Effective data transfer rate GPU memory can sustain | DF |
| GPU Memory Temperature | GPUMemoryTemperature | Temperature of the GPU memory | DF |
| GPU Memory Size | GPUMemorySize | Size of the GPU memory | DS |
| GPU Memory Size Used | GPUMemorySizeUsed | Amount of used GPU memory | DF |
| GPU Memory Utilization | | Percent of GPU memory used | D |
| GPU Memory Max Bandwidth | GPUMemoryMaxBandwidth | Maximum total GPU memory bandwidth | DS |
| GPU Memory Write Bandwidth | GPUMemoryWriteBandwidth | Maximum GPU memory bandwidth for writing | DF |
| GPU Memory Read Bandwidth | GPUMemoryReadBandwidth | Maximum GPU memory bandwidth for reading | DF |
| GPU Memory Power Limited | GPUMemoryPowerLimited | Memory frequency is being limited because the memory modules are exceeding the maximum power limits | DF |
| GPU Memory Temperature Limited | GPUMemoryTemperatureLimited | Memory frequency is being limited because the memory modules are exceeding the maximum temperature limits | DF |
| GPU Memory Current Limited | GPUMemoryCurrentLimited | Memory frequency is being limited because the memory modules are exceeding the maximum current limits | DF |
| GPU Memory Voltage Limited | GPUMemoryVoltageLimited | Memory frequency is being limited because the memory modules are exceeding the maximum voltage limits | DF |
| GPU Memory Utilization Limited | GPUMemoryUtilizationLimited | Memory frequency is being limited due to low memory traffic | DF |
| CPU Utilization | CPUUtilization | Amount of CPU processing capacity being used | DF |
| CPU Power Limit | | Power limit of the CPU | DS |
| CPU Power | CPUPower | Power consumed by the CPU | DF |
| CPU Temperature | CPUTemperature | Temperature of the CPU | DF |
| CPU Frequency | CPUFrequency | Clock speed of the CPU | DF |
| CPU Core Utility | | Amount of CPU processing utility being used per core | D |
| All Input To Photon Latency | MsAllInputToPhotonLatency | Time between any input and display | DF |
| Instrumented Latency | InstrumentedLatency | How long it took from the instrumented start of this frame until the frame was displayed on the screen | DF |
| GPU Effective Frequency | | Effective clock speed of the GPU cores | DF |
| GPU Voltage Regulator Temperature | | Voltage regulator temperature | DF |
| GPU Memory Effective Bandwidth | | Data transfer rate that the memory modules can sustain based on current clock frequency | DF |
| GPU Overvoltage Percent | | GPU overvoltage increment as a ratio of the maximum increment | DF |
| GPU Temperature Percent | | GPU temperature as a ratio of the thermal margin | DF |
| GPU Power Percent | | GPU power draw as a ratio of default maximum power | DF |
| GPU Fan Speed Percent | | GPU fan speed as a ratio of the max speed for that fan | DF |
| GPU Card Power | | Total power consumption of the graphics adapter board | DF |
| Present Start Time | TimeInSeconds | The time the Present() call was made, in seconds | F |
| Present Start QPC | Present Start QPC | The time the Present() call was made as a QueryPerformanceCounter() value | F |
| Between Presents | MsBetweenPresents | The time between this Present() call and the previous one, in milliseconds | F |
| In Present API | MsInPresentAPI | The time spent inside the Present() call, in milliseconds. | F |
| Between Display Change | MsBetweenDisplayChange | How long the previous frame was displayed before this Present() was displayed, in milliseconds | DF |
| Until Displayed | MsUntilDisplayed | The time between the Present() call and when the frame was displayed, in milliseconds | DF |
| Render Present Latency | MsRenderPresentLatency | The time between the Present() call and when GPU work for this frame completed, in milliseconds | F |
| Between Simulation Start | MsBetweenSimulationStart | The time between the start of simulation processing of the previous frame and this one, in milliseconds. | F |
| PC Latency | MsPCLatency | Time between PC receiving input and frame being sent to the display, in milliseconds | DF |
| Displayed Frame Time | | The time between when the previous frame was displayed and this frame was, in milliseconds | D |
| Presented Frame Time | | The time between this Present call and the previous one, in milliseconds | D |
| Between App Start | MsBetweenAppStart | How long it took from the start of this frame until the CPU started working on the next frame, in milliseconds. | F |

*Query Type Codes: D = Dynamic Query, F = Frame Event Query, S = Static Query
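The query type column packs one letter per supported query type, so a value such as DF means the metric is available to both dynamic and frame event queries. A minimal sketch of expanding those codes, using a hypothetical helper `decode_query_types` that is not part of PresentMon:

```python
# Letter codes taken from the legend above.
QUERY_TYPES = {
    "D": "Dynamic Query",
    "F": "Frame Event Query",
    "S": "Static Query",
}


def decode_query_types(codes):
    """Expand a code string such as "DF" into the query type names it stands for."""
    return [QUERY_TYPES[code] for code in codes]


print(decode_query_types("DF"))
```

An unknown letter raises `KeyError`, which seems preferable to silently ignoring a malformed entry.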

Comma-separated value (CSV) file output

CSV file names

The PresentMon Capture Application creates two CSV files per capture. The first records the raw frame data of the capture and is named using the pattern "pmcap-[executablename]-YYMMDD-HHMMSS.csv". The second is a stats summary file for the capture. It includes the duration of the capture, the total number of frames captured, and the average, minimum, maximum, and 99th, 95th, and 90th percentile FPS. The stats file is named using the pattern "pmcap-[executablename]-YYMMDD-HHMMSS-stats.csv". Both files are stored in the "Intel\PresentMon\Capture" folder of the user's local AppData directory.
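To make the naming pattern concrete, the sketch below builds both file names for a given executable and timestamp. The function names are hypothetical. It assumes the date and time fields are zero-padded, and it leaves open whether the application includes the ".exe" extension in the executable name. The local AppData directory is passed in explicitly rather than read from the environment.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def capture_file_names(executable_name, when):
    """Return (frame data file name, stats file name) for one capture."""
    stamp = when.strftime("%y%m%d-%H%M%S")  # YYMMDD-HHMMSS
    base = f"pmcap-{executable_name}-{stamp}"
    return f"{base}.csv", f"{base}-stats.csv"


def capture_paths(local_appdata, executable_name, when):
    """Join the file names onto the Intel\\PresentMon\\Capture folder."""
    folder = PureWindowsPath(local_appdata) / "Intel" / "PresentMon" / "Capture"
    return tuple(folder / name for name in capture_file_names(executable_name, when))


frames, stats = capture_file_names("game", datetime(2025, 6, 6, 14, 5, 9))
print(frames)
print(stats)
```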

Implementation

Z-band

Z-bands

Windows has a concept of Z-bands. These add a hierarchical layer to the idea of Z-order, such that all windows in a higher Z-band will always appear on top of windows in any lower Z-bands. By default, user application windows are created in the lowest Z-band (ZBID_DESKTOP), and certain OS elements such as the Start Menu or Xbox Game Bar exist on higher Z-bands.

There exists an undocumented WinAPI function called CreateWindowInBand that allows an application to create a window in a Z-band above the default one. When this function is called, the OS will perform a check to make sure the application has the required privileges. We give the app these privileges by setting uiAccess=true in the app manifest.
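The ordering rule can be modeled as a two-level sort: the band is compared first, and Z-order only breaks ties within a band. The sketch below is a toy model, not Windows code. The `Window` class and the band numbers are made up for illustration and do not correspond to real ZBID values.

```python
from dataclasses import dataclass


@dataclass
class Window:
    title: str
    band: int     # illustrative band number; a higher band always wins
    z_order: int  # only meaningful relative to windows in the same band


def paint_order(windows):
    """Return windows from bottom-most to top-most."""
    return sorted(windows, key=lambda w: (w.band, w.z_order))


windows = [
    Window("overlay", band=1, z_order=0),
    Window("game", band=0, z_order=100),
    Window("editor", band=0, z_order=5),
]
# The overlay ends up on top despite having the lowest Z-order value.
print([w.title for w in paint_order(windows)])
```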

Motivation

Our motivation to use CreateWindowInBand is to ensure that the performance monitoring overlay appears above the target game application, even when running in fullscreen exclusive mode.

uiAccess

MSDN:uiAccess

uiAccess is an option that is set in an executable's manifest. It enables bypassing UI restrictions and is meant mainly for accessibility applications such as IMEs that need to appear above the active application.

This ability to bypass UI restrictions means that certain precautions are taken with respect to uiAccess applications:

  • The application must be cryptographically signed to protect against tampering
  • The application must be run from a trusted location (such as "Program Files")

Issues

  • There seem to be problems with spawning a uiAccess process from another (non-admin) process.
  • There might be problems when a normal (non-admin) process tries to Send/PostMessage to a uiAccess process.

uiAccess Application Special Abilities / Vulnerabilities

  • Set the foreground window.
  • Drive any application window by using the SendInput function.
  • Read input for all integrity levels by using low-level hooks, raw input, GetKeyState, GetAsyncKeyState, and GetKeyboardInput.
  • Set journal hooks.
  • Use AttachThreadInput to attach a thread to a higher integrity input queue.

Observations

We have noted that an application can remain on top (even above fullscreen exclusive games) when uiAccess is set to true, even when CreateWindowInBand is not used. This also seems to be reported elsewhere: https://www.autohotkey.com/boards/viewtopic.php?t=75695.

MSDN:Integrity Levels

CEF

https://bitbucket.org/chromiumembedded/cef/wiki/Home

The PresentMon Capture Application uses the Chromium Embedded Framework (CEF) to implement the control UI. CEF is a C++ framework that streamlines development of custom applications with Chromium. With some minimal bootstrapping and configuration code, the framework spins up and connects Chromium components, binding them to windows, inputs, sockets, etc. on the platform of choice.

Behavior of the framework can be customized by inheriting from base class interfaces and injecting them into the framework, thus hooking various callback functions to implement your desired behavior. In particular, custom objects can be implemented in C++ and then injected into the global (window) namespace in V8 to create an interop between JS and C++ code.

A major challenge when dealing with CEF is the multi-process nature of Chromium. One must be aware at all times of which process and which thread each piece of code is running on. Thread task queues and IPC message queues are used to make sure that operations are executed on the appropriate thread and process. V8 contexts must also be captured and managed when interacting with V8 state.
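The thread task queue idea can be illustrated independently of CEF. The sketch below is a generic Python model of the pattern and does not use or mirror the CEF API. The `TaskThread` class is hypothetical. Work is posted to a named thread's queue instead of being run directly, so it always executes on that thread.

```python
import queue
import threading


class TaskThread:
    """A named worker thread that runs posted callables one at a time, in order."""

    def __init__(self, name):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel posted by stop()
                break
            task()

    def post(self, task):
        """Queue a callable to run on this thread."""
        self._tasks.put(task)

    def stop(self):
        """Run everything already queued, then shut the thread down."""
        self._tasks.put(None)
        self._thread.join()


ui = TaskThread("ui")
seen = []
# The lambda is created on the main thread but executes on the "ui" thread.
ui.post(lambda: seen.append(threading.current_thread().name))
ui.stop()
print(seen)
```

Cross-process communication follows the same shape, except that the posted item has to be a serializable message rather than a callable.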