Profiling, in this document, means monitoring the execution of a program running on the Common Language Runtime (CLR). This document details the interfaces that the Runtime provides for accessing such information.
Although it is called the Profiling API, its functionality is suitable for more than just traditional profiling tools. Traditional profiling tools focus on measuring the execution of a program: time spent in each function, or memory usage over time. The Profiling API, however, targets a broader class of diagnostic tools, such as code-coverage utilities or even advanced debugging aids.
The common thread among these uses is that they are all diagnostic in nature: the tool is written to monitor the execution of a program. The Profiling API should never be used by the program itself, and the correctness of the program's execution should not depend on (or be affected by) having a profiler active against it.
Profiling a CLR program requires more support than profiling conventionally compiled machine code. The CLR has concepts such as application domains, garbage collection, managed exception handling and JIT compilation of code (converting Intermediate Language into native machine code) that conventional profiling mechanisms cannot identify or report on usefully. The Profiling API provides this missing information efficiently, with minimal impact on the performance of the CLR and the profiled program.
Note that JIT compilation at run time also provides a good opportunity: the API allows a profiler to change the in-memory IL code stream for a routine and then request that it be JIT-compiled anew. In this way, the profiler can dynamically add instrumentation code to particular routines that need deeper investigation. Although this approach is possible in conventional scenarios, it is much easier to do for the CLR.
Expose the information that profilers require so that a user can measure and analyze the performance of a program running on the CLR. Specifically:
Common Language Runtime startup and shutdown events
Application domain creation and shutdown events
Assembly loading and unloading events
Module load/unload events
COM vtable creation and destruction events
JIT compilation and code-pitching events
Class load/unload events
Thread birth/death/synchronization
Function entry/exit events
Exceptions
Transitions between managed and unmanaged execution
Transitions between different Runtime contexts
Information about Runtime suspensions
Information about the Runtime memory heap and garbage collection activity
Callable from any (non-managed) COM-compatible language
Efficient, in terms of CPU and memory consumption - the act of profiling should not change the behavior of the profiled program so much that the results are misleading
Useful to both sampling and non-sampling profilers. [A _sampling_ profiler inspects the profilee at regular clock ticks, for example every 5 milliseconds. A _non-sampling_ profiler is informed of events synchronously, on the thread that causes them.]
The Profiling API does not support profiling unmanaged code; it works only for managed code. Existing mechanisms must be used to profile unmanaged code instead. However, the Profiling API does provide managed/unmanaged transition events, which a profiler can use to determine the boundaries between managed and unmanaged code.
The Profiling API does not support writing applications that will modify their own code, for purposes such as aspect-oriented programming.
The Profiling API does not provide information needed to check bounds. The CLR provides intrinsic support for bounds checking of all managed code.
The CLR code profiler interfaces do not support remote profiling, for the following reasons:
It is necessary to minimize execution time using these interfaces so that profiling results will not be unduly affected. This is especially true where execution performance is being monitored. However, it is not a limitation when the interfaces are used to monitor memory usage or to obtain Runtime information on stack frames, objects, etc.
The code profiler needs to register one or more callback interfaces with the Runtime on the local machine on which the application being profiled runs. This limits the ability to create a remote code profiler.
The Profiling API within the CLR allows the user to monitor the execution and memory usage of a running application. Typically, this API is used to write a code profiler package. In the sections that follow, a profiler means a package built to monitor the execution of any managed application.
The profiling API is used by a profiler DLL, loaded into the same process as the program being profiled. The profiler DLL implements a callback interface (ICorProfilerCallback2). The runtime calls methods on that interface to notify the profiler of events in the profiled process. The profiler can call back into the runtime with methods on ICorProfilerInfo to get information about the state of the profiled application.
Note that only the data-gathering part of the profiler solution should run in-process with the profiled application; UI and data analysis should be done in a separate process.
The ICorProfilerCallback and ICorProfilerCallback2 interfaces consist of methods with names like ClassLoadStarted, ClassLoadFinished and JITCompilationStarted. Each time the CLR loads or unloads a class, compiles a function, and so on, it calls the corresponding method in the profiler's ICorProfilerCallback/ICorProfilerCallback2 interface. The same holds for all of the other notifications (see later for details).
So, for example, a profiler could measure code performance via the two notifications FunctionEnter and FunctionLeave. It simply timestamps each notification, accumulates the results, and then outputs a list indicating which functions consumed the most CPU time, or the most wall-clock time, during execution of the application.
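The timestamp-and-accumulate idea can be sketched as follows. This is a minimal illustration, not the real callback plumbing: `EnterLeaveTimer`, `OnEnter`, `OnLeave` and the `FunctionId` alias are hypothetical names invented here, the sketch tracks a single thread, and it records inclusive wall-clock time only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the runtime-supplied function ID (an opaque key).
using FunctionId = std::uintptr_t;
using Clock = std::chrono::steady_clock;

// Accumulates inclusive wall-clock time per function for one thread.
class EnterLeaveTimer {
public:
    // Would be called from the FunctionEnter notification.
    void OnEnter(FunctionId id, Clock::time_point now = Clock::now()) {
        stack_.push_back({id, now});
    }

    // Would be called from the FunctionLeave notification.
    void OnLeave(FunctionId id, Clock::time_point now = Clock::now()) {
        // Ignore a leave that does not match the innermost open frame.
        if (stack_.empty() || stack_.back().id != id) return;
        totals_[id] += now - stack_.back().entered;
        stack_.pop_back();
    }

    // Total inclusive time recorded so far for the given function.
    Clock::duration Total(FunctionId id) const {
        auto it = totals_.find(id);
        return it == totals_.end() ? Clock::duration::zero() : it->second;
    }

private:
    struct Frame {
        FunctionId id;
        Clock::time_point entered;
    };
    std::vector<Frame> stack_;                               // open calls, innermost last
    std::unordered_map<FunctionId, Clock::duration> totals_; // accumulated time per function
};
```

Because notifications arrive on the thread that caused them, a real profiler would need one such stack per thread (or suitable locking), and recursive calls would be counted more than once by this inclusive scheme.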
The ICorProfilerCallback/ICorProfilerCallback2 interface can be considered to be the "notifications API".
The other interface involved in profiling is ICorProfilerInfo. The profiler calls it, as required, to obtain more information to help its analysis. For example, whenever the CLR calls FunctionEnter it supplies a value for the FunctionId. The profiler can discover more about that FunctionId by calling ICorProfilerInfo::GetFunctionInfo, which reports details such as the function's parent class and its name.
The picture so far describes what happens once the application and profiler are running. But how are the two connected when an application is started? The CLR makes the connection during its initialization in each process. It decides whether to connect to a profiler, and which profiler that should be, based on the values of two environment variables, checked one after the other:
CORECLR_ENABLE_PROFILING - only connect with a profiler if this environment variable exists and is set to a non-zero value.
CORECLR_PROFILER - connect with the profiler with this CLSID or ProgID (which must have been stored previously in the Registry). The CORECLR_PROFILER environment variable is defined as a string:
set CORECLR_PROFILER={32E2F4DA-1BEA-47ea-88F9-C5DAF691C94A}, or
set CORECLR_PROFILER="MyProfiler"
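The two-step check can be sketched as a pure function over the two variable values. This is only an illustration of the decision described above, not the Runtime's actual code: `IsProfilingEnabled` and `SelectProfiler` are hypothetical names, and treating "non-zero" as "parses as a non-zero decimal integer" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// True when the enable variable exists and parses to a non-zero integer.
// (Assumption: decimal parsing; the Runtime's exact rule may differ.)
bool IsProfilingEnabled(const char* enableValue) {
    if (enableValue == nullptr) return false;
    char* end = nullptr;
    long value = std::strtol(enableValue, &end, 10);
    return end != enableValue && value != 0;
}

// Returns the profiler identifier to connect to, or an empty string when
// no profiler should be loaded. The checks run one after the other.
std::string SelectProfiler(const char* enableValue, const char* profilerValue) {
    if (!IsProfilingEnabled(enableValue)) return {};
    if (profilerValue == nullptr) return {};
    return profilerValue;
}
```

A caller would typically pass `std::getenv("CORECLR_ENABLE_PROFILING")` and `std::getenv("CORECLR_PROFILER")`, both of which return a null pointer when the variable is not set.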
The profiler class is the one that implements ICorProfilerCallback/ICorProfilerCallback2. It is required that a profiler implement ICorProfilerCallback2; if it does not, it will not be loaded.
When both checks above pass, the CLR creates an instance of the profiler in a similar fashion to CoCreateInstance. The profiler is not loaded through a direct call to CoCreateInstance, so that a call to CoInitialize, which requires setting the threading model, can be avoided. The CLR then calls the ICorProfilerCallback::Initialize method in the profiler. The signature of this method is HRESULT Initialize(IUnknown *pICorProfilerInfoUnk).
The profiler must QueryInterface pICorProfilerInfoUnk for an ICorProfilerInfo interface pointer and save it so that it can request more information later during profiling. It then calls ICorProfilerInfo::SetEventMask to say which categories of notifications it is interested in. For example, it might pass the mask COR_PRF_MONITOR_ENTERLEAVE | COR_PRF_MONITOR_GC.
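The two initialization steps can be sketched against mock types so that the example compiles on its own. Everything in the `sketch` namespace is a stand-in invented for illustration: `RuntimeUnknown::QueryInfo` plays the role of QueryInterface for ICorProfilerInfo, `ProfilerInfo` plays the role of ICorProfilerInfo, and the flag values are placeholders. A real profiler would use the interfaces and the COR_PRF_MONITOR enumeration from the Runtime's profiling headers.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using HRESULT = std::int32_t;
constexpr HRESULT kOk = 0;  // plays the role of S_OK

// Placeholder flag values for this mock only.
enum EventMask : std::uint32_t {
    MONITOR_GC         = 0x0080,
    MONITOR_ENTERLEAVE = 0x1000,
};

// Mock of the info interface: it only records the requested mask.
struct ProfilerInfo {
    std::uint32_t mask = 0;
    HRESULT SetEventMask(std::uint32_t events) {
        mask = events;
        return kOk;
    }
};

// Mock of the object handed to Initialize by the runtime.
struct RuntimeUnknown {
    ProfilerInfo info;
    HRESULT QueryInfo(ProfilerInfo** out) {
        *out = &info;
        return kOk;
    }
};

class Profiler {
public:
    HRESULT Initialize(RuntimeUnknown* pICorProfilerInfoUnk) {
        // Step 1: obtain the info interface and keep it for later queries.
        HRESULT hr = pICorProfilerInfoUnk->QueryInfo(&info_);
        if (hr != kOk) return hr;
        // Step 2: subscribe to enter/leave and GC notifications only.
        return info_->SetEventMask(MONITOR_ENTERLEAVE | MONITOR_GC);
    }

private:
    ProfilerInfo* info_ = nullptr;  // saved for use in later callbacks
};

}  // namespace sketch
```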
This mask would be used for a profiler interested only in function enter/leave notifications and garbage collection notifications. The profiler then simply returns, and is off and running!
By setting the notifications mask in this way, the profiler can limit which notifications it receives. This makes it easier to build a simpler or special-purpose profiler; it also reduces the CPU time wasted in sending notifications that the profiler would simply 'drop on the floor' (see later for details).
TODO: This text is a bit confusing. It seems to be conflating the fact that you need to create a different 'environment' (as in environment variables) to specify a different profiler and the fact that only one profiler can attach to a process at once. It may also be conflating launch vs. attach scenarios. Is that right??
Note that only one profiler can be profiling a process at one time in a given environment. In different environments it is possible to have two different profilers registered in each environment, each profiling separate processes.
Certain profiler events are IMMUTABLE, which means that once they are set in the ICorProfilerCallback::Initialize callback they cannot be turned off using ICorProfilerInfo::SetEventMask(). Trying to change an immutable event causes SetEventMask to return a failed HRESULT.
The profiler must be implemented as an inproc COM server: a DLL that is mapped into the same address space as the process being profiled. No other type of COM server is supported; if a profiler, for example, wants to monitor applications from a remote computer, it must implement 'collector agents' on each machine, which batch results and communicate them to the central data collection machine.
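The immutability rule can be illustrated with a small mock. This is a sketch under stated assumptions, not the Runtime's implementation: `EventMaskState`, the two flag constants and the `duringInitialize` parameter are hypothetical, and which real events are immutable is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using HRESULT = std::int32_t;
constexpr HRESULT kOk = 0;     // plays the role of S_OK
constexpr HRESULT kFail = -1;  // placeholder for a failed HRESULT

// Hypothetical flags: one that may change later, one that may not.
constexpr std::uint32_t kMutableFlag = 0x1;
constexpr std::uint32_t kImmutableFlag = 0x2;
constexpr std::uint32_t kImmutableMask = kImmutableFlag;

class EventMaskState {
public:
    // duringInitialize models a call made from inside Initialize.
    HRESULT SetEventMask(std::uint32_t requested, bool duringInitialize) {
        // After Initialize, the immutable bits must match the current mask.
        if (!duringInitialize && ((requested ^ current_) & kImmutableMask) != 0) {
            return kFail;
        }
        current_ = requested;
        return kOk;
    }

    std::uint32_t Current() const { return current_; }

private:
    std::uint32_t current_ = 0;
};

}  // namespace sketch
```

In this mock, a later call may still add or drop mutable flags as long as it repeats the immutable bits chosen during Initialize; a rejected call leaves the current mask unchanged.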
Runtime notifications supply an ID for reported classes, threads, AppDomains, etc. These IDs can be used to query the Runtime for more information. Each ID is simply the address of a block in memory that describes the item; however, profilers should treat IDs as opaque handles. If an invalid ID is used in a call to any Profiling API function, the results are undefined; most likely, the result will be an access violation. The profiler must ensure that the IDs it uses are valid. The Profiling API does not perform any validation, since that would add overhead and slow down execution considerably.
IDs are arranged in a hierarchy, mirroring the hierarchy in the process. Processes contain the global AppDomain, which contains Assemblies, which contain Modules, which contain Classes, which contain Functions. Threads are contained within Processes. Objects are contained within the AppDomain. Contexts are contained within Processes.
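Since the API does no validation, one possible way for a profiler to guard its own calls is to remember which IDs it has seen loaded and not yet unloaded, and to use IDs only as lookup keys. The sketch below is an assumption about how a profiler might do this, not something the API prescribes; `OpaqueId` and `LiveIdSet` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical alias for any runtime-supplied ID. The value is never
// dereferenced; it is used only as a key.
using OpaqueId = std::uintptr_t;

// Tracks IDs between their load and unload notifications.
class LiveIdSet {
public:
    // Would be called from a "load finished" style notification.
    void OnLoaded(OpaqueId id) { live_.insert(id); }

    // Would be called from an "unload" style notification.
    void OnUnloaded(OpaqueId id) { live_.erase(id); }

    // Check before passing the ID back to the runtime.
    bool IsLive(OpaqueId id) const { return live_.count(id) != 0; }

private:
    std::unordered_set<OpaqueId> live_;
};
```

Notifications can arrive on several threads, so a real profiler would need to synchronize access to such a set.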