fTimer Semantics Reference
June 9, 2026
When to read this: When runtime behavior or contract is changing or unclear. This is the primary current runtime contract document on
main. Do not load this by default for routine coding tasks where the behavior is not in question.
This document describes the current runtime contract on main.
Current `main` implements:
- stack-based start/stop timing, context-sensitive accounting, and strict/warn/repair mismatch handling
- lookup, reset, and procedural `ftimer_scope` and OOP `ftimer_oop_scope` scoped guards
- the `ierr` vs stderr error contract
- local summaries and reports: `get_summary()`, `print_summary()`, `write_summary()`, `write_summary_csv()`
- MPI summaries and reports: `mpi_summary()`, `mpi_union_summary()` sparse descriptor-union summaries, `print_mpi_summary()`, `write_mpi_summary()`, `write_mpi_summary_csv()`, `print_mpi_union_summary()`, `write_mpi_union_summary()`, `write_mpi_union_summary_csv()`
- self-time computation and callback suppression during repair
- descriptor-hash MPI preflight and globally meaningful MPI min/avg/max summary fields on every participating rank
- limited master-thread-only OpenMP guards in `ftimer_core` when built with `FTIMER_USE_OPENMP=ON`
- the explicit `ftimer_openmp` thread-lane runtime with stopped-run local OpenMP summary, text report, CSV output, strict MPI+OpenMP hybrid summary/report/CSV output, and sparse MPI+OpenMP union summary/report/CSV output for opt-in serial-lane and level-1 OpenMP worker timing

In non-MPI builds, `mpi_summary()` and `mpi_union_summary()` return `FTIMER_ERR_NOT_IMPLEMENTED` with empty MPI summary results; MPI report APIs, including sparse union reports and CSV export, return `FTIMER_ERR_NOT_IMPLEMENTED` without emitting report output. The `ftimer_openmp_t` MPI+OpenMP report families return `FTIMER_ERR_NOT_IMPLEMENTED` for initialized objects in non-MPI builds, after the usual lifecycle checks such as `FTIMER_ERR_NOT_INIT` for uninitialized objects.
This contract is strongest for disciplined serial and pure-MPI wall-clock timing. OpenMP support has two current paths: existing ftimer/ftimer_core calls keep the master-thread-only carve-out for bracketing a parallel region as a whole, while ftimer_openmp_t provides explicit opt-in serial-lane and level-1 worker timing with local OpenMP summaries plus strict and sparse union MPI+OpenMP rank/lane reductions. Likewise, on_event is a lightweight intra-run hook, not a stable external-profiler integration API.
Current architecture, validation, and workflow notes belong in docs/design.md.
High-risk fault categories and their evidence homes are mapped in
docs/fault-model-traceability.md. Historical
phase-roadmap notes belong in docs/implementation-history.md. When
current-state sources disagree, use this repository-wide precedence order:
current code under src/, then current behavioral tests, then
docs/semantics.md, then README.md, then docs/design.md.
Timing Model
- Inclusive vs exclusive (self) time definitions
- Wall-clock only (no CPU time, no hardware counters)
- Injected clocks are expected to be monotonic within a timing run
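
The inclusive/self distinction can be illustrated with a small Python model. This is a sketch, not fTimer's Fortran code, and it assumes that self time is the inclusive time minus the inclusive time of the direct children; the `Entry` class and the sample timer names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    node_id: int           # unique within this one summary
    parent_id: int         # 0 marks a root
    name: str
    inclusive_time: float  # wall-clock seconds, children included


def self_times(entries):
    """Return {node_id: self_time}, assuming self = inclusive - direct children."""
    child_total = {e.node_id: 0.0 for e in entries}
    for e in entries:
        if e.parent_id != 0:
            child_total[e.parent_id] += e.inclusive_time
    return {e.node_id: e.inclusive_time - child_total[e.node_id] for e in entries}


# Hypothetical tree: solve -> (assemble, factor -> pivot)
entries = [
    Entry(1, 0, "solve", 10.0),
    Entry(2, 1, "assemble", 4.0),
    Entry(3, 1, "factor", 5.0),
    Entry(4, 3, "pivot", 2.0),
]
result = self_times(entries)
```

In this sample, `solve` keeps 1.0 s of self time because its two children account for 9.0 of its 10.0 inclusive seconds.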
Default serial system_clock assumptions
In non-MPI builds, the build-default clock used by type(ftimer_t) and the
procedural wrappers is Fortran system_clock(count, rate) converted to seconds
as real(count, wp) / real(rate, wp).
- fTimer assumes `rate > 0` and that the returned count is nondecreasing over a timing run. If `rate` is unavailable, the current default clock stops with `error stop`.
- fTimer does not independently prove monotonicity, clamp backward movement, or synthesize corrected elapsed time. If the processor clock moves backward, resets, or otherwise returns a smaller count than an earlier call, the resulting negative interval remains visible in accumulated times and summaries. This matches the injected-clock contract: bad clock values are not hidden by the timing runtime.
- fTimer does not compensate for `system_clock` counter wrap. The useful uninterrupted run length therefore depends on the compiler/runtime `count_max` and `count_rate` for the integer kind used by fTimer's `system_clock` call. On common modern 64-bit implementations this horizon is expected to be very large, but less-common toolchains should be checked before relying on one timing run for very long jobs.
- The nominal resolution is one clock tick, `1 / count_rate` seconds. Arithmetic after the clock read uses `real(wp)`, so fTimer preserves double-precision timing arithmetic but cannot recover resolution or monotonicity that the underlying `system_clock` backend does not provide.
- The `start_date` and `end_date` strings in local summaries are wall-date labels from `date_and_time`; they are not the source of elapsed-time accounting and should not be interpreted as clock metadata.
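
The tick conversion and the wrap horizon described above can be checked with a few lines of arithmetic. The following is a Python model rather than the Fortran implementation; the function names are hypothetical, the 32-bit `count_max` and the 1000 ticks/s rate are illustrative values only, and whether fTimer subtracts before or after converting to seconds is not stated here, so the order used in `elapsed` is an assumption.

```python
def ticks_to_seconds(count, rate):
    """Mirror real(count, wp) / real(rate, wp) with Python floats."""
    if rate <= 0:
        # The serial default clock stops the program in this case.
        raise RuntimeError("system_clock reported no usable rate")
    return float(count) / float(rate)


def wrap_horizon_seconds(count_max, count_rate):
    """Longest run, in seconds, before the tick counter can wrap."""
    return ticks_to_seconds(count_max, count_rate)


def elapsed(start_count, stop_count, rate):
    """Raw difference; a backward clock yields a visible negative interval."""
    return ticks_to_seconds(stop_count, rate) - ticks_to_seconds(start_count, rate)


# Hypothetical toolchain values, for illustration only.
INT32_MAX = 2**31 - 1
horizon_days = wrap_horizon_seconds(INT32_MAX, 1000) / 86400.0
backward = elapsed(500, 200, 100)  # the clock moved backward between reads
```

With these sample values the horizon is a little under 25 days, and the backward read produces a negative interval that is left uncorrected, matching the contract that bad clock values are surfaced rather than hidden.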
Current local and MPI summary types do not expose clock rate, count range, or
monotonicity metadata, and formatted reports do not inject automatic clock
metadata. That is deliberate for current main: arbitrary injected clocks do
not have a common rate/wrap model, MPI builds use MPI_Wtime() as their default
clock, and adding partial metadata to summary schemas would be a user-visible API
change. Applications that need to record target-system clock characteristics can
add them as user metadata in reports after inspecting their toolchain. A future
explicit clock-info helper can be designed if adopter demand appears.
No runtime monotonicity or wrap sanity check is currently performed beyond the
rate > 0 guard in the serial default clock. A small sample at initialization
cannot guarantee future monotonicity or wrap safety, while per-call checks would
add overhead and change the current contract that backward clock values are
surfaced rather than hidden.
Wall-clock interpretation responsibilities
fTimer records the elapsed host wall-clock interval between the caller's
start and stop calls. It does not synchronize accelerator/device queues,
wait on asynchronous offload work, or insert MPI barriers around timer regions.
- For asynchronous accelerator or device work, a timer around a launch may measure only host enqueue/launch latency. If the intended quantity is device completion time, the caller must perform the appropriate device synchronization before `stop`.
- For MPI phase timing, rank-local start/stop windows are exactly that: rank-local wall-clock intervals. `mpi_summary()` reduces the recorded local intervals, but it does not imply that all ranks entered or exited the phase together. If the intended quantity is a synchronized global phase duration, the caller must place any required MPI synchronization outside the measured region or deliberately include it in the measured region.
Nesting Rules
- Strict stack-based nesting (no overlapping timers)
- Context-sensitive accounting: same timer name under different parents
Mismatch Handling
- Strict mode (default): error, no repair
- Warn mode: diagnostic + iterative repair
- Repair mode: silent iterative repair (Flash-X compatibility)
- Repair algorithm: single timestamp, unwind, stop target, restart unwound in reverse
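
The repair steps listed above (single timestamp, unwind, stop target, restart unwound in reverse) can be modeled in Python as follows. This is one reading of that one-line summary, not the library's code: the `StackModel` class is hypothetical, the status returned after a successful repair is assumed to be success, and a stop for a name that is not on the stack is assumed to be a mismatch in every mode.

```python
STRICT, WARN, REPAIR = "strict", "warn", "repair"
OK, ERR_MISMATCH = 0, 5  # values taken from the public status table


class StackModel:
    def __init__(self, mode=STRICT):
        self.mode = mode
        self.stack = []   # list of (name, start_time); top of stack is last
        self.total = {}   # name -> accumulated inclusive seconds

    def start(self, name, now):
        self.stack.append((name, now))

    def _close(self, now):
        name, began = self.stack.pop()
        self.total[name] = self.total.get(name, 0.0) + (now - began)
        return name

    def stop(self, name, now):
        if self.stack and self.stack[-1][0] == name:
            self._close(now)
            return OK
        on_stack = any(n == name for n, _ in self.stack)
        if self.mode == STRICT or not on_stack:
            return ERR_MISMATCH            # no repair, state unchanged
        unwound = []
        while self.stack[-1][0] != name:   # unwind everything above the target
            unwound.append(self._close(now))
        self._close(now)                   # stop the target itself
        for n in reversed(unwound):        # restart, innermost last
            self.start(n, now)
        return OK


m = StackModel(REPAIR)
m.start("A", 0.0)
m.start("B", 1.0)
m.start("C", 2.0)
status = m.stop("A", 5.0)   # out of order: C is on top, A is at the bottom
```

Every stop and restart in the repair uses the same timestamp (5.0 here), so no time is lost or double-counted: `A`, `B`, and `C` are credited 5.0, 4.0, and 3.0 seconds, and `B` and `C` continue on the stack in their original relative order.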
Error Contract
- `ierr` present: set code, no stderr
- `ierr` absent: emit a diagnostic to stderr
- Validation and lifecycle errors follow a warn-and-return contract: they leave timer state unchanged unless the caller explicitly selected a repair-capable mismatch mode
Resource Exhaustion And Internal Hard Stops
The public ierr contract covers recoverable API outcomes: validation and
lifecycle failures, strict nesting mismatches, stale or unknown cached ids,
unsupported build/runtime features, stopped-run MPI/OpenMP preflight failures,
MPI descriptor inconsistency, file and CSV append I/O failures, MPI datatype or
collective failures that can be observed after entering the documented
collective path, and explicit integer-space guards such as timer-id,
OpenMP-token, and call-count exhaustion.
FTIMER_ERR_UNKNOWN remains the catchall for unexpected but recoverable public
failures that do not have a more specific status code. A new public error code
is not required for the current policy because allocation failure is not yet a
recoverable API promise, and the already-checked integer-space exhaustion paths
fit the existing unknown-but-reported category.
General allocation failure and process resource exhaustion are outside the
recoverable ierr contract on current main. Runtime growth for timer names,
contexts, summaries, report buffers, MPI descriptors, and OpenMP lane/catalog
storage uses ordinary Fortran allocatable allocation. If those allocations
cannot be satisfied, the processor/runtime may terminate the program or raise a
Fortran runtime error rather than returning a fTimer status. fTimer therefore
does not currently promise state preservation, stderr silence with ierr, or a
specific FTIMER_ERR_* value for arbitrary out-of-memory conditions. This is a
deliberate narrow policy rather than a silent fallback: adding recoverable
allocation handling would need a systematic state-preservation design across
the timing, summary, MPI, report, and OpenMP paths.
Fatal termination remains acceptable for internal invariants that should be
unreachable through valid public API use and earlier validation. Current
production hard stops cover unavailable clock backends (system_clock reporting
no rate, or direct use of the internal MPI clock helper in a non-MPI build),
internal hash/index zero-capacity or overflow states after capacity preflight,
and stop-path stack/context corruption in the serial and explicit OpenMP lane
runtimes. These indicate an impossible runtime/backend condition or corrupted
internal state; continuing could produce plausible but wrong timing data.
Public Status And Error Codes
These constants are public from ftimer_types and are the canonical status values returned through optional ierr arguments.
| Constant | Code | Meaning |
|---|---|---|
| `FTIMER_SUCCESS` | 0 | Operation completed successfully. |
| `FTIMER_ERR_NOT_INIT` | 1 | The timer instance or default procedural instance has not been initialized for the requested operation. |
| `FTIMER_ERR_NOT_IMPLEMENTED` | 2 | The requested API is unavailable in this build, such as MPI summary/report APIs when `FTIMER_USE_MPI=OFF`. |
| `FTIMER_ERR_UNKNOWN` | 3 | Generic failure for an unsupported or unexpected condition that does not have a more specific public code. |
| `FTIMER_ERR_ACTIVE` | 4 | An active timer, a guard-owned activation still on the timer stack, or already-recorded timing data prevents the requested lifecycle, configuration, or report operation. |
| `FTIMER_ERR_MISMATCH` | 5 | Strict nesting, cached-id, or scoped-guard ownership checks detected a start/stop mismatch. |
| `FTIMER_ERR_MPI_INCON` | 6 | MPI participants have inconsistent timer descriptor trees for a strict MPI summary/report operation. |
| `FTIMER_ERR_IO` | 7 | File, unit, or CSV append validation failed. |
| `FTIMER_ERR_INVALID_NAME` | 8 | A timer name failed public name validation. |
Compile-Out / No-Op Instrumentation Pattern
The fTimer runtime itself is not conditionally compiled into a no-op mode. Its core semantics remain unconditional: when an application links fTimer and calls ftimer_start, ftimer_stop, scoped guards, summary APIs, callbacks, or MPI reporting APIs, the normal runtime contracts in this document apply.
Applications that need to leave timing calls in source while removing runtime overhead and fTimer dependencies in selected builds should use an application-owned facade module. The supported pattern is two facade implementations with the same application-facing interface:
- an enabled facade that delegates to fTimer and links `fTimer::ftimer`
- a disabled facade that does not `use ftimer`, does not link fTimer, and stores no timer state
Disabled facade entry points are intentionally silent no-ops. If an ierr argument is present, the disabled facade should set it to 0, matching FTIMER_SUCCESS; if ierr is absent, it should not write to stderr. Disabled calls do not validate timer names, maintain a nesting stack, create segments or summaries, write timing artifacts, fire callbacks, or enter MPI collectives.
This disabled-facade behavior is an application integration contract, not an alternate fTimer runtime mode. To keep disabled builds dependency-free, applications should avoid exposing fTimer summary types or constants in unconditional source and should instead expose application-level report helpers, simple counters, or status values from the facade. fTimer intentionally does not provide an installed drop-in no-op module named ftimer, because that would make it too easy for a build to accidentally shadow the real library API.
Timer Name / Summary Text Policy
- Public timer creation/lookup paths right-trim trailing blanks, reject empty names, reject names that begin with a blank, and reject ASCII control characters. They do not silently truncate names and do not impose the legacy `FTIMER_NAME_LEN` value as a runtime name cap.
- Timer names, runtime segment names, local summary entry names, MPI summary entry names, and metadata key/value fields use allocatable-length character storage. The exported `FTIMER_NAME_LEN = 64` constant is retained so code that imports it still compiles, but it is not the current storage or validation limit. Pre-1.0 code that treated those components as fixed writable buffers, such as internal writes directly into `metadata%value`, must allocate or assign through a temporary string first.
- Metadata entries with an unallocated or blank key are skipped by formatted reports and CSV exports. An unallocated metadata value is emitted as blank; assigned metadata values are right-trimmed before text-report escaping or CSV quoting.
- Name-based `start`/`stop` remains the default supported timing path; the runtime uses internal mapped lookup for both resident timer names and per-segment parent-stack contexts, plus capacity-based growth, so that this ergonomic path avoids repeated resident-timer linear scans, steady-state context-list scans, and one-slot-at-a-time whole-array growth as the timer set grows
- Per-timer context selection remains fully context-sensitive accounting over the current parent stack for that timer; repeated reuse of one timer name under many distinct parent stacks now uses a per-segment parent-stack index in steady state rather than rescanning the full known-context list each time
- `lookup()` plus `start_id()`/`stop_id()` remains an optional hot-path optimization for tight loops that repeatedly time the same known regions, especially when long labels would otherwise be validated and hashed on every name-based call
- Cached IDs returned by `lookup()` are opaque handles for the current timer runtime state, not segment-array indexes. They remain valid across `reset()`, but successful `init()` and `finalize()` calls invalidate them. Calls made while finalized follow the normal `FTIMER_ERR_NOT_INIT` lifecycle contract; after a later successful `init()`, passing a stale cached ID to `start_id()` or `stop_id()` returns `FTIMER_ERR_UNKNOWN` and leaves timer state unchanged
- Formatted summary output does not emit unsafe raw summary-entry names or metadata header text literally
- Escaped formatted-summary forms are stable: leading blanks render as `\x20`, backslashes render as `\\`, tab/newline/carriage return render as `\t`/`\n`/`\r`, delete, terminal escape bytes, C0/C1 control bytes, UTF-8 encoded C1 controls, and other ASCII control characters render as `\xNN`, valid non-control UTF-8 text is preserved, blank/empty raw names render as `<blank>`, and blank metadata values remain blank
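
The public name rules in the first bullet above can be expressed as a short Python model. It is a sketch of the documented rules only, not the Fortran validator: `normalize_name` is a hypothetical helper, and treating DEL (0x7F) as an ASCII control character alongside 0x00 through 0x1F is an assumption.

```python
FTIMER_SUCCESS = 0
FTIMER_ERR_INVALID_NAME = 8  # value from the public status table


def normalize_name(raw):
    """Return (status, name) following the documented public name rules."""
    name = raw.rstrip(" ")                      # right-trim trailing blanks
    if name == "":
        return FTIMER_ERR_INVALID_NAME, None    # empty or all-blank
    if name[0] == " ":
        return FTIMER_ERR_INVALID_NAME, None    # begins with a blank
    # Assumption: "ASCII control characters" means 0x00-0x1F plus DEL (0x7F).
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name):
        return FTIMER_ERR_INVALID_NAME, None
    return FTIMER_SUCCESS, name                 # never truncated, any length


long_label = "phase_" + "x" * 200               # well past the legacy 64 limit
```

A name longer than the legacy `FTIMER_NAME_LEN` value passes unchanged in this model, reflecting the rule that names are neither capped nor silently truncated.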
Scoped Guard Contract
- Scoped guards are an optional safety layer for simple lexical blocks and early exits; explicit `start`/`stop` remains the primary OOP API and the right choice for non-lexical lifetimes, complex ownership, cached-id hot paths, or timing that spans procedure boundaries
- Procedural scoped timing uses `type(ftimer_guard_t)` from `ftimer` with `call ftimer_scope(guard, name, ierr)` and records on the saved default timer instance
- OOP scoped timing uses `type(ftimer_oop_guard_t)` from `ftimer_core` with `call ftimer_oop_scope(timer, guard, name, ierr)`, where `timer` is an associated `type(ftimer_t), pointer`
- The OOP guard stores a non-owning pointer to that timer. The timer target must outlive the guard, remain allocated/alive, and remain initialized until the guard is inactive. Declare the guard in a nested block or procedure scope that exits before the timer target can go out of scope, or call `guard%stop(ierr=...)` before leaving a shared scope or performing timer lifecycle operations
- Declaring an active OOP guard in the same scoping unit as an automatic timer object and relying on finalization order at scope exit is unsupported
- Procedural `ftimer_scope` and OOP `ftimer_oop_scope` start the named timer through the same validation, lookup, accounting, and callback path as normal name-based `start`
- If `ftimer_scope` or `ftimer_oop_scope` fails while initializing an inactive guard, the guard remains inactive. Finalization and `guard%stop(ierr)` are no-ops for inactive guards.
- Calling `ftimer_scope` or `ftimer_oop_scope` on an already-active guard returns `FTIMER_ERR_ACTIVE`, or warns when `ierr` is omitted, and leaves the existing active ownership unchanged.
- A guard owns exactly one activation token from its successful start. `guard%stop(ierr)` may stop only that exact activation while it is still the top of the stack.
- If the guard's activation has already been stopped, repaired away, invalidated by timer lifecycle, or replaced by another activation with the same timer name, `guard%stop(ierr)` returns `FTIMER_ERR_MISMATCH` or the relevant lifecycle error and leaves timer state unchanged.
- A guard-owned activation that is still active on the timer stack makes `reset()` and `finalize()` fail through the normal `FTIMER_ERR_ACTIVE` lifecycle contract; timer lifecycle operations do not force-stop or clear active guards
- fTimer does not keep a separate registry of live guard objects. If user code manually stops or repairs away a guard-owned activation first, the timer stack is no longer active; later lifecycle calls can proceed, and the still-active guard object becomes stale. A later `guard%stop(ierr)` or guard finalizer reports that stale ownership as a mismatch or lifecycle error.
- The guard finalizer attempts the same exact-activation stop without `ierr`. On mismatch or lifecycle errors, it warns to stderr and does not repair.
- Scoped guard finalization does not force-stop arbitrary matching timer names, synthesize elapsed time, invoke mismatch repair, or hide errors silently.
- `guard%stop(ierr)` is the supported way to observe finalizer-equivalent stop errors before scope exit.
- Guard assignment/copy is unsupported and does not copy or transfer active ownership. Assignment involving an active guard warns to stderr and leaves active ownership with the original guard. Use one scalar guard variable per lexical block.
- Guard arrays, saved/global guards, function-return guard constructors, cross-procedure lifetime patterns, deallocated timer targets, and block-local scoped guard finalization inside OpenMP parallel regions are unsupported.
- `ftimer_scope_id` is deferred; use explicit `lookup()` plus `start_id()`/`stop_id()` for cached-id hot paths.
Reset Behavior
- Zeros times and counts, preserves timer definitions
- Restarts the local monitoring window used for `summary%total_time` and `% Total`
- Error if timers are active; `reset()` does not auto-stop or clean up active timers
Clock Configuration Contract
- Configure custom clocks through `set_clock()` and restore the build-default wall clock through `clear_clock()`
- Direct mutation of raw runtime clock internals is not part of the supported API contract
- Clock configuration is allowed before `init()` or before a run records timing data
- When a clock is configured before `init()`, the next `init()` starts the local summary window in that clock's epoch
- When `set_clock()` or `clear_clock()` succeeds after `init()` but before any timing data exists, it immediately restarts the local summary window in the newly selected clock's epoch
- The first subsequent `start()` does not rebase the summary window; idle time between the successful clock change and the first start is included in `summary%total_time` and `% Total` denominators
- Empty local summaries, formatted reports, and CSV exports after a successful no-data clock change use the newly selected clock epoch and do not serialize mixed-epoch total times
- Once a run has recorded timing data, `set_clock()` and `clear_clock()` return `FTIMER_ERR_ACTIVE` (or warn to stderr when `ierr` is omitted) and leave state unchanged
- `reset()`, `init()`, and `finalize()` all provide clean lifecycle boundaries after which a different clock may be configured
Lifecycle Errors With Active Timers
- `init`, `reset`, and `finalize` require a fully stopped timer set
- With `ierr` present, these lifecycle calls return `FTIMER_ERR_ACTIVE` and do not write to stderr
- With `ierr` absent, they warn to stderr and return immediately with the timer state unchanged
- They do not force-stop timers, synthesize elapsed time, zero accumulated data, restart the summary window, or perform hidden cleanup
- In `FTIMER_USE_OPENMP=ON` builds, these lifecycle bullets apply to the existing `ftimer`/`ftimer_core` guarded APIs only in serial code and OpenMP master-thread calls; non-master calls to those guarded APIs are suppressed before validation, emit no warning, and leave any caller-provided `ierr` unchanged. The explicit `ftimer_openmp` object has its own active-region rejection and queued-diagnostic contract below.
- Repairing stop mismatches is a separate explicit opt-in through `mismatch_mode = FTIMER_MISMATCH_WARN` or `FTIMER_MISMATCH_REPAIR`
Local Summary Contract
- `get_summary()` returns a local-only `ftimer_summary_t`
- `get_summary()`, `print_summary()`, and `write_summary()` are live snapshot APIs, not stopped-run-only final-report APIs
- If timers are active at the snapshot timestamp, local summaries include those active contexts with elapsed time computed through that timestamp; they do not synthesize hidden stops, fire callbacks, or mutate runtime state
- `summary%has_active_timers` is true when at least one returned entry was active at the snapshot timestamp
- `summary%entries` remain in preorder so current formatted-report traversal and existing depth-oriented consumers keep working
- Each entry retains `name` and `depth`, exposes explicit tree linkage through `node_id` and `parent_id`, and exposes `is_active` for that timer context at the snapshot timestamp
- Local summaries expose context-cardinality diagnostics without changing timing behavior by default. `summary%total_contexts` is the total number of allocated parent-stack contexts across resident timers, `summary%max_contexts_per_timer` is the largest context count attached to any one timer name, and `summary%context_diagnostics(:)` names each resident timer with its allocated context count so callers can identify which timer name has high cardinality. `summary%entries(i)%timer_context_count` repeats that per-timer count on each visible entry.
- Context-cardinality diagnostics count allocated runtime contexts, not just visible summary rows. A context can be allocated but hidden from the entry table after `reset()` or when it has no visible time/calls and is not active. `summary%context_diagnostics(:)` still includes those resident timers so the high-cardinality timer remains identifiable, while `summary%num_entries` remains the visible row count.
- These diagnostics do not add mandatory predeclaration, hard caps, default warning thresholds, or changes to context-sensitive accounting. Text reports and CSV schemas are unchanged; callers that want alerting should inspect the structured summary fields and apply their own threshold.
- `call_count` remains the count of user-visible `start` calls for that exact timer context. It is stored and exported as `integer(int64)` so hot-loop instrumentation is not limited to the default integer range. Starting a context whose count is already at the signed-64-bit maximum fails with `FTIMER_ERR_UNKNOWN` or the normal omitted-`ierr` warning path instead of wrapping. Repair-mode internal continuations can therefore appear as active entries with `is_active = .true.` and `call_count = 0`; that is not a hidden user call.
- `node_id` is unique and stable only within one produced summary object
- `parent_id` refers to another entry's `node_id`; roots use `parent_id = 0`
- Current `main` does not promise that local summary node ids remain stable across separate runs or across independently produced summary objects
- `print_summary()` and `write_summary()` format the same local snapshot data. When any returned entry is active, formatted reports add active-state information and reserve the `Active timers` metadata key for the built-in snapshot status line. A formatted local report whose `Active timers` field is `yes` is an interim snapshot, not a final stopped-run report.
- `write_summary_csv()` exports the same local snapshot data in CSV format version `2`. It writes one header row, a `record_type=summary` row, zero or more `record_type=metadata` rows, and one `record_type=entry` row per summary entry. Entry rows include `node_id`, `parent_id`, `depth`, `name`, `inclusive_time`, `self_time`, `call_count`, `avg_time`, `pct_time`, and `is_active`. The local integer `call_count` field is emitted as decimal text without narrowing to default integer, and version `2` is the schema signal that local `call_count` can require signed 64-bit parsing.
- Local and strict MPI CSV `append=.true.` appends records to the target file and omits the header when the existing file is non-empty. Non-empty append targets must begin with the fTimer CSV format-version-2 header, existing data rows must be well-formed CSV logical records with the exact v2 header field count and recognized `summary_kind`/`record_type` combinations, and the target must end with a newline; mismatched headers, older-format records, malformed v2 record shape or quote placement, or unterminated final records are rejected with `FTIMER_ERR_IO` instead of silently mixing schemas. Sparse union, local OpenMP, and strict MPI+OpenMP CSV append use their own exact headers and `summary_kind` validation. Append validation is a schema-shape and CSV-syntax guard for existing files, not a semantic reparse of every numeric, logical, or timing payload field already present.
- CSV text fields emit trimmed raw timer names and metadata key/value text with standard CSV quoting. Unlike human-readable text reports, CSV exports do not apply the visible `\t`/`\n`/`\xNN` display escaping. They are not spreadsheet-formula-sanitized.
- A caller that requires a final local report should stop all timers first and verify `summary%has_active_timers == .false.`
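
A consumer of the preorder entry list can rebuild the tree from `node_id` and `parent_id` alone. The Python sketch below models entries as plain dictionaries, which is an assumption made for illustration; it derives indentation from the parent links rather than from `depth`, and the sample timer names are hypothetical.

```python
def outline(entries):
    """Render preorder entries as indented lines using parent links only."""
    level = {0: -1}                      # parent_id 0 marks a root
    lines = []
    for e in entries:
        if e["parent_id"] not in level:
            raise ValueError("parent must precede child in preorder")
        lvl = level[e["parent_id"]] + 1
        level[e["node_id"]] = lvl
        flag = " (active)" if e["is_active"] else ""
        lines.append("  " * lvl + e["name"] + flag)
    return lines


def is_final(entries):
    """Mirror the summary%has_active_timers == .false. check."""
    return not any(e["is_active"] for e in entries)


snapshot = [
    {"node_id": 1, "parent_id": 0, "name": "solve", "is_active": False},
    {"node_id": 2, "parent_id": 1, "name": "assemble", "is_active": False},
    {"node_id": 3, "parent_id": 0, "name": "io", "is_active": True},
]
rendered = outline(snapshot)
final = is_final(snapshot)
```

Because `io` is still active in this sample, the snapshot is interim, and a caller that needs a final report would stop all timers and take a new summary.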
MPI Guarantees
- MPI-enabled fTimer must be used after `MPI_Init` and before `MPI_Finalize`
- `mpi_summary()` is collective over the communicator captured by `init`
- Omitting `comm` at `init` means `mpi_summary()` uses `MPI_COMM_WORLD`
- `init(comm=...)` stores a non-owning communicator handle; fTimer does not duplicate or free caller-provided communicators
- Callers that pass a subcommunicator must keep it valid until all fTimer MPI summaries, MPI reports, `finalize()`, or `init()` reinitialization that may use that communicator are complete
- All ranks in that communicator must enter `mpi_summary()` with fully stopped timers
- Unlike local summaries, MPI summaries are final stopped-run summaries only; active timers return `FTIMER_ERR_ACTIVE`
- The public MPI communicator interface path is `mpi_f08` with `type(MPI_Comm)` handles captured at `init`
- Legacy integer communicator handles and `mpif.h` are not supported interface paths
- Integer `init` options such as `mismatch_mode` and `ierr` must be passed by keyword; positional integer `init` arguments are rejected so legacy communicator handles cannot silently bind to non-communicator options
- `FTIMER_USE_MPI=ON` configure requires that the active `mpi_f08` path compile the `MPI_Type_match_size` and `MPI_ERRORS_RETURN` calls used for datatype validation
- MPI summary reductions select MPI datatypes with `MPI_Type_match_size` for the actual `real(wp)` and `integer(int64)` storage sizes before reducing those buffers. If that validation API is present but the active MPI implementation cannot provide matching datatypes at runtime, `mpi_summary()` temporarily requests MPI error returns for the datatype lookup, fails with `FTIMER_ERR_UNKNOWN`, and leaves the MPI result empty instead of reducing through a mismatched fixed datatype.
- Hash-based timer-descriptor preflight before the reduction phase
- The strict MPI preflight compares rank-local descriptor hashes against a rank-0 reference hash, then reduces a mismatch flag across the communicator. Successful summaries do not allgather every rank's hashes or exchange exact descriptor strings.
- Extra timers, missing timers, renamed timers, and hierarchy/context mismatches fail the MPI summary with `FTIMER_ERR_MPI_INCON`; they do not fall back to a local summary object through the MPI API
- Rank-conditional timer reductions are not supported by the strict `mpi_summary()` API. Sparse/union MPI summaries are available through the separate opt-in `mpi_union_summary()`/`ftimer_mpi_union_summary()` API and `ftimer_mpi_union_summary_t` result model. Sparse entries report explicit participation counts, derive missing-rank counts from the communicator size, and define per-entry statistics over participating ranks only.
- When that descriptor preflight fails inside one communicator, the omitted-`ierr` diagnostic reports the disagreeing communicator-local ranks when possible
- MPI descriptor matching is based on the local summary tree shape and names, not on raw local `node_id` values
- The MPI descriptor preflight materializes deterministic length-prefixed path strings at summary time so names that differ only after the legacy 64-character threshold remain distinguishable. This is outside the start/stop hot path, but its memory and sort cost scales with summary entry count and encoded path length for very large timer trees.
- Mismatched communicator choices across would-be participants are unsupported; this API has no safe cross-communicator rendezvous to detect that misuse without risking the same MPI deadlock it is trying to avoid
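
The descriptor preflight idea (length-prefixed paths, a hash per rank, comparison against rank 0) can be sketched without MPI. Everything concrete below is an assumption for illustration: the `N:name` encoding, the use of SHA-256, and the list-of-ranks simulation are stand-ins, not fTimer's actual encoding, hash, or collective pattern, which reduces a mismatch flag rather than gathering hashes.

```python
import hashlib


def encode_path(path):
    """Length-prefix each name so different paths never share an encoding."""
    return "".join(f"{len(name.encode('utf-8'))}:{name}" for name in path)


def descriptor_hash(paths):
    """Hash the sorted encoded paths so local creation order does not matter."""
    digest = hashlib.sha256()
    for encoded in sorted(encode_path(p) for p in paths):
        digest.update(encoded.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def preflight(rank_paths):
    """Compare every simulated rank with rank 0; return the disagreeing ranks."""
    reference = descriptor_hash(rank_paths[0])
    return [rank for rank, paths in enumerate(rank_paths)
            if descriptor_hash(paths) != reference]


# Simulated per-rank timer trees; rank 2 is missing the nested "io" timer.
ranks = [
    [["solve"], ["solve", "io"]],
    [["solve", "io"], ["solve"]],   # same tree, different creation order
    [["solve"]],
]
disagreeing = preflight(ranks)
```

The length prefix is what keeps paths such as `["ab", "c"]` and `["a", "bc"]` distinct even though their names concatenate to the same text.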
Unsupported communicator mismatch example
Suppose ranks 0-1 initialize a timer with one communicator split and later call mpi_summary(), while ranks 2-3 reach mpi_summary() through a different communicator choice. That is unsupported misuse.
This is not like descriptor inconsistency within one communicator, where every participant can still enter the same collective and the library can fail the MPI summary cleanly after a preflight mismatch. Once ranks have already diverged onto different communicators, mpi_summary() has no safe second rendezvous it can use to discover the mistake without risking the same deadlock it is trying to avoid. The practical failure mode is a hang, not FTIMER_ERR_MPI_INCON.
The supported pattern is simple: capture one communicator consistently at init, then have that same participant set enter mpi_summary() together.
MPI lifecycle and communicator ownership
In FTIMER_USE_MPI=ON builds, the build-default clock calls MPI_Wtime()
and the MPI summary/report entry points use MPI collectives. The supported
runtime lifetime is therefore after MPI_Init and before MPI_Finalize.
Calling MPI-enabled fTimer before initialization or after finalization is
outside the current contract. There is not currently a separate
all-entry-point runtime guard for that misuse.
The communicator captured by init(comm=...) is a borrowed handle. fTimer does
not call MPI_Comm_dup, take ownership, or call MPI_Comm_free for that
communicator. Applications that split MPI_COMM_WORLD should keep each
subcommunicator alive until every fTimer operation that may consult it is done:
strict or sparse MPI summaries, MPI report writers, finalize(), or an
init() call that reinitializes the same timer object/default instance.
MPI Summary Contract
mpi_summary() returns a distinct ftimer_mpi_summary_t instead of reusing the local ftimer_summary_t shape.
- ftimer_mpi_summary_t contains communicator-wide totals (min_total_time, avg_total_time, max_total_time, min_total_time_rank, max_total_time_rank, total_time_imbalance) plus per-entry communicator-wide statistics (min_*, avg_*, max_*) for inclusive time, self time, call count, and % Total. min_call_count and max_call_count are integer(int64) fields; avg_call_count remains real(wp). MPI call-count averages avoid integer-sum overflow by reducing exact integer(int64) extrema first, then averaging nonnegative deltas from the exact minimum count. The final average is clamped to the representable real(wp) conversions of the exact min/max counts. Because real(wp) cannot represent every signed-64-bit integer exactly, a near-limit average may differ from the exact integer average by representable real rounding.
- MPI summary entries also expose min_inclusive_time_rank and max_inclusive_time_rank as communicator-local ranks for the inclusive-time extrema; ties resolve to the lowest rank that attains the extremum.
- Successful mpi_summary() calls populate the same global MPI result on every participating rank.
- ftimer_mpi_summary_t entries retain name, depth, node_id, and parent_id, so MPI summaries keep the explicit-tree data model instead of collapsing to flat rows.
- The MPI summary tree order is canonical across ranks. It does not depend on the local timer creation order on one chosen rank.
- mpi_summary() does not return local fallback data on errors. If the caller needs local data after an MPI-disabled or MPI-error path, it must call get_summary() separately.
- This datatype selection remains the portability guard for the current mpi_f08 implementation and preserves the reduction-datatype work completed before the interface migration.
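The call-count averaging and tie-breaking rules above can be sketched in a few lines. This is an illustrative Python model, not the library's Fortran implementation: the function names are hypothetical, a plain list stands in for the per-rank values that the real code reduces with MPI collectives, and Python floats stand in for real(wp).

```python
def avg_call_count(counts):
    """Average per-rank call counts without forming one large integer sum."""
    lo, hi = min(counts), max(counts)          # exact integer extrema first
    # Average the nonnegative deltas from the exact minimum count.
    mean_delta = sum(float(c - lo) for c in counts) / len(counts)
    avg = float(lo) + mean_delta
    # Clamp to the floating-point conversions of the exact extrema.
    return min(max(avg, float(lo)), float(hi))


def extremum_rank(values, pick=max):
    """Lowest rank attaining the extremum (ties resolve to the lowest rank)."""
    return values.index(pick(values))
```

For example, avg_call_count([10, 20, 30]) gives 20.0, and extremum_rank([3.0, 7.0, 7.0]) gives rank 1 because the tie between ranks 1 and 2 resolves to the lower rank.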
Sparse/Union MPI Summary Contract
mpi_union_summary() is the explicit opt-in path reserved for rank-conditional timers. It is a separate API from strict mpi_summary(), not a mode argument, so existing strict calls cannot silently relax descriptor consistency. The procedural wrapper is ftimer_mpi_union_summary().
- The sparse result type is ftimer_mpi_union_summary_t, with ftimer_mpi_union_summary_entry_t entries. It does not reuse or extend ftimer_mpi_summary_t, whose semantics remain strict identical-tree semantics.
- Top-level communicator total-time fields remain all-rank fields because every rank contributes a local summary window.
- Per-entry participating_rank_count records how many communicator ranks materialized that descriptor. Missing rank count is derived as num_ranks - participating_rank_count and is not stored redundantly.
- Descriptors are materialized from the local summary emitted on each rank. Lookup-only timer definitions do not count as present unless a future issue adds a first-class registration contract.
- A materialized present zero-call entry is participating and contributes zero calls plus its recorded time values to participating-rank statistics. An absent rank contributes only to the derived missing-rank count.
- Entry min/avg/max time, call-count, percent, and imbalance fields are defined over participating ranks only. Absent ranks are not zero-filled. Sparse min_call_count and max_call_count are integer(int64) fields; avg_call_count remains real(wp) and follows the same conservative very-large-count averaging rule as strict MPI summaries.
- No all-rank zero-filled or amortized entry fields are part of the initial result model. If such a view is added later, it must be explicitly named as all-rank or amortized.
- The sparse API keeps the same init-captured communicator model as strict MPI summaries. The public communicator path is mpi_f08 type(MPI_Comm).
- Sparse descriptor union construction exchanges per-rank descriptor counts, exact descriptor lengths, and a packed character payload. The character exchange scales with the sum of materialized encoded descriptor lengths across ranks, not with num_ranks * max_descriptor_count * max_descriptor_length.
- Remaining sparse-summary scale limits are still explicit: each rank materializes and sorts its local encoded path descriptors at summary time, the packed exchange still gathers total communicator descriptor lengths and packed descriptor characters on every participant before deduplicating the canonical union, MPI descriptor counts/displacements and packed character counts must fit the default integer count type used by the current mpi_f08 collectives, and the final union result plus per-entry reduction work arrays scale with the canonical union entry count.
- The current implementation builds the descriptor union and structured sparse result. Sparse text and CSV reports are available through explicit union report entry points. Sparse CSV uses a separate participation-aware schema rather than overloading strict MPI CSV rows.
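The packed exchange and the derived missing-rank count described above can be modeled without MPI. The sketch below is an assumption-laden illustration in Python: the function names are hypothetical, the descriptor strings such as "main/io" are made up (the real encoded descriptor format is not specified here), and a list comprehension stands in for the gather collectives.

```python
def pack(descriptors):
    """One rank's contribution: count, exact lengths, packed characters."""
    ordered = sorted(descriptors)              # each rank sorts locally
    return len(ordered), [len(d) for d in ordered], "".join(ordered)


def unpack(count, lengths, payload):
    """Recover a rank's descriptors from its lengths and packed payload."""
    out, pos = [], 0
    for n in lengths[:count]:
        out.append(payload[pos:pos + n])
        pos += n
    return out


def build_union(per_rank_descriptors):
    """Gather every rank's payload, then deduplicate and count participants."""
    gathered = [pack(d) for d in per_rank_descriptors]   # stands in for the gather
    num_ranks = len(gathered)
    participating = {}
    for count, lengths, payload in gathered:
        for desc in set(unpack(count, lengths, payload)):
            participating[desc] = participating.get(desc, 0) + 1
    # Sorted order does not depend on any rank's local creation order.
    # Each row: (descriptor, participating_rank_count, derived missing count).
    return [(d, participating[d], num_ranks - participating[d])
            for d in sorted(participating)]


ranks = [["main", "main/io"], ["main"], ["main/solve", "main"]]
union = build_union(ranks)
```

The payload size is the sum of the actual descriptor lengths on each rank, which is the scaling property the contract calls out; no rank pads its descriptors to a communicator-wide maximum length.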
MPI Reporting Contract
- print_mpi_summary() and write_mpi_summary() are the first-class strict text reporting paths for ftimer_mpi_summary_t
- write_mpi_summary_csv() is the first-class machine-readable strict reporting path for ftimer_mpi_summary_t
- They are collective over the communicator captured by init, just like mpi_summary()
- They build the same global MPI summary object that mpi_summary() returns, so non-identical descriptor trees still fail with FTIMER_ERR_MPI_INCON
- They emit one communicator-level report from rank 0; non-root participants take part in the collective build and return the same final status without duplicating output
- Root output failures are synchronized to all participants as FTIMER_ERR_IO
- The default MPI text report is an abbreviated view of ftimer_mpi_summary_t, not a serialization of every structured field. It prints communicator totals plus per-entry min/avg/max inclusive time, inclusive-time extrema ranks, inclusive imbalance, average self time, average call count, and Avg %; use mpi_summary() directly for min/max self time, self imbalance, min/max call count, min/max rank-local % Total, and explicit node_id/parent_id tree links.
- The strict MPI CSV export uses CSV format version 2 with summary_kind=mpi. It emits summary and metadata rows plus one entry row per MPI summary entry, including explicit tree links and all reduced fields from ftimer_mpi_summary_t.
- In the MPI text report, Avg % is avg_pct_time: the arithmetic mean of each rank's local % Total for that timer. It is not recomputed as 100*avg_inclusive_time/avg_total_time, because rank-local denominator differences are part of the reported statistic.
- print_mpi_union_summary() and write_mpi_union_summary() are the explicit sparse/union text reporting paths for ftimer_mpi_union_summary_t; the procedural wrappers are ftimer_print_mpi_union_summary() and ftimer_write_mpi_union_summary().
- write_mpi_union_summary_csv() is the explicit sparse/union CSV reporting path for ftimer_mpi_union_summary_t; the procedural wrapper is ftimer_write_mpi_union_summary_csv().
- Sparse union reports are collective over the init communicator, build the same participation-aware object as mpi_union_summary(), and emit one rank-0 artifact. They do not weaken print_mpi_summary(), write_mpi_summary(), or write_mpi_summary_csv().
- In non-MPI builds, sparse union report APIs return FTIMER_ERR_NOT_IMPLEMENTED before formatting or writing output. File-output calls do not create or replace report files on that path.
- Sparse union reports print Participating and Missing columns for each entry. Missing is derived as summary%num_ranks - participating_rank_count.
- Sparse union per-entry min/avg/max, imbalance, average self time, average call count, and Avg % are over participating ranks only. Missing ranks are not zero-filled, and the report labels this explicitly.
- A descriptor is present for sparse reporting when it is materialized by that rank's local summary. A present zero-elapsed timer with a real start/stop contributes to Participating and call-count statistics; lookup-only names still are not a sparse registration contract.
- Sparse union CSV uses format_version=1 and summary_kind=mpi_union in a dedicated header that is not append-compatible with the local/strict MPI CSV format-version-2 header. Entry rows include participating_rank_count, explicit missing_rank_count, tree links, and participating-rank statistic columns such as min_participating_inclusive_time, avg_participating_self_time, and max_participating_call_count. Participating call-count extrema are emitted as signed-64-bit decimal text.
- Sparse union CSV does not emit all-rank zero-filled or amortized entry statistics. If such a view is added later, its columns must be explicitly named as all-rank or amortized.
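The distinction between Avg % and a ratio of averages is easiest to see with numbers. The Python sketch below uses made-up per-rank values and hypothetical function names, and it assumes each rank's local % Total is 100 times its inclusive time divided by its own total time.

```python
def avg_pct_time(inclusive, totals):
    """Arithmetic mean of each rank's local % Total for one timer."""
    pcts = [100.0 * t / total for t, total in zip(inclusive, totals)]
    return sum(pcts) / len(pcts)


def ratio_of_averages(inclusive, totals):
    """The alternative that the report deliberately does not use."""
    n = len(inclusive)
    return 100.0 * (sum(inclusive) / n) / (sum(totals) / n)


inclusive = [2.0, 6.0]   # per-rank inclusive time for one timer (made-up values)
totals = [4.0, 20.0]     # per-rank total time (made-up values)

print(avg_pct_time(inclusive, totals))        # 40.0: mean of 50% and 30%
print(ratio_of_averages(inclusive, totals))   # about 33.3
```

The two numbers differ whenever ranks have different local totals, which is why the contract names the statistic explicitly.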
MPI+OpenMP Sparse/Union Summary Contract
ftimer_openmp_t%mpi_openmp_union_summary() is the explicit opt-in path for
rank- or lane-conditional hybrid worker timing. It is a separate API from
strict mpi_openmp_summary(), not a mode argument, so strict hybrid calls cannot
silently relax descriptor or lane-participation consistency.
- The sparse hybrid result type is ftimer_mpi_openmp_union_summary_t, with ftimer_mpi_openmp_union_summary_entry_t entries and ftimer_mpi_openmp_union_rank_t rank rows. It does not reuse or extend ftimer_mpi_openmp_summary_t, whose semantics remain strict identical rank/lane semantics.
- The call is collective over the communicator captured by ftimer_openmp_t%init, and uses the same lifecycle, valid-communicator, serial-context, worker-diagnostic, and stopped-run active-lane preflight as strict hybrid summaries.
- The canonical entry set is the union of aggregate descriptors materialized by any rank or lane. Descriptor identity includes the timer/context path and execution domain, so serial-lane and OpenMP-team entries with the same timer name remain distinct.
- participating_rank_count records how many communicator ranks materialized a descriptor on at least one eligible lane. missing_rank_count is explicit in sparse hybrid entries and is derived as num_ranks - participating_rank_count.
- eligible_rank_lane_sample_count, participating_rank_lane_sample_count, and missing_rank_lane_sample_count describe lane participation over known eligible rank/lane samples. Missing lane counts are derived from the observed eligible lane set for contributing timed-region epochs, not from configured lane capacity. When a precise missing-lane interpretation is not available, eligible_rank_lane_sample_count retains the sum of each participating rank's maximum/union eligible lane count for that descriptor, and missing_rank_lane_sample_count_known is false. In that state, missing_rank_lane_sample_count must not be read as precise epoch-level absence.
- Entry min/avg/max, call-count, percent, and imbalance fields are defined over participating rank/lane samples only. Absent ranks and absent lanes are not zero-filled. A materialized present zero-time or zero-call descriptor participates and contributes real zero values.
- Sparse hybrid result ordering is deterministic across ranks and follows the canonical descriptor union, not local creation order on one rank or lane.
MPI+OpenMP Sparse/Union Reporting Contract
- print_mpi_openmp_union_summary() and write_mpi_openmp_union_summary() are the explicit sparse hybrid text reporting paths for ftimer_mpi_openmp_union_summary_t.
- write_mpi_openmp_union_summary_csv() is the explicit sparse hybrid CSV reporting path. It uses a dedicated format_version=1, summary_kind=mpi_openmp_union, participation_policy=sparse_union schema with summary, metadata, rank, and aggregate entry rows.
- Sparse hybrid reports are collective over the init communicator, build the same participation-aware object as mpi_openmp_union_summary(), and emit one communicator-root artifact. They do not weaken strict hybrid text or CSV reports.
- Sparse hybrid report and CSV entry statistics are over participating rank/lane samples only. Missing ranks and missing lanes are exposed as participation fields, not hidden as zero-valued contributors. Sparse hybrid text reports print unknown for missing rank/lane samples when missing_rank_lane_sample_count_known is false; CSV keeps the explicit false flag next to the aggregate participation fields.
- The compact CSV field dictionary and parser-facing schema-family notes live in docs/csv-schema.md. That page is documentation for the current CSV families; it does not add zero-filled sparse views or new CSV schema fields.
Name Validation Error Contract
Name validation failures return FTIMER_ERR_INVALID_NAME (code 8).
Deliberate warn-and-skip contract for ierr-absent callers (issue #49, PR #43):
When a caller omits ierr and passes an invalid timer name, the runtime:
- emits a diagnostic to stderr
- returns immediately without modifying any timer state
The call is a no-op: no segment is created, no stack depth change occurs. Parent timers are not affected. Summary output will simply omit the rejected child; it does not produce a plausible-but-wrong child entry.
OpenMP carve-out: for the existing ftimer/ftimer_core guarded APIs, this
warn-and-skip contract applies in serial code and from the OpenMP master thread
only. When built with FTIMER_USE_OPENMP=ON, calls from non-master threads are
suppressed before validation reaches normalize_name or report_status — they
produce no stderr diagnostic, return 0 (for lookup), and leave any
caller-provided ierr unchanged. This is a consequence of the
master-thread-only guard model documented in "OpenMP Carve-Out And Limitations"
below. The explicit ftimer_openmp object does not use this silent no-op
contract for in-parallel object calls.
This is the deliberate policy rather than a stronger failure (e.g. error stop),
chosen for consistency with the library's error contract and because callers that
omit ierr have opted into the permissive path. Callers that require hard
enforcement should pass ierr and check it.
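The difference between the two caller paths can be shown with a toy model. The Python sketch below is not the fTimer API: ToyTimer is a hypothetical stand-in, a one-element list mimics an optional ierr output argument, the rule that blank names are invalid is an assumption made only for illustration, and a success status of 0 is likewise assumed. Only the code value 8 comes from this section.

```python
import sys

FTIMER_ERR_INVALID_NAME = 8   # code stated in this section


class ToyTimer:
    """Toy stand-in for a timer object; not the fTimer API."""

    def __init__(self):
        self.stack = []

    def start(self, name, ierr=None):
        # Hypothetical validity rule for illustration: blank names are invalid.
        if not name.strip():
            if ierr is None:
                # ierr omitted: warn on stderr, then skip.
                print(f"ftimer: invalid timer name {name!r}", file=sys.stderr)
            else:
                # ierr present: report the status to the caller instead.
                ierr[0] = FTIMER_ERR_INVALID_NAME
            return                      # no-op either way: state is untouched
        if ierr is not None:
            ierr[0] = 0                 # assumed success status
        self.stack.append(name)
```

In both paths the rejected call leaves the timer stack unchanged; the only difference is whether the caller receives a status it can check.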
OpenMP Carve-Out And Limitations
- OpenMP guard behavior is enabled only when the library is built with FTIMER_USE_OPENMP=ON
- The CMake option is the source-level switch; global OpenMP compiler flags alone do not enable these guards when FTIMER_USE_OPENMP=OFF
- This is a narrow master-thread-only carve-out for bracketing a parallel region as a whole; it is not general hybrid MPI+OpenMP timing support
- The implemented model is master-thread-only timing; the current implementation does not make fTimer generally thread-safe
- Inside OpenMP parallel regions, the guarded ftimer_core timer operations run only on the master thread
- Non-master calls to those guarded core timer operations become no-ops instead of mutating shared timer state
- Suppressed non-master calls are skipped before normal validation, emit no stderr warning, and leave any caller-provided ierr unchanged
- The OpenMP guards do not broaden support for concurrent access to other APIs; summary/report generation and other shared access remain unsupported in threaded regions
- Thread-local timer instances, fuller concurrent timing support, and any suppress_in_parallel control remain deferred
- ftimer_openmp is the explicit opt-in worker-timing module. Its ftimer_openmp_t%init(config=...), finalize, reset, register_timer, lookup_timer, begin_parallel_region, end_parallel_region, start_id, and stop_id entry points are available now, including optional keyword-only comm= capture in MPI-enabled builds. Without comm=, MPI-enabled builds capture MPI_COMM_WORLD. Registered timer ids remain valid across reset() and are invalidated across finalize()/reinit without being recycled in the same object. The MPI communicator handle is used by strict and sparse union MPI+OpenMP summary/report calls; local OpenMP summary/report behavior does not consume it.
- config%max_lanes counts the serial lane plus worker lanes. Serial-context start_id/stop_id use lane 0. Inside an explicitly opened timed level-1 OpenMP region, start_id/stop_id use one lane per OpenMP thread id, enforce lane-local strict stacks, and never repair or pop another lane on mismatch. Worker timing calls outside an open timed region, beyond config%max_lanes, or in unsupported nested parallel contexts return errors and leave unrelated lane state unchanged. OpenMP task migration is outside the validated contract.
- reset, finalize, reinitialization, and timed-region close scan all lanes and reject active timers. Current ftimer_openmp_t timing uses the non-MPI wall clock even in MPI-enabled builds, so worker timing does not call MPI_Wtime() from OpenMP threads or require an MPI_Init_thread support level. Calls made inside an OpenMP parallel region without ierr queue bounded diagnostics instead of writing unordered stderr, except for valid worker timing calls. A later serial lifecycle call without ierr emits one aggregate diagnostic when fTimer itself is built with FTIMER_USE_OPENMP=ON and then proceeds. With ierr, a lifecycle call that observes queued worker diagnostics returns the first queued status without writing stderr and leaves lifecycle state unchanged; repeat the lifecycle call after that explicit drain to proceed. In non-OpenMP fTimer builds, ftimer_openmp is exposed for serial-context lifecycle/catalog/timing adoption only; using that package from a downstream OpenMP parallel region is outside the supported contract because the library was not built with OpenMP runtime introspection.
- ftimer_openmp_t%get_openmp_summary(summary, ierr=...), print_openmp_summary, write_openmp_summary, and write_openmp_summary_csv are the local OpenMP summary/report family. They return and format ftimer_openmp_summary_t, not ftimer_summary_t, and do not change current local, strict MPI, or sparse MPI summary schemas.
- OpenMP summaries are stopped-run-only merge points. If called inside an OpenMP parallel region, while a timed region is open, or while any lane has an active timer stack, they return FTIMER_ERR_ACTIVE, leave the structured summary empty, and file-output APIs do not emit a normal artifact.
- ftimer_openmp_summary_t%summary_window_time is the elapsed wall-clock time from init or reset to the summary snapshot. timed_region_envelope_time is the summed wall-clock duration of explicitly opened timed OpenMP regions. sum_lane_root_inclusive_time is summed lane work over root descriptor rows only; it may exceed the wall-clock envelope when lanes run concurrently. sum_lane_self_time is the sum of lane-local self time over all descriptor rows.
- OpenMP summary entries are a canonical logical descriptor tree with node_id/parent_id links. Per-entry lane min/avg/max, imbalance, summed inclusive/self time, and call-count fields are computed over participating lanes only. Missing lanes are not zero-filled.
- Self time is computed on each lane before aggregation. Aggregate OpenMP self time is not computed as aggregate inclusive time minus aggregate child inclusive time.
- eligible_lane_count, participating_lane_count, and missing_lane_count describe lane participation for that descriptor. Eligible worker lanes come from the actual level-1 team lanes observed for contributing timed-region epochs, not from config%max_lanes. Treat one fTimer timed-region epoch as one level-1 OpenMP team shape; close and reopen the fTimer timed region before timing a differently shaped team. Serial-lane descriptors use lane 0 as their eligible participant. When mixed contributing epochs make the aggregate missing-lane interpretation ambiguous, eligible_lane_count is the maximum observed eligible lane id/count for the descriptor across those epochs, and missing_lane_count_known is false. In that state, missing_lane_count is a conservative aggregate derived from the retained eligible count, not a precise count of lanes absent from every epoch.
- The OpenMP text report is an abbreviated human-facing view of ftimer_openmp_summary_t. The OpenMP CSV export uses a dedicated format_version=1, summary_kind=openmp schema with summary, metadata, and aggregate entry rows. It is not append-compatible with the local/strict MPI version-2 CSV header or the sparse MPI union CSV header. The text report prints unknown in the Missing column when missing_lane_count_known is false; CSV keeps the numeric aggregate field and the explicit missing_lane_count_known=false flag.
- Local OpenMP summaries are summary tables, not traces. They do not expose interval timelines, profiler event streams, or per-entry wall-clock interval unions.
- ftimer_openmp_t%mpi_openmp_summary(summary, ierr=...), print_mpi_openmp_summary, write_mpi_openmp_summary, and write_mpi_openmp_summary_csv are the strict hybrid MPI+OpenMP summary/report family. They are collective over the communicator captured by ftimer_openmp_t%init. In MPI+OpenMP builds, init(config=...) captures MPI_COMM_WORLD by default; pass comm= to capture a caller-owned communicator explicitly. These entry points return/format ftimer_mpi_openmp_summary_t, not ftimer_mpi_summary_t or ftimer_mpi_union_summary_t.
- Strict hybrid summaries are stopped-run-only. Before descriptor or timing reductions, every rank exchanges active-lane status; if any rank has an open timed region or active lane stack, every participant returns FTIMER_ERR_ACTIVE and the result remains empty.
- Strict hybrid descriptor identity includes the logical timer/context path, execution domain (serial_lane versus openmp_level1_team), and eligible lane structure. Ranks must agree on the required descriptor set and every eligible lane must participate. Missing ranks, missing lanes, different timer paths, or different eligible lane structures fail as FTIMER_ERR_MPI_INCON before numeric timing reductions. Missing rank/lane data is not silently filled with zero. A descriptor that spans mixed OpenMP timed-region epochs with different team sizes has unknown missing-lane precision and is rejected by this strict surface even when all lanes observed in the retained eligible set contributed at least once.
- ftimer_mpi_openmp_summary_t stores communicator-level rank extrema and averages for summary-window time, timed-region envelope time, summed lane root work, and summed lane self work; rank rows for each communicator-local rank; and descriptor rows with rank/lane participation counts plus participating-lane inclusive time, self time, call-count, percent, and imbalance fields.
- Strict hybrid text and CSV reports are communicator-root artifacts. The CSV export uses a dedicated format_version=1, summary_kind=mpi_openmp schema with summary, metadata, rank, and aggregate entry rows. It is not append-compatible with local OpenMP, local serial, strict MPI, or sparse MPI union CSV headers.
- Sparse or union MPI+OpenMP hybrid participation reductions are not part of this strict surface. Rank- or lane-conditional hybrid work must use mpi_openmp_union_summary() and its report/CSV family rather than relying on mpi_openmp_summary() to relax strictness.
- For user-facing mode selection, accepted instrumentation patterns, and migration guidance, see docs/openmp-timing-modes.md. The compiling examples are examples/openmp_example.F90 for compatibility timing, examples/openmp_worker_example.F90 for true OpenMP worker timing, and examples/mpi_openmp_example.F90 for strict plus sparse union hybrid timing.
- Sparse union hybrid MPI+OpenMP participation is implemented as a separate ftimer_openmp_t family. It preserves the strict hybrid surface by using distinct structured result, text report, and CSV entry points with explicit rank/lane participation counts and participating-sample statistics.
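The rule that self time is computed on each lane before aggregation matters because statistics cover participating lanes only. The Python sketch below uses made-up lane times and hypothetical helper names, and assumes one parent timer with a single child, to show how averaging per-lane self time differs from subtracting aggregate averages.

```python
def lane_self_times(parent_incl, child_incl):
    """Self time per lane: lane inclusive minus that lane's child inclusive."""
    return {lane: t - child_incl.get(lane, 0.0) for lane, t in parent_incl.items()}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


parent = {0: 10.0, 1: 10.0}   # parent inclusive time by lane id (made-up values)
child = {1: 4.0}              # the child timer ran on lane 1 only

per_lane = lane_self_times(parent, child)              # {0: 10.0, 1: 6.0}
avg_self = mean(per_lane.values())                     # 8.0: per-lane first
naive = mean(parent.values()) - mean(child.values())   # 6.0: not the contract
```

The naive value is lower because the child's average covers only lane 1, while the parent's average covers both lanes; subtracting the two mixes different participant sets.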
Consequences for timing data
For the existing ftimer/ftimer_core guarded APIs, the silent worker-thread
no-op model has specific, observable consequences that users must understand to
avoid misreading summary output:
- Timer calls made exclusively on worker threads are silently dropped: no summary entry is created, no call count is incremented, and no timing data is recorded for those calls. A timer name that is started and stopped only on worker threads will not appear in the summary at all.
- Call counts reflect only master-thread invocations, not all-thread counts: when all N threads in a parallel region call start/stop for the same timer, only the master thread's call is recorded; the summary shows call_count = 1, not N.
- Timing inside a parallel region captures only the master-thread timing window: worker-thread work duration is not separately captured or aggregated into the timer's inclusive or self time.
- Supported pattern: place start/stop calls outside the !$omp parallel block to time a parallel region as a whole. The master-thread timing window then spans the full wall-clock duration of the parallel work.
- Misleading pattern: placing start/stop inside a parallel region with the expectation that each thread contributes timing data is not supported under this contract. Only the master thread's calls take effect; worker-thread contributions are silently absent.
- Scoped guard limitation: block-local scoped guard finalization inside an OpenMP parallel region is unsupported. To time a parallel region, place explicit start/stop or a scoped guard outside the !$omp parallel block.
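The call-count consequence can be reproduced with a small simulation. The Python sketch below is a toy model, not the fTimer API: GuardedTimer is hypothetical, the thread id is passed explicitly (the real guard presumably queries the OpenMP runtime), and the four "threads" run sequentially.

```python
class GuardedTimer:
    """Toy model of the master-thread-only guard; not the fTimer API."""

    def __init__(self):
        self.call_count = {}
        self.active = []

    def start(self, name, thread_id):
        if thread_id != 0:
            return                      # worker call: silent no-op
        self.active.append(name)

    def stop(self, name, thread_id):
        if thread_id != 0:
            return                      # worker call: silent no-op
        self.active.remove(name)
        self.call_count[name] = self.call_count.get(name, 0) + 1


timer = GuardedTimer()
for tid in range(4):                    # four simulated OpenMP threads
    timer.start("kernel", tid)
    timer.stop("kernel", tid)
    if tid != 0:                        # a timer used only by workers
        timer.start("worker_only", tid)
        timer.stop("worker_only", tid)

print(timer.call_count)                 # {'kernel': 1}
```

Four threads called start/stop on "kernel" but the count is 1, and "worker_only" never appears, matching the first two consequences listed above.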
Callback Contract
- Configure callbacks through set_callback() and clear_callback(), not by mutating runtime internals directly
- set_callback() may be called before or after init(), but callback configuration changes are rejected while timers are active
- set_callback() accepts optional opaque user_data; omitting it stores c_null_ptr
- clear_callback() and finalize() clear both the callback registration and its stored user_data
- on_event is an optional lightweight intra-run hook for normal start/stop events on one timer instance
- The current callback contract exposes numeric runtime identifiers only; it does not define a stable semantic mapping back to timer names or full context paths for external-profiler backends
- Repair transitions do NOT fire callbacks
- Scoped guards fire only the normal underlying start/stop events. They do not synthesize extra callback events during finalization.
- Mutating timer state from callbacks during scoped guard start/stop is unsupported.
- user_data remains opaque callback state, not a separate user-facing mutable runtime field
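Three of the rules above (rejection while timers are active, opaque user_data passed through, and no callbacks on repair transitions) can be sketched with a toy model. This Python class is hypothetical and is not the fTimer API: the callback signature, the boolean return from set_callback, and the simplified reading of repair as implicitly closing inner timers are all assumptions made for illustration.

```python
class CallbackTimer:
    """Toy model of the callback rules above; not the fTimer API."""

    def __init__(self):
        self.callback, self.user_data = None, None
        self.stack = []

    def set_callback(self, callback, user_data=None):
        if self.stack:
            return False                # rejected while timers are active
        self.callback, self.user_data = callback, user_data
        return True

    def _fire(self, timer_id, event):
        if self.callback is not None:
            self.callback(timer_id, event, self.user_data)

    def start(self, timer_id):
        self.stack.append(timer_id)
        self._fire(timer_id, "start")   # normal event

    def stop(self, timer_id):
        if timer_id not in self.stack:
            return
        while self.stack[-1] != timer_id:
            self.stack.pop()            # repair transition: no callback
        self.stack.pop()
        self._fire(timer_id, "stop")    # normal event
```

Stopping an outer timer while an inner one is still active closes the inner timer without a callback, so a listener sees only the normal start and stop events.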