Source locations in Clang
April 4, 2023
NOTE: This document is not meant to be authoritative or complete. Please refer to the original source code in Clang for more details: in particular, SourceLocation.h, SourceLocation.cpp, SourceManager.h, and SourceManager.cpp. This document is meant to be read alongside the source code.
Introduction
One of the tricky things about Clang is how it handles source locations. This complexity stems partly from the need to handle macros and #line directives, which can give a single token several different meaningful source locations.
The main types related to source locations and ranges are:
- SourceLocation
- SLocEntry
- FileID
- PresumedLoc
- SourceRange
- CharSourceRange
Additionally, SourceManager is a central non-copyable container that holds all
buffers and provides access to the mappings between these types.
SourceLocation
Consider this code:
#define NS(_n) namespace _n { }

NS(abc)
Here abc has two associated source ranges.
One is the range inside the macro body
where abc gets inserted (the range for _n on line 1).
The other is the range where abc is written in the code directly (on line 3).
(Technically, there is also a third range,
which is the source range of abc in the pre-processed output,
but let's ignore that.)
After pre-processing, this code becomes:
namespace abc { }
Semantic analysis happens after pre-processing.
In this case, the NamespaceDecl for abc will have an associated
SourceLocation that looks like:
loc: foo.cc:3:1 (MacroID)
-> sourceManager.getSpellingLoc(loc): foo.cc:1:26 (FileID)
-> sourceManager.getExpansionLoc(loc): foo.cc:3:1 (FileID)
A few points are worth noting here:
- The MacroID bit (from SourceLocation::isMacroID()) indicates that the namespace's location was expanded from a macro. A SourceLocation is in one of 3 states: isInvalid(), isFileID() or isMacroID(). WARNING: Technically, isInvalid() source locations also return true for isFileID(), so one needs to be careful when checking only isFileID().
- The column number in the original source location and the expansion location both point to the macro invocation site, and NOT to the actual abc argument of the macro.
- The spelling location points inside the macro body.
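The toy model below makes the bit encoding and the two lookups more concrete. It assumes, following SourceLocation.h, that a location is an unsigned integer whose top bit is the MacroID bit and that 0 is the invalid location. The spelling and expansion links are stored explicitly here, whereas Clang derives them from SLocEntry data. ToyLoc, FilePos, ToyExpansion and ToySourceManager are all hypothetical names, not Clang's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model, NOT Clang's API. Like clang::SourceLocation, a location is an
// unsigned ID whose top bit marks macro locations; ID 0 means "invalid".
struct ToyLoc {
  static constexpr uint32_t MacroIDBit = 1u << 31;
  uint32_t id = 0;

  bool isValid() const { return id != 0; }
  bool isInvalid() const { return id == 0; }
  // Only the top bit is inspected, so the invalid ID 0 also counts as a file ID.
  bool isFileID() const { return (id & MacroIDBit) == 0; }
  bool isMacroID() const { return (id & MacroIDBit) != 0; }
  uint32_t index() const { return id & ~MacroIDBit; }
};

// Hypothetical: a file location is just a (line, column) pair in one file.
struct FilePos {
  unsigned line, col;
};

// Hypothetical: a macro location records where its token was spelled and
// where it was expanded; either may itself be another macro location.
struct ToyExpansion {
  ToyLoc spelling, expansion;
};

struct ToySourceManager {
  std::vector<FilePos> files = {FilePos{0, 0}};  // slot 0 unused: ID 0 stays invalid
  std::vector<ToyExpansion> macros;

  ToyLoc addFileLoc(unsigned line, unsigned col) {
    files.push_back(FilePos{line, col});
    return ToyLoc{static_cast<uint32_t>(files.size() - 1)};
  }
  ToyLoc addMacroLoc(ToyLoc spelling, ToyLoc expansion) {
    macros.push_back(ToyExpansion{spelling, expansion});
    return ToyLoc{static_cast<uint32_t>(macros.size() - 1) | ToyLoc::MacroIDBit};
  }
  // Follow spelling links until we land on a file location.
  ToyLoc getSpellingLoc(ToyLoc loc) const {
    while (loc.isMacroID()) loc = macros[loc.index()].spelling;
    return loc;
  }
  // Follow expansion links until we land on a file location.
  ToyLoc getExpansionLoc(ToyLoc loc) const {
    while (loc.isMacroID()) loc = macros[loc.index()].expansion;
    return loc;
  }
  // Check validity explicitly: isFileID() alone would accept ID 0.
  FilePos pos(ToyLoc loc) const {
    assert(loc.isValid() && loc.isFileID());
    return files[loc.index()];
  }
};
```

A macro location whose spelling link points at 1:26 and whose expansion link points at 3:1 reproduces the output shown earlier. Note that pos() checks isValid() explicitly, since isFileID() alone happily accepts the invalid ID 0.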
Caution: SourceLocations pointing outside the source
A SourceLocation may sometimes represent locations
not present in the source code, such as (non-exhaustive):
- Macro definitions on the command line, e.g. -DICE_CREAM_FLAVOR=STRAWBERRY.
- Definitions in the preamble header implicitly inserted by the compiler (<built-in>).
See the various SourceManager::isWrittenIn* methods and
clangd::isSpelledInSource for more details.
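As far as I can tell from SourceManager.h, several of the isWrittenIn* helpers come down to comparing the filename of the presumed location (see PresumedLoc below) with special buffer names such as <built-in>, <command line> and <scratch space>. The sketch below mimics that check on a plain string. Treat the exact names it compares against as an assumption. ToyOrigin and classifyPresumedFilename are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification, for illustration only (NOT Clang's API).
enum class ToyOrigin { Unknown, UserSource, BuiltIn, CommandLine, ScratchSpace };

// Classifies a location by the filename of its *presumed* location.
// An empty string stands in for an invalid PresumedLoc in this toy.
ToyOrigin classifyPresumedFilename(const std::string &presumedFilename) {
  if (presumedFilename.empty()) return ToyOrigin::Unknown;
  if (presumedFilename == "<built-in>") return ToyOrigin::BuiltIn;
  if (presumedFilename == "<command line>") return ToyOrigin::CommandLine;
  if (presumedFilename == "<scratch space>") return ToyOrigin::ScratchSpace;
  return ToyOrigin::UserSource;
}
```

Because the check looks at the presumed filename, anything that changes the presumed filename (such as a #line directive) also changes the answer in this toy.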
Caution: Spelling locations may not always be well-defined
Since the preprocessor allows combining multiple tokens into one using ##,
a token may not have a spelling location in the source file at all.
#define VISIT(_name) void Visit##_name##Decl(_name##Decl *) const {}
VISIT(Enum)
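Here, the name VisitEnumDecl is not written anywhere in the file. Clang spells such pasted tokens into an internal <scratch space> buffer, so the spelling location points there rather than into the source.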
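The toy model below illustrates the problem: the pasted text does not occur anywhere in the file, so no offset in the file can serve as its spelling. The sketch assumes, as Clang does with its <scratch space> buffer, that such tokens are spelled into a separate buffer instead. ToyScratchBuffer and pasteAndSpell are hypothetical names, and the buffer layout here is not meant to match Clang's.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model, NOT Clang's API: a buffer that freshly created tokens are
// "spelled" into, standing in for Clang's <scratch space> buffer.
struct ToyScratchBuffer {
  std::string text;

  // Appends a token's spelling on its own line and returns its offset.
  std::size_t spell(const std::string &tokenText) {
    text += '\n';
    std::size_t offset = text.size();
    text += tokenText;
    return offset;
  }
};

// Models a##b##c: the pasted token's text exists nowhere in the original
// file, so the only place to "spell" it is the scratch buffer.
std::size_t pasteAndSpell(ToyScratchBuffer &scratch, const std::string &a,
                          const std::string &b, const std::string &c) {
  return scratch.spell(a + b + c);
}
```

Searching the two-line VISIT example for "VisitEnumDecl" finds nothing, while the scratch buffer contains it at the returned offset.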
SLocEntry
This type holds the main pieces of information needed about files and macro expansions.
The SourceManager maintains a mapping FileID -> SLocEntry
for valid FileIDs
(tables,
accessor).
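The toy model below sketches the FileID -> SLocEntry mapping. It assumes, in the spirit of the real SourceManager, that every entry reserves a contiguous range of offsets. It also assumes that a location's offset is mapped back to its entry by binary search over the entries' start offsets. The real implementation (separate local and loaded tables, lookup caching, and so on) is considerably more involved. ToySLocEntry and ToySourceManager are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy model, NOT Clang's API. Each entry starts at `offset` and owns every
// offset up to (but excluding) the next entry's start.
struct ToySLocEntry {
  uint32_t offset;
  bool isFile;       // true: file or memory buffer; false: macro expansion
  std::string name;  // file name, or macro name for an expansion
};

struct ToySourceManager {
  // The index into `entries` plays the role of a FileID. Entry 0 is a
  // sentinel so that ID 0 can mean "invalid", as it does for FileID.
  std::vector<ToySLocEntry> entries = {ToySLocEntry{0, true, "<invalid>"}};
  uint32_t nextOffset = 1;

  // Reserves size + 1 offsets (room for an end-of-buffer location).
  int createEntry(bool isFile, std::string name, uint32_t size) {
    entries.push_back(ToySLocEntry{nextOffset, isFile, std::move(name)});
    nextOffset += size + 1;
    return static_cast<int>(entries.size()) - 1;
  }

  // Binary search for the last entry starting at or before `offset`.
  int getFileID(uint32_t offset) const {
    if (offset == 0 || offset >= nextOffset) return 0;  // invalid
    auto it = std::upper_bound(
        entries.begin(), entries.end(), offset,
        [](uint32_t off, const ToySLocEntry &e) { return off < e.offset; });
    return static_cast<int>(it - entries.begin()) - 1;
  }

  const ToySLocEntry &getSLocEntry(int id) const {
    assert(id > 0 && id < static_cast<int>(entries.size()));
    return entries[id];
  }
};
```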
FileID
Contrary to what the name suggests, this type does not only identify files:
it is an ID for arbitrary memory buffers and for macro expansions.
It's probably best to mentally rename it to SLocEntryID,
since the main purpose of this type
is to be an ID for SLocEntry values,
which contain more information.
This also means that if a single header is included multiple times,
regardless of whether it expands the same way or not, there will
be a separate FileID for each inclusion, not a single one.
- A valid FileID always has a corresponding SLocEntry.
- Since a FileID may not actually represent a source file, it is possible that sourceManager.getFileEntryForID returns null for a valid FileID.
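The toy model below illustrates these FileID points. It assumes two things. First, every inclusion of the same path gets a fresh ID while sharing a single file entry. Second, buffers such as <built-in> get an ID without any file entry. ToyFileEntry and ToySourceManager are hypothetical, and createFileID only loosely mirrors the real SourceManager::createFileID.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy model, NOT Clang's API: a file entry stands for one file on disk.
struct ToyFileEntry {
  std::string path;
};

struct ToySourceManager {
  std::map<std::string, std::unique_ptr<ToyFileEntry>> files;  // one per path
  // fileOf[id] is the on-disk file behind ID `id`, or nullptr when the ID
  // names a memory buffer (e.g. <built-in>). ID 0 is invalid.
  std::vector<const ToyFileEntry *> fileOf = {nullptr};

  // Every inclusion gets a fresh ID, even when the path was seen before.
  int createFileID(const std::string &path) {
    std::unique_ptr<ToyFileEntry> &entry = files[path];
    if (!entry) entry = std::make_unique<ToyFileEntry>(ToyFileEntry{path});
    fileOf.push_back(entry.get());
    return static_cast<int>(fileOf.size()) - 1;
  }

  // A buffer that is not backed by a file: valid ID, but no file entry.
  int createBufferID() {
    fileOf.push_back(nullptr);
    return static_cast<int>(fileOf.size()) - 1;
  }

  const ToyFileEntry *getFileEntryForID(int id) const {
    if (id <= 0 || id >= static_cast<int>(fileOf.size())) return nullptr;
    return fileOf[id];
  }
};
```

Including "foo.h" twice yields two different IDs that both map to the same file entry, while a buffer ID is valid but maps to nullptr.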
Aside: Connection between SourceLocation::isFileID() and FileID
SourceLocation loc = ...;
if (loc.isValid()) {
  FileID fileId = sourceManager.getFileID(loc);
  assert(fileId.isValid());
  if (loc.isFileID()) {
    // The corresponding SLocEntry carries a FileInfo.
    assert(sourceManager.getSLocEntry(fileId).isFile());
  } else {
    assert(loc.isMacroID());
    // The corresponding SLocEntry carries an ExpansionInfo.
    assert(sourceManager.getSLocEntry(fileId).isExpansion());
  }
}
PresumedLoc
PresumedLoc takes #line directives into account, and is hence generally
the right location to use for diagnostics.
When working with macro expansions, the presumed location
takes into account any applicable #line directives
at the point where the macro is expanded (i.e. at the expansion location),
not inside the body of the macro definition (i.e. the spelling location).
This means that the following code sequence generally doesn't make sense:
sourceManager.getPresumedLoc(sourceManager.getSpellingLoc(loc)); // ❌
// Instead, use one of
// sourceManager.getPresumedLoc(loc)
// sourceManager.getSpellingLoc(loc)
// sourceManager.getExpansionLoc(loc)
// depending on the use case.
The following identity holds:
sourceManager.getPresumedLoc(loc) == sourceManager.getPresumedLoc(sourceManager.getExpansionLoc(loc))
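The toy model below sketches presumed lines. It assumes the standard C semantics of #line: the line after #line N "file" is presumed line N. Following the identity above, it computes the presumed location of a macro location at its expansion site. ToyLineDirective, ToyPresumed, ToyMacroLoc, presumedForFileLine and presumedForMacro are hypothetical names. Clang's real line table also records things like system-header flags and include information.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model, NOT Clang's API. A directive #line N "file" on physical line D
// makes physical line D+1 presumed line N (and optionally renames the file).
struct ToyLineDirective {
  unsigned directiveLine;   // physical line of the directive itself
  unsigned newLine;         // presumed number of the line that follows it
  std::string newFilename;  // empty: keep the current presumed filename
};

struct ToyPresumed {
  std::string filename;
  unsigned line;
};

// Presumed location of a physical line in `physicalFile`.
// `dirs` must be sorted by directiveLine.
ToyPresumed presumedForFileLine(const std::string &physicalFile, unsigned line,
                                const std::vector<ToyLineDirective> &dirs) {
  ToyPresumed p{physicalFile, line};
  for (const ToyLineDirective &d : dirs) {
    if (d.directiveLine >= line) break;  // only directives before `line` apply
    p.line = d.newLine + (line - d.directiveLine - 1);
    if (!d.newFilename.empty()) p.filename = d.newFilename;
  }
  return p;
}

// Hypothetical macro location: just its spelling and expansion lines.
struct ToyMacroLoc {
  unsigned spellingLine;  // deliberately ignored below
  unsigned expansionLine;
};

// Mirrors getPresumedLoc(loc) == getPresumedLoc(getExpansionLoc(loc)):
// #line directives are applied at the expansion site, never at the spelling.
ToyPresumed presumedForMacro(const std::string &physicalFile, ToyMacroLoc loc,
                             const std::vector<ToyLineDirective> &dirs) {
  return presumedForFileLine(physicalFile, loc.expansionLine, dirs);
}
```

Suppose #line 100 "gen.y" appears on physical line 5. Then physical line 6 is presumed gen.y:100, and a macro expanded on physical line 8 is presumed gen.y:102, wherever its body was spelled.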
Caution: Avoid PresumedLoc::getFileID()
In general, the following does not hold:
// Note: getPresumedLoc has an optional UseLineDirectives = true parameter
sourceManager.getFileID(loc) == sourceManager.getPresumedLoc(loc).getFileID() // ❌
For example, when the following pragma is present (such as in cstddef):
#pragma GCC system_header
The preprocessor generates a fake #line directive
(source)
on seeing this pragma.
In getPresumedLoc, when a line directive is detected,
the FileID is marked invalid (instead of using the system header's FileID)
(source).
On the other hand, sourceManager.getFileID(loc) returns the true FileID
for the system header.
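The sketch below models that behaviour. My reading of getPresumedLoc is that the FileID is cleared only when the applicable line entry names a file, as the entry created for #pragma GCC system_header does. Treat that condition as an assumption. ToyLineEntry, ToyPresumedLoc and toyGetPresumedLoc are hypothetical names, and 0 stands for an invalid FileID.

```cpp
#include <cassert>
#include <string>

// Toy model, NOT Clang's API: the line-table entry (if any) that applies
// to a location, e.g. one created for #pragma GCC system_header.
struct ToyLineEntry {
  bool present;          // does any line entry apply to this location?
  std::string filename;  // empty: the entry does not name a file
};

struct ToyPresumedLoc {
  int fileID;            // 0 stands for an invalid FileID
  std::string filename;
};

// When an applicable line entry names a file, the presumed FileID is
// cleared, since the named file's contents are not tracked here.
ToyPresumedLoc toyGetPresumedLoc(int realFileID, const std::string &realName,
                                 const ToyLineEntry &entry,
                                 bool useLineDirectives = true) {
  ToyPresumedLoc p{realFileID, realName};
  if (useLineDirectives && entry.present && !entry.filename.empty()) {
    p.fileID = 0;
    p.filename = entry.filename;
  }
  return p;
}
```

In this toy, realFileID plays the role of sourceManager.getFileID(loc). It is never modified, which is why comparing it with the presumed FileID fails once such a line entry applies.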
SourceRange
TODO
CharSourceRange
TODO