(IE-0006) New annotations for I6 syntax
August 26, 2023
- Proposal: IE-0006
- Discussion PR link: #6
- Authors: Graham Nelson, David Kinder, and Andrew Plotkin
- Status: Accepted
- Related proposals: IE-0013 and the corresponding I6 proposal.
- Implementation: Implemented but unreleased
Summary
A standard way to mark up Inform 6 syntax with annotations that help tools with linking or compilation but do not change its basic meaning.
Motivation
The motivating examples for this syntax change have now been spun off into
their own proposal, IE-0013. What it comes down to, though, is that the
original I6 language has no way to express instructions for linking
(analogous to C's extern keyword, or the "use" keyword in many more modern
languages). The I6 reader inside inter, and therefore inside inform7, needs
to extend the I6 language syntactically, and it seems good to have a clean
notation for that.
There are also ambitions to improve the inform6 compiler by allowing a
system of type indications, which would also need new syntax.
It seems better to have a common syntax, so this proposal puts forward a purely syntactic description of how "annotations" can be applied to directives, and then read either
(a) by inter or inform7, when processing Include (- ... -) or compiling
kits from source, or
(b) by inform6 itself.
Components affected
- No change to the natural-language syntax.
- No change to inbuild.
- No change to inform7.
- No change to inter.
- No change to the Inter specification.
- No change to runtime kits.
- No changes to the Standard Rules and Basic Inform.
- No change to documentation.
- No change to the GUI apps.
Impact on existing projects
None.
Syntax
Any I6 directive read by the I6-syntax reader inside inter can begin with
one or more annotations, each introduced by a + sign. As examples of the syntax:
+replacing(from BasicInformKit) [ SquareRoot num;
"Nobody cares about square roots, son.";
];
+private +xyzzy Constant SECRET_NUMBER = 666;
The reader removes annotations at the front of the directive, producing errors if they are syntactically invalid, and then goes on to parse as if the annotations were not there:
[ SquareRoot num;
"Nobody cares about square roots, son.";
];
Constant SECRET_NUMBER = 666;
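This first step can be sketched in Python, assuming the reader sees each directive as a single string. Everything here is hypothetical: split_annotations, _match_bracket, IDENT and AnnotationError are illustrative names, not part of inter or inform6. The sketch also assumes that an identifier runs from the + up to the next whitespace or "(", and that whitespace (or the end of input) must follow any details; the rules below do not state either point outright. It only peels off identifiers and the raw text of their details, leaving term parsing aside.

```python
import re

# Identifier pattern, copied from the rules in this proposal.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]+")


class AnnotationError(Exception):
    """Raised when a leading annotation is syntactically invalid."""


def _match_bracket(text, i):
    """text[i] is '('; return the index just past its matching ')'.

    Brackets inside single-quoted text are ignored, and inside quotes a
    backslash escapes the character after it.
    """
    depth, in_quote = 0, False
    while i < len(text):
        c = text[i]
        if in_quote:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == "'":
                in_quote = False
        elif c == "'":
            in_quote = True
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise AnnotationError("mismatched brackets")


def split_annotations(directive):
    """Return ([(identifier, raw details or None), ...], rest of directive)."""
    annotations, i, n = [], 0, len(directive)
    while True:
        while i < n and directive[i].isspace():
            i += 1
        if i >= n or directive[i] != "+":
            return annotations, directive[i:]
        # Assumption: the identifier runs from just after '+' up to the next
        # whitespace or '(', and is then checked against IDENT.
        start = i = i + 1
        while i < n and not directive[i].isspace() and directive[i] != "(":
            i += 1
        ident = directive[start:i]
        if not ident:
            raise AnnotationError("no identifier after '+'")
        if not IDENT.fullmatch(ident):
            raise AnnotationError(f"illegal identifier {ident!r}")
        # Optional whitespace, then optional details in round brackets.
        j = i
        while j < n and directive[j].isspace():
            j += 1
        details = None
        if j < n and directive[j] == "(":
            i = _match_bracket(directive, j)
            details = directive[j + 1:i - 1]
            # Assumption: whitespace (or the end) must follow the details.
            if i < n and not directive[i].isspace():
                raise AnnotationError("unexpected text after details")
        annotations.append((ident, details))
```

With this sketch, "+private +xyzzy Constant SECRET_NUMBER = 666;" yields the annotations ("private", None) and ("xyzzy", None), leaving "Constant SECRET_NUMBER = 666;" for the ordinary directive parser.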
The reader also produces errors if it sees annotations it does not understand. Which annotations it does understand, and what effect they have, is not the subject of this proposal, which is about syntax alone.
The rules are as follows:
- An annotation can be preceded by whitespace, but then begins with a + sign. An identifier follows, which must match the regular expression [A-Za-z_][A-Za-z0-9_]+; that is, it begins with a letter or underscore, then consists of letters, digits or underscores.
- Whitespace can follow the identifier, but cannot appear between the + and the identifier.
- Following this, the annotation can optionally contain "details", which must appear inside a matched set of round brackets, ( and ).
- Inside the brackets is a list of "terms". Syntactically, the text in the brackets is divided into "tokens", separated by whitespace. A token is either a run of non-whitespace or some text in single quotation marks: for example, 91 is a token, as are 'my text' and aquarius. The backslash escape \' can be used to write a single quotation mark inside quoted text: 'like \'this\''.
- Outside of quotation marks, the characters ( and ) must be used in a properly nested way; if a ( appears at the top level, it begins a new token, which only ends with its matching ).
- Outside of quotation marks and brackets, a comma acts as a term divider.
- Terms are key-value pairs, but can be written in two different ways.
  - If a term is a single lexical token, this is considered a value, and the key is the placeholder _. For example, the annotation +eat(brioche) has one term with key _ and value brioche.
  - If a term has two or more lexical tokens, the first token is the key name, and the rest becomes the value. For example, +bake(temperature 220 Celsius) has one term with key temperature and value 220 Celsius.
- In each term, the key must match the regular expression [A-Za-z_][A-Za-z0-9_]+.
- Identifier and key names should always be read case-sensitively: +xyzzy and +XYZZY are different annotations.
Note that there are no maximum limits: a directive can have any number of annotations, and an annotation can have any number of terms.
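The tokenizing and term rules above can be sketched in Python as follows. This is a hypothetical illustration, not code from inter or inform6: parse_terms, tokenize, _group_end and AnnotationError are invented names, and parse_terms takes the text between an annotation's brackets and returns (key, value) pairs. Two details are assumptions rather than rules from this proposal: a bracketed token is kept verbatim, brackets included, and a multi-token value is rejoined with single spaces.

```python
import re

# Key pattern, copied from the rules in this proposal.
KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]+")


class AnnotationError(Exception):
    """Raised when annotation details are syntactically invalid."""


def _group_end(s, i):
    """s[i] is '('; return the index just past its matching ')'."""
    depth, in_quote = 0, False
    while i < len(s):
        c = s[i]
        if in_quote:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == "'":
                in_quote = False
        elif c == "'":
            in_quote = True
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise AnnotationError("mismatched brackets")


def tokenize(details):
    """Split the text between the brackets into tokens, with None for ','."""
    tokens, i, n = [], 0, len(details)
    while i < n:
        c = details[i]
        if c.isspace():
            i += 1
        elif c == ",":
            tokens.append(None)  # term divider
            i += 1
        elif c == "(":
            end = _group_end(details, i)
            tokens.append(details[i:end])  # assumption: kept verbatim
            i = end
        elif c == ")":
            raise AnnotationError("mismatched brackets")
        elif c == "'":
            i, chars = i + 1, []
            while i < n and details[i] != "'":
                if details[i] == "\\" and i + 1 < n:
                    i += 1  # \' stands for a literal quotation mark
                chars.append(details[i])
                i += 1
            if i >= n:
                raise AnnotationError("unterminated quoted text")
            i += 1  # step past the closing quote
            if i < n and not (details[i].isspace() or details[i] == ","):
                raise AnnotationError("single quote not at end of token")
            tokens.append("".join(chars))
        else:
            start = i
            while i < n and not details[i].isspace() and details[i] not in ",()":
                i += 1
            word = details[start:i]
            if "'" in word:
                raise AnnotationError("single quote not at start or end of token")
            tokens.append(word)
    return tokens


def parse_terms(details):
    """Return the terms of an annotation's details as (key, value) pairs."""
    tokens = tokenize(details)
    if not tokens:
        return []  # "+example()" has no terms
    terms, current = [], []
    for tok in tokens + [None]:  # a final None closes the last term
        if tok is not None:
            current.append(tok)
            continue
        if not current:
            raise AnnotationError("empty term")
        if len(current) == 1:
            terms.append(("_", current[0]))  # lone token: placeholder key
        else:
            key = current[0]
            if not KEY.fullmatch(key):
                raise AnnotationError(f"illegal key {key!r}")
            # Assumption: value tokens are rejoined with single spaces.
            terms.append((key, " ".join(current[1:])))
        current = []
    return terms
```

For example, parse_terms("2, (4, 17)") returns [("_", "2"), ("_", "(4, 17)")], and parse_terms("'using' x") returns [("using", "x")], since a quoted token used as a key is checked on its unquoted text.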
Examples
The following are legal:
ANNOTATION                        TERMS, IF ANY
+example
+example()
+example(2)                       _ = 2
+example (2)                      _ = 2
+example( 2 )                     _ = 2
+example(2, 4)                    _ = 2; _ = 4
+example(2, (4, 17))              _ = 2; _ = (4, 17)
+example(two)                     _ = two
+example(number 2)                number = 2
+example(using colour)            using = colour
+example(using colour vision)     using = colour vision
+example(using 'colour')          using = colour
+example(using 'k\'5\'')          using = k'5'
+example(using 'x, )')            using = x, )
+example(using 'x', using 'y')    using = x; using = y
+example('using' x)               using = x
The following are not legal:
+ ! No identifier
+(17) ! No identifier
+66time ! Identifier begins with digit
+bizarre<thing> ! Identifier has illegal characters <, >
+example(3, (4) ! Mismatched brackets
+example(3)(4) ! More than one set of brackets
+example(3, ) ! Empty term
+example(, 4) ! Empty term
+example(3, , 5) ! Empty term
+example(using k'5') ! Single quote not at start or end of token
+example(using'colour') ! Single quote not at start or end of token
Again, note that the rules above are purely syntactic. They say nothing about
which identifiers might be allowed in any given context, or what terms would
then be allowed for those identifiers. Syntactically, annotations such as
+bake(temperature 110, temperature 180) or +bake(galaxy Pinwheel, food molybdenum)
are fine. Semantically, of course, if +bake is allowed at all then it is very
unlikely to permit temperature to be repeated, or galaxy to be given at all,
or food to be set to molybdenum. But those are not syntax errors.
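To make the split between syntax and semantics concrete, here is a hypothetical semantic check for +bake, sketched in Python. It assumes the terms have already been parsed into (key, value) pairs, and it invents, purely for illustration, a rule that +bake accepts only a temperature key, given at most once. Both example annotations above pass the syntax rules but fail this check.

```python
# Hypothetical semantics for a +bake annotation; this proposal defines none.
# By default only the key "temperature" is accepted, and no key may repeat.
def check_bake(terms, allowed=frozenset({"temperature"})):
    """Return a list of semantic problems with already-parsed +bake terms.

    `terms` is a list of (key, value) pairs that has already passed the
    syntax rules; this function only judges whether they make sense.
    """
    problems, seen = [], set()
    for key, _value in terms:
        if key not in allowed:
            problems.append(f"unexpected key {key!r}")
        elif key in seen:
            problems.append(f"repeated key {key!r}")
        seen.add(key)
    return problems
```

Keeping this kind of check in a separate pass means the syntax reader never needs to know which annotations exist, which is the division of labour this proposal describes.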