Docstrings
May 5, 2026
Status: authoritative specification for doc-comment preservation in
the Baboon DML. Implementation is delivered across PR-30.2 .. PR-30.15
of milestone M30; the per-PR breakdown lives in
docs/drafts/20260504-1213-m30-docstrings-plan.md and tasks.md
("Milestone M30 — PR breakdown"). The locked design decisions (Q1–Q4,
user-blessed 2026-05-04) are reproduced verbatim from
docs/drafts/20260504-1213-m30-docstrings-plan.md §6 and supersede the
proposed defaults captured in §2 of that plan.
This document is the single source of truth. When the compiler and this document disagree, the document wins and the compiler is wrong.
1. Overview
A doc comment in Baboon is one of two surface forms:
- a prefix doc comment opened by /** and closed by */, attached to the immediately following declaration (type definition, service method, or field);
- a postfix line doc comment opened by //! and terminated at the end of the source line, attached to the field whose definition shares that line.
Doc comments are a source-level annotation. They are NOT part of any
wire format (JSON, UEBA), they do NOT participate in evolution diffs or
conversions, they do NOT contribute to schema digests
(ShallowSchemaId / DeepSchemaId), and they do NOT affect codec
generation. Two .baboon files differing only in their doc comments
produce byte-identical wire output and byte-identical schema digests.
Doc comments flow through the compiler as inert metadata so that each
backend can re-emit them as idiomatic doc comments in the generated
source.
Plain non-doc comments — // … line comments and /* … */ block
comments without the second * — are ignored by the parser and do not
appear in the typed model or generated source (see §9).
2. Surface syntax
2.1 Prefix doc on type declarations
A prefix doc precedes any of the seven type-declaration kinds:
data, adt, enum, contract, service, foreign, and type
aliases. The doc binds to the immediately following declaration.
Intervening blank lines are permitted and do not break the binding;
an intervening declaration does break the binding.
/** A simple non-template DTO used as a concrete type argument below. */
data Item {
  name: str
  price: f64
}

/**
 * Paged-results shape.
 *
 * `T` is the element type carried by `items`.
 */
data Page[T] {
  items: lst[T]
  total: u32
}

/** Convenience alias for an integer page. */
root type IntPage = Page[i32] : derived[json], derived[ueba]
The example uses data and type; the same prefix-doc form applies
uniformly to adt, enum, contract, service, and foreign
declarations.
2.2 Prefix doc on service methods
A prefix doc inside a service body binds to the immediately following
def declaration:
service Crud[K, V] {
  /** Fetch a value by key. */
  def get (K): V

  /** Store a value and return the assigned key. */
  def put (V): K
}
2.3 Prefix doc on fields
A prefix doc inside a data, ADT-arm DTO, or contract body binds to
the immediately following field declaration:
data Item {
  /** Display name of the item. */
  name: str
  /** Unit price in store currency. */
  price: f64
}

adt Envelope[T, E] {
  data Ok {
    /** The successful payload. */
    value: T
  }
  data Err {
    /** The error description. */
    error: E
  }
}
2.4 Postfix line doc on fields
A postfix //! line doc is field-only: it appears at the end of a
field-definition line and binds to that same field. The marker
terminates at the source line's end.
data User {
  id: uid //! the user id
  name: str //! display name
}
Postfix //! is not accepted on type declarations, on service
methods, on ADT branch headers, on enum members, or on any other
position. A //! outside a field-definition line is a parser error
(or, equivalently, fails to match because the postfix-doc rule is
anchored to the end of fieldDef).
2.5 Combining prefix and postfix on a field
A field may carry both a prefix doc and a postfix doc; both are kept
on the typed Field as two distinct cleaned strings.
data User {
  /** The unique user identifier, assigned at registration time. */
  id: uid //! never reused after deletion
}
The combining semantics in emission output are backend-specific (see §7).
2.6 Doc-comment delimiters and what does not parse as a doc
The grammar is exact about which delimiter sequences open a doc comment:
- /** opens a prefix doc; */ closes it. Only the literal three-character sequence /** qualifies. /* followed by anything other than * opens a regular block comment, which is ignored.
- //! opens a postfix doc; the marker terminates at end-of-line. Only the literal three-character sequence //! qualifies. // followed by anything other than ! opens a regular line comment, which is ignored.
- The degenerate sequences /** */, /**/, and /**\n*/ (a /** opener immediately followed by */ with only whitespace or no body between them) are recognised as empty doc comments by the parser. Per the Q4c lock (§5.4) the cleanup pass observes that the cleaned body is empty and silently drops the doc; no diagnostic is produced and the carrier node carries no Docs for that slot.
The intent is that there is exactly one rule for each marker, and
edge cases at the boundary ("did /**/ open a doc?") resolve into the
silently-dropped empty-doc path rather than into a separate diagnostic
class.
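The delimiter rules above amount to a check on the opening characters of a comment. The following is a minimal Python sketch for illustration only: the compiler itself is written in Scala, and `classify_comment` and its return labels are hypothetical names, not part of the Baboon codebase.

```python
def classify_comment(token: str) -> str:
    """Classify a comment token by its opening delimiter (hypothetical helper)."""
    if token.startswith("/**"):
        # Covers the degenerate "/**/" too: it is treated as an empty doc
        # and later dropped by the cleanup pass, not reported as an error.
        return "prefix-doc"
    if token.startswith("/*"):
        return "block-comment"  # ignored by the parser
    if token.startswith("//!"):
        return "postfix-doc"
    if token.startswith("//"):
        return "line-comment"  # ignored by the parser
    return "not-a-comment"
```

The order of the checks matters: the three-character openers must be tested before their two-character prefixes.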
3. Where doc comments are accepted
3.1 Prefix doc positions
A prefix /** … */ doc is grammar-anchored at the following positions:
- before any top-level type declaration inside any namespace: data, adt, enum, contract, service, foreign, type alias;

  Plain non-template aliases (type Y = X where the RHS is not a template instantiation) do not appear in the typed model as a standalone DomainMember.User; a doc on such an alias is silently dropped, since it has no emission carrier. This is consistent with the existing typer behaviour where plain aliases are resolved transparently. Only template-instantiation aliases (type Y = X[Foo], where X is a template) materialise a synthesized concrete type that carries the merged doc per §6.
- before any ADT branch declaration inside an adt body (data Ok { … }, data Err { … });
- before any service method declaration (def name (…): …) inside a service body;
- before any field declaration inside a data body, an ADT-branch DTO body, or a contract body.
3.2 Non-positions for prefix docs
The following positions do not accept a prefix doc and a /** … */
appearing there is either a parser error or a doc that fails to bind
to anything:
- the model header (model x.y.z): there is no carrier node;
- the version header (version "x.y.z"): same;
- a namespace opener (ns x { … }): namespaces are scoping constructs without their own metadata carrier;
- an import clause;
- inside a foreign body, on individual per-language mapping lines (scala = "…", cs = "…", etc.): only the foreign declaration itself accepts a prefix doc, not its language-mapping entries;
- on individual enum values: only the enum type as a whole accepts a prefix doc; per-enum-value docs are deferred (see §9);
- ADT inheritance arms (+ Ref, - Ref, ^ Ref) do NOT accept prefix docs. They are not declarations; they are structural composition operators on the parent ADT. A doc above an inheritance arm is silently dropped: the doc has no carrier in the typed model since inheritance arms are resolved structurally during ADT inheritance expansion and do not appear as standalone members. No parser diagnostic is produced.
3.3 Postfix doc position
A postfix //! doc is accepted only at the end of a field-
definition line, before the line's terminating newline. Fields are
newline-separated by the existing grammar, so a postfix //! cannot
straddle a multi-field expression. Postfix //! in any other position
is rejected by the grammar (the rule does not match at that position).
4. Stacked prefix doc comments (Q3 lock)
Two /** … */ blocks back-to-back with no intervening declaration are
a parser error:
/** First block. */
/** Second block. */
data Foo {
  x: i32
}
This source produces ParserIssue.StackedDocComments(pos), where
pos is the position of the second block. The diagnostic message
recommends merging the two blocks into one coherent prefix doc.
The legal alternatives are either to merge the two blocks into a
single multi-paragraph /** … */ (paragraphs separated by a blank
line), or — when the two blocks should bind to two different
declarations — to interpose the second declaration between them.
Rationale. Stacked prefix docs raise an ambiguity ("which block is the doc?") that cannot be resolved by a local rule without privileging one block over the other. Forcing a single coherent block per declaration removes the ambiguity at parse time and matches the practice of every emission target listed in §7. The Q3 lock chose the stricter rule precisely so that the question "which doc binds?" has a single mechanical answer: there is at most one prefix doc per declaration in the raw AST, by construction.
The corresponding raw-AST shape carries this invariant. Per the plan (§6 Q3 lock):
case class RawDocs(prefix: Option[RawDocComment], suffix: Option[RawDocComment])
prefix is Option, not List — there is no representation in the
typed model for a stacked prefix.
(Postfix //! is single-line by construction, so the analogous
question — "may two postfix docs stack on one field?" — does not
arise. A field has at most one terminating newline and therefore at
most one trailing //!.)
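As a rough illustration of the stacked-doc rule, the check can be pictured as a single pass over the sequence of doc blocks and declarations. This Python sketch is not the parser's actual implementation; `find_stacked_docs` and the `(kind, pos)` item shape are assumptions made for the example, and how a run of three or more blocks is reported is not specified above (the sketch flags every block after the first).

```python
from typing import List, Sequence, Tuple


def find_stacked_docs(items: Sequence[Tuple[str, int]]) -> List[int]:
    """Return the positions of prefix doc blocks that directly follow another
    prefix doc block with no declaration in between.

    Each item is (kind, pos) where kind is "doc" or "decl" (hypothetical shape).
    """
    issues: List[int] = []
    previous_kind = None
    for kind, pos in items:
        if kind == "doc" and previous_kind == "doc":
            issues.append(pos)  # position of the second (offending) block
        previous_kind = kind
    return issues
```

With the source shown earlier in this section (two blocks followed by `data Foo`), the only reported position is that of the second block.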
5. Doc-body cleanup rule (Q4 locks)
5.1 Ownership
Doc-body cleanup is performed once, in the typer, by a single
canonical function DocFormat.clean(raw: String): String. The typed-
side DocComment carries both the original raw text and the cleaned
text:
case class Docs(prefix: Option[DocComment], suffix: Option[DocComment])
case class DocComment(raw: String, cleaned: String)
Backends consume cleaned. When a backend's emission target requires
escaping (XML for C#, HTML for Java's Javadoc, """ for Python or
GraphQL block strings), the backend applies that escaping at emission
time on top of cleaned. The escaping is deliberately not baked
into cleaned because the cleaned form is intended to be the canonical
plain-text representation shared across all backends; per-language
escaping is the responsibility of each backend's renderer.
5.2 Cleanup algorithm for prefix /** … */
Given the raw byte sequence between the opening /** and the closing
*/ (exclusive of both delimiters), the cleanup function performs the
following steps in order:
1. Strip the delimiters. The opening /** and the closing */ are removed. The body is what remains.
2. Split on line boundaries. Use the source-file line separator (\n, with \r\n collapsed to \n first).
3. Strip the common leading prefix.
   a. Identify content lines. A content line is an interior line that contains at least one character beyond what the conventional \s*\*\s* Javadoc separator supplies. Concretely, a line that matches \s*\*?\s*$ end-to-end (whitespace-only, or whitespace followed by an optional * followed by whitespace, with no further content) is a separator line and is excluded from prefix computation. Purely blank lines are also excluded.
   b. Compute the common prefix. Compute the longest string of the form \s*\*?\s* (any leading whitespace, an optional single *, and optionally further whitespace) that is a prefix of every content line. If there are no content lines the body is treated as empty and falls through to step 5's collapse rule.
   c. Strip the prefix from every line. Apply the computed prefix as a prefix-strip to every interior line, including separator lines. A separator line that is shorter than the computed prefix is reduced to the empty string (no index-past-end error). This removes the conventional * continuation marker on each line of a multi-line Javadoc-style block.
4. Right-trim each line. Trailing whitespace at the end of each line is removed.
5. Collapse leading and trailing blank lines. Any number of fully blank lines at the very start or very end of the body is removed.
6. Preserve internal blank lines. Blank lines between non-blank lines are kept verbatim, treated as paragraph separators.
The result is a single String with \n line separators, no leading
or trailing blank lines, no trailing whitespace per line, and no
Javadoc continuation markers.
Bodies without * continuation markers. The same algorithm
handles the common TS / Kotlin / Rust style where interior lines carry
no * prefix. The \*? term in the prefix pattern contributes
nothing, and the common prefix degenerates to common leading
whitespace. For example:
/**
  text without star
  more text
*/
Step 3b finds the common prefix (two spaces) over the two content
lines; step 3c strips it from each. Output:
text without star
more text
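The six cleanup steps can be sketched in a few lines of Python. This is an illustrative model only, not the canonical DocFormat.clean (which lives in the Scala typer); the function name `clean_prefix_doc` is hypothetical, and the sketch reads \s in the spec's patterns as spaces and tabs.

```python
import os
import re

# Whitespace, an optional single '*', then whitespace: the shape of both a
# separator line (when it matches the whole line) and the common prefix.
_MARKER = re.compile(r"[ \t]*\*?[ \t]*")


def clean_prefix_doc(raw: str) -> str:
    # Step 1: strip the delimiters if they are present.
    body = raw[3:-2] if raw.startswith("/**") and raw.endswith("*/") else raw
    # Step 2: normalise line endings and split.
    lines = body.replace("\r\n", "\n").split("\n")
    # Step 3a: content lines are those that are not separator lines.
    content = [line for line in lines if not _MARKER.fullmatch(line)]
    if not content:
        return ""  # no content lines: the body is treated as empty
    # Step 3b: longest marker-shaped prefix shared by every content line.
    prefix = _MARKER.match(os.path.commonprefix(content)).group(0)
    # Steps 3c and 4: strip the prefix (separator lines that do not start
    # with it collapse to ""), then right-trim.
    stripped = [
        (line[len(prefix):] if line.startswith(prefix) else "").rstrip()
        for line in lines
    ]
    # Step 5: drop leading and trailing blank lines; step 6 needs no code,
    # because interior blank lines are simply left in place.
    while stripped and not stripped[0]:
        stripped.pop(0)
    while stripped and not stripped[-1]:
        stripped.pop()
    return "\n".join(stripped)
```

Taking the character-wise common prefix of the content lines first and then matching the marker pattern against it gives the longest marker-shaped string that prefixes every content line, which is what step 3b asks for.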
5.3 Cleanup algorithm for postfix //! …
Given the raw byte sequence after the //! marker and before the
end-of-line:
1. Strip the //! marker.
2. Strip a single optional leading space (so that the conventional //! the user id form yields "the user id", not " the user id").
3. Right-trim trailing whitespace.
Postfix docs are single-line by construction; there is no paragraph structure to preserve.
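A minimal Python sketch of the three postfix steps, for illustration only (`clean_postfix_doc` is a hypothetical name; the real cleanup is part of the Scala typer):

```python
def clean_postfix_doc(raw: str) -> str:
    # Step 1: strip the //! marker if present.
    body = raw[3:] if raw.startswith("//!") else raw
    # Step 2: strip exactly one optional leading space, no more.
    if body.startswith(" "):
        body = body[1:]
    # Step 3: right-trim trailing whitespace.
    return body.rstrip()
```

Only one leading space is removed, so any additional indentation after the marker survives into the cleaned text.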
5.4 Empty / whitespace-only docs
If clean(raw) returns an empty string or a string that is purely
whitespace, the doc is silently dropped: the carrier node carries
no Docs entry for that slot and no diagnostic is produced. This rule
applies uniformly to prefix and postfix docs, and to the degenerate
empty-doc forms /**/, /** */, and /**\n*/ mentioned in §2.6.
5.5 Worked example
Input:
/**
 *  First paragraph.
 *  Continued.
 *
 *  Second paragraph.
 */
Step 1 strips the delimiters. Step 2 splits on line boundaries, giving four interior lines:
 *  First paragraph.
 *  Continued.
 *
 *  Second paragraph.
Step 3a classifies them: the third line (" *") matches \s*\*?\s*$
end-to-end, so it is a separator line and is excluded from prefix
computation. Steps 3b–3c compute the common prefix " *  " (one space,
asterisk, two spaces) over the three content lines, then strip it from
every interior line including the separator. The separator line " *"
is two characters long and is entirely consumed by the strip, leaving
the empty string. Steps 4–6 right-trim each line and collapse the
leading and trailing blank lines. Output:
First paragraph.
Continued.

Second paragraph.
Interior whitespace beyond the common leading prefix is preserved verbatim — step 3 strips only the common leading prefix.
Edge case: separator shorter than the common prefix. If a
separator line is shorter than the computed prefix (e.g. the line is
" *" while the prefix is " *  "), the strip consumes only as many
characters as are present; the line becomes the empty string. No
index-past-end error occurs. This is the same rule stated in step 3c
above.
6. Template monomorphisation interaction (Q1 lock)
This section specifies how doc comments propagate through the M29
template-monomorphisation pipeline. See docs/spec/generics.md §3 for
the monomorphisation rule itself.
6.1 Statement of the rule
Given a template X[T1, …, Tn] and an alias type Y = X[A1, …, An],
the synthesized concrete type emitted under the identity Y carries:
- Type-level doc = alias-doc, a blank line, then template-doc when both are present; alias-doc alone when only the alias has a doc; template-doc alone when only the template-type has a doc; no doc when neither has one. The separator is exactly one blank line (a single \n\n join in the cleaned string representation).
- Field-level docs = the docs declared on the corresponding fields in the template body, propagated verbatim. Aliases never carry field-level docs because the alias surface form has no field positions.
6.2 Worked examples
Template with field doc only (alias has no doc, template type has no doc):
data Box[T] {
  /** the carried element */
  value: T
}
type IntBox = Box[i32]
Synthesized:
data IntBox {
  /** the carried element */
  value: i32
}
Both alias-doc and template-doc present:
/** Paged-results shape, generic over the element type. */
data Page[T] {
  /** the elements on this page */
  items: lst[T]
  /** total elements across all pages */
  total: u32
}

/** A page of integers. */
type IntPage = Page[i32]
The synthesized IntPage carries the merged type-level doc:
A page of integers.

Paged-results shape, generic over the element type.
and the two field docs propagated verbatim from the template body.
Alias-doc only. data Bag[T] { items: lst[T] } plus
/** Bag of integers. */ type IntBag = Bag[i32] synthesizes IntBag
with the alias's doc as its type-level doc and no field docs (the
template body has none).
Neither. Synthesized type carries no type-level doc; field docs are still propagated from the template body if any exist there.
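The four type-level cases reduce to joining whichever docs are present with one blank line. A small Python sketch of that rule, operating on cleaned strings (`merge_type_doc` is a hypothetical name; the real merge happens on the raw node metadata inside the Scala TemplateInstantiator):

```python
from typing import Optional


def merge_type_doc(alias_doc: Optional[str], template_doc: Optional[str]) -> Optional[str]:
    """Alias doc first, then template doc, separated by exactly one blank line."""
    parts = [doc for doc in (alias_doc, template_doc) if doc]
    return "\n\n".join(parts) if parts else None
```

Field-level docs are not touched by this merge; they come from the template body unchanged.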
6.3 Implementation note
The doc merge is performed at template-substitution time on the
synthesized RawTLDef's RawNodeMeta, before the substituted body
re-enters the typer's per-declaration conversion path. This places the
merge inside TemplateInstantiator (introduced in PR-29.5) and means
that downstream code — BaboonTranslator.convert*, validators, and
all backends — sees the merged doc as if the user had hand-written
the materialised concrete type with that doc directly.
7. Per-backend emission idioms (locked)
This is the locked taxonomy of per-backend emission shapes for doc comments. Future backends MUST pick from this taxonomy unless the spec is amended. Each subsection states the emission shape and shows a short example for the type-level case; field-level and method-level emission follows the same shape with the symbol changed.
7.1 Scala
Javadoc-style /** … */ with * line prefix, immediately before the
symbol.
/**
 * A page of integers.
 */
final case class IntPage(items: List[Int], total: Int)
7.2 Java
Javadoc /** … */ immediately before the symbol. Body text is HTML-
escaped: < → &lt;, > → &gt;, & → &amp; to preserve the
text rendered through javadoc. Emission shape otherwise identical to
§7.1.
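To make the Javadoc shape concrete, here is a hedged Python sketch of a renderer that applies the HTML escaping at emission time on top of the cleaned text. `render_javadoc` is a hypothetical helper, not the Java backend's actual code; it relies on the standard-library `html.escape` with `quote=False`, which rewrites exactly &, < and >.

```python
import html


def render_javadoc(cleaned: str, indent: str = "") -> str:
    """Render a cleaned doc body as a Javadoc block with HTML-escaped text."""
    lines = [f"{indent}/**"]
    for line in html.escape(cleaned, quote=False).split("\n"):
        # rstrip keeps blank paragraph separators as a bare " *".
        lines.append(f"{indent} * {line}".rstrip())
    lines.append(f"{indent} */")
    return "\n".join(lines)
```

The sketch does not guard against a literal */ inside the body; the text above does not say how the backend treats that case.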
7.3 Kotlin (incl. KMP)
KDoc /** … */ immediately before the symbol; both the JVM and
Kotlin/KMP translators emit the same form. Shape identical to §7.1.
7.4 TypeScript
Javadoc /** … */ immediately before the symbol. Shape identical to
§7.1.
7.5 C#
XML doc comments via ///. The first paragraph of the cleaned doc is
emitted inside <summary>…</summary>; subsequent paragraphs are
emitted inside <remarks>…</remarks>. Body text is XML-escaped:
< → &lt;, > → &gt;, & → &amp;, " → &quot;.
/// <summary>A page of integers.</summary>
/// <remarks>
/// Optional second paragraph appears here.
/// </remarks>
public sealed class IntPage { /* … */ }
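The summary/remarks split can be sketched as follows. This Python model is illustrative only: `render_csharp_doc` is a hypothetical name, and two layout details are assumptions not stated above (a multi-line first paragraph is emitted in block form, and multiple remarks paragraphs are separated by a bare `///` line).

```python
import re
from xml.sax.saxutils import escape


def render_csharp_doc(cleaned: str) -> str:
    """Render a cleaned doc body as C# XML doc comment lines."""
    # Escape &, <, > (escape's defaults) plus the double quote.
    escaped = escape(cleaned, {'"': "&quot;"})
    paragraphs = [p.split("\n") for p in re.split(r"\n{2,}", escaped)]
    summary, remarks = paragraphs[0], paragraphs[1:]
    if len(summary) == 1:
        out = [f"/// <summary>{summary[0]}</summary>"]
    else:
        # Assumption: block form for a multi-line first paragraph.
        out = ["/// <summary>", *(f"/// {line}" for line in summary), "/// </summary>"]
    if remarks:
        out.append("/// <remarks>")
        for index, paragraph in enumerate(remarks):
            if index > 0:
                out.append("///")  # assumption: blank doc line between paragraphs
            out.extend(f"/// {line}" for line in paragraph)
        out.append("/// </remarks>")
    return "\n".join(out)
```

Escaping is applied before the paragraphs are wrapped, so the `<summary>` and `<remarks>` tags themselves are never escaped.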
7.6 Python
Class-level and method-level doc comments are emitted as PEP 257 docstrings — a triple-quoted string literal as the first statement of the class or method body.
Field docs are NOT emitted as per-field statements. Per the Q2
lock, field docs are folded into the class-level docstring as a
Sphinx/Google-style Attributes: section keyed by field name. ADT-arm
DTOs follow the same shape: each arm class folds its own arm-field
docs into its own arm-class docstring.
class IntPage:
    """Paged integers.

    Attributes:
        items: the elements on this page
        total: total elements across all pages
    """
Continuation lines for multi-paragraph field docs indent by 8 spaces
(4 for the Attributes: block + 4 for continuation), per Sphinx and
Google docstring conventions. Triple-quote sequences """ inside the
doc body are escaped to \"\"\" to keep the docstring well-formed.
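The folding of field docs into the class docstring can be sketched like this. It is a simplified model, not the Python backend's renderer: `render_class_docstring` is a hypothetical name, the blank line before `Attributes:` follows the Google docstring convention and is an assumption, and only triple-quote escaping is handled.

```python
from typing import Dict, Optional


def _escape(text: str) -> str:
    # Escape embedded triple quotes so the docstring literal stays well-formed.
    return text.replace('"""', r'\"\"\"')


def render_class_docstring(
    class_doc: Optional[str], field_docs: Dict[str, str], indent: str = "    "
) -> Optional[str]:
    lines = _escape(class_doc).split("\n") if class_doc else []
    if field_docs:
        if lines:
            lines.append("")  # assumption: blank line before the section
        lines.append("Attributes:")
        for name, doc in field_docs.items():
            first, *rest = _escape(doc).split("\n")
            lines.append(f"    {name}: {first}")
            # Continuation lines: 4 for the Attributes: block + 4 more.
            lines.extend(f"        {line}" if line else "" for line in rest)
    if not lines:
        return None  # nothing to emit
    body = "\n".join(f"{indent}{line}" if line else "" for line in lines)
    return f'{indent}"""{body[len(indent):]}\n{indent}"""'
```

Called with the `IntPage` docs from the example above, the sketch produces the same docstring layout, with entries at 4 spaces and continuation lines at 8 spaces relative to the docstring's own indentation.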
7.7 Rust
Rust uses /// outer-line doc comments before the item. Rust
distinguishes outer-line /// from module-inner //!; the //!
module-inner form is not used in generated output because it would
mis-attribute the doc to the enclosing module rather than to the
following item.
A baboon postfix //! on a field is therefore mapped to a /// line
before the field in Rust output:
/// A page of integers.
pub struct IntPage {
    /// the elements on this page
    pub items: Vec<i32>,
    /// never reused after deletion
    pub total: u32,
}
7.8 Dart
Dart /// doc comments immediately before the symbol. One /// line
per cleaned doc-body line.
7.9 Swift
Swift /// doc comments (apple-doc default) immediately before the
symbol. Same shape as §7.8.
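The `///` shape shared by the Rust, Dart, and Swift backends (§7.7 to §7.9) is one comment line per cleaned doc-body line. A short Python sketch, with `render_triple_slash` as a hypothetical helper name:

```python
def render_triple_slash(cleaned: str, indent: str = "") -> str:
    """Emit one '///' line per line of the cleaned doc body."""
    return "\n".join(
        # rstrip turns a blank paragraph separator into a bare "///".
        f"{indent}/// {line}".rstrip()
        for line in cleaned.split("\n")
    )
```

For a field inside a struct or class body, the caller would pass the member indentation, for example `render_triple_slash("the elements on this page", "    ")`.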
7.10 GraphQL SDL
GraphQL block-string descriptions ("""…""") immediately before the
type, field, or argument. Embedded """ sequences inside the doc body
are escaped as \""", the only escape sequence the GraphQL
specification defines inside block strings.
"""A page of integers."""
type IntPage {
  """the elements on this page"""
  items: [Int!]!
  """total elements across all pages"""
  total: Int!
}
7.11 OpenAPI 3.1
The description JSON Schema key on the component, field, or
parameter. The cleaned doc body is emitted as a plain JSON string
(JSON-escaped); no Markdown normalisation is applied by the compiler
(see §9). Type-level docs land on the component schema's
description; field-level docs land on each property's
description.
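A hedged Python sketch of where the docs land in an OpenAPI component schema. `describe_component` and `to_json` are hypothetical helpers, not the OpenAPI backend's code; JSON string escaping is delegated to the standard `json` module.

```python
import json
from typing import Dict, Optional


def describe_component(
    schema: dict, type_doc: Optional[str], field_docs: Dict[str, str]
) -> dict:
    """Return a copy of a component schema with description keys attached."""
    out = dict(schema)
    if type_doc:
        out["description"] = type_doc  # type-level doc on the component
    if "properties" in schema:
        properties = {}
        for name, prop in schema["properties"].items():
            prop = dict(prop)
            if name in field_docs:
                prop["description"] = field_docs[name]  # field-level doc
            properties[name] = prop
        out["properties"] = properties
    return out


def to_json(component: dict) -> str:
    # json.dumps performs the JSON escaping of the plain-text doc body.
    return json.dumps(component, indent=2)
```

The doc text is stored as-is; quotes and newlines in it only become escape sequences when the schema is serialised.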
8. Diagnostics introduced
M30 introduces a single new parser diagnostic. No new TyperIssue,
VerificationIssue, EvolutionIssue, IOIssue, TranslationIssue,
or RuntimeCodecIssue cases are introduced.
8.1 ParserIssue.StackedDocComments(pos) (PR-30.2)
Two /** … */ prefix doc blocks back-to-back with no intervening
declaration. The diagnostic cites the position of the second
block. The recommended message form: "Stacked prefix doc comments at
pos — merge into a single /** … */ block per declaration."
The case-class shape mirrors the convention already established in
baboon-compiler/src/main/scala/io/septimalmind/baboon/parser/model/issues/ParserIssue.scala:
case class StackedDocComments(pos: InputPointer) extends ParserIssue
A corresponding IssuePrinter[StackedDocComments] instance lives
alongside the existing parserFailedPrinter and
includeNotFoundPrinter.
8.2 Audit obligation: parser-issue exhaustive matches
ParserIssue is a sealed trait. Three files in the compiler contain
exhaustive match expressions over ParserIssue cases (verified by
inspection of the current tree), with a total of four match sites:
- baboon-compiler/src/main/scala/io/septimalmind/baboon/lsp/features/DiagnosticsProvider.scala (the convertToDiagnostic body, BaboonIssue.Parser(pi) arm);
- baboon-compiler/src/main/scala/io/septimalmind/baboon/lsp/state/WorkspaceState.scala (the extractInputPointer body, L81–87);
- baboon-compiler/src/main/scala/io/septimalmind/baboon/lsp/state/WorkspaceState.scala (the formatIssue body, L207–211);
- baboon-compiler/.js/src/main/scala/io/septimalmind/baboon/BaboonJS.scala (the extractIssuePointer body, JS-side error-formatting path).
PR-30.2 must add the StackedDocComments arm to every exhaustive
ParserIssue match in each of the three files. Bundle all arms in a
single edit per file even when a file carries multiple matches (e.g.
WorkspaceState.scala contains two such matches). This mirrors the
M29 3-site exhaustive-match discipline documented in CLAUDE.md
("PR-29.7 / PR-29.4 / PR-29.5 canonical examples").
mdl :build :test runs sbt +compile, which cross-builds JVM and
Scala.js with -Wconf settings that promote inexhaustive-match
warnings to errors; a missed JS-side update is caught by that path.
The single new case (StackedDocComments) maps to four arms total
across three files.
9. Out of scope
The following items are deliberately NOT delivered in M30 and will require separate milestones / decisions:
- Tree-sitter editor-grammar updates for doc-comment recognition. The grammar changes live in a 3-level submodule chain (editors/baboon-zed → grammars/baboon) that requires user-authorised pointer bumps across separate git repositories. Deferred per the [PR-29.8-D01] precedent.
- Preserving non-doc comments (// … line comments and /* … */ block comments without the second *). Non-doc comments are ignored by the parser and do not appear in the typed model.
- Per-enum-value docs. Only the enum type as a whole accepts a prefix doc in M30; per-value docs are deferred.
- Sub-statement / mid-expression doc placement. Docs are anchored at a small set of grammar positions (§3); doc placement inside a type expression (lst[/** elem */ T]) or between an identifier and its type (x: /** wat */ i32) is not in scope.
- Markdown / RST / structured-text normalisation of doc body content beyond the cleanup rule of §5. The cleaned doc body is plain text; backends emit it as-is (with per-language escaping per §7).
- Wire-format changes. Docs do not enter the JSON wire form, the UEBA wire form, or the schema-digest computation (ShallowSchemaId / DeepSchemaId). Two .baboon files that differ only in their doc comments produce byte-identical wire output and byte-identical digests; this is verified as a cross-cutting concern in PR-30.3 (see plan §5.4).
10. Implementation pointers
PR-30.2 (parser), PR-30.3 (typer + domain model), PR-30.4 .. PR-30.13
(per-backend emitters, one PR per language), PR-30.14 (LSP hover),
and PR-30.15 (cross-language smoke fixture m30-ok + close-out)
implement this specification. The per-PR breakdown lives in
docs/drafts/20260504-1213-m30-docstrings-plan.md §4.
Critical files for the implementer (paths relative to repository root):
- parser: baboon-compiler/src/main/scala/io/septimalmind/baboon/parser/{defns/{DefMeta,DefDto}.scala, model/RawNodeMeta.scala, model/issues/ParserIssue.scala}
- typer / domain: baboon-compiler/src/main/scala/io/septimalmind/baboon/typer/{BaboonTranslator,TemplateInstantiator}.scala, typer/model/{Typedef,DomainMember}.scala
- per-backend translators: baboon-compiler/src/main/scala/io/septimalmind/baboon/translator/<lang>/{*DefnTranslator,*TreeTools}.scala
- LSP / JS exhaustive matches (§8.2 audit): baboon-compiler/src/main/scala/io/septimalmind/baboon/lsp/features/DiagnosticsProvider.scala, lsp/state/WorkspaceState.scala, baboon-compiler/.js/src/main/scala/io/septimalmind/baboon/BaboonJS.scala
- LSP hover (PR-30.14): baboon-compiler/src/main/scala/io/septimalmind/baboon/lsp/features/HoverProvider.scala
11. PR-origin diagnostic table
| Diagnostic | Section ref | Origin PR |
|---|---|---|
| ParserIssue.StackedDocComments | §4, §8.1 | PR-30.2 |