tree-sitter-al

March 26, 2026

A tree-sitter parser for the AL programming language used in Microsoft Dynamics 365 Business Central.


Parser Status

Validated against 15,358 production AL files from the Business Central codebase:

| Metric | Value |
|---|---|
| Success rate | 100% (15,358 / 15,358 files) |
| Tests | 1,404 |
| parser.c size | ~11 MB |
| grammar.js | ~3,100 lines |
| Named keywords | 82 (queryable via highlights/tags) |
| Scanner tokens | 8 (stateful, depth-tracking) |
| Query files | 5 (highlights, locals, tags, indents, folds) |

Installation

Rust

cargo add tree-sitter tree-sitter-al

use tree_sitter::Parser;

fn main() {
    let mut parser = Parser::new();
    // LANGUAGE is a LanguageFn; .into() converts it to a tree_sitter::Language
    let language = tree_sitter_al::LANGUAGE;
    parser.set_language(&language.into()).expect("Error loading AL grammar");
    let tree = parser.parse("codeunit 50100 MyCodeunit { }", None).unwrap();
    println!("{}", tree.root_node().to_sexp());
}

Query constants are also available:

use tree_sitter_al::{HIGHLIGHTS_QUERY, TAGS_QUERY, LOCALS_QUERY, FOLDS_QUERY, INDENTS_QUERY};

Python (tree-sitter 0.24+)

pip install tree-sitter tree-sitter-al

import tree_sitter
import tree_sitter_al

lang = tree_sitter.Language(tree_sitter_al.language())
parser = tree_sitter.Parser(lang)
tree = parser.parse(b'codeunit 50100 MyCodeunit { }')
# str(node) returns the S-expression; Node.sexp() is not available in recent py-tree-sitter
print(str(tree.root_node))
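One way to check whether a file parsed cleanly, in the spirit of the success-rate numbers above, is to look for `ERROR` and `MISSING` nodes in the tree. The sketch below assumes py-tree-sitter's `Node` attributes (`type`, `is_missing`, `children`, `start_point`); the helper name `find_parse_errors` is hypothetical and not part of the `tree_sitter_al` package.

```python
def find_parse_errors(root):
    """Collect (kind, row, column) for every ERROR or MISSING node under root.

    Expects py-tree-sitter Node-like objects (type, is_missing, children,
    start_point). Rows and columns are 0-based, as in tree-sitter.
    """
    problems = []
    stack = [root]  # explicit stack: no recursion limit on deeply nested trees
    while stack:
        node = stack.pop()
        if node.type == "ERROR":
            problems.append(("ERROR", *node.start_point))
        elif node.is_missing:
            problems.append(("MISSING", *node.start_point))
        # Reverse so that children are popped, and reported, in source order.
        stack.extend(reversed(node.children))
    return problems
```

With the parser from the Python snippet, `tree.root_node.has_error` is a cheap first check; only when it is true does `find_parse_errors(tree.root_node)` need to walk the tree to locate the problems.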

Node.js

npm install tree-sitter-al

Pre-built binaries

Download from GitHub Releases:

| File | Platform | Use case |
|---|---|---|
| tree-sitter-al.wasm | All | web-tree-sitter |
| tree-sitter-al.so | Linux x86_64 | ast-grep, native bindings |
| tree-sitter-al.dll | Windows x86_64 | ast-grep, native bindings |
| tree-sitter-al.dylib | macOS ARM64 | ast-grep, native bindings |

V2 Architecture

The grammar was rewritten from scratch in March 2026, shrinking the generated parser.c roughly tenfold (106 MB to 10.6 MB) while improving correctness (the success rate on the production corpus rose from 99.91% to 100%).

Before / After

| Metric | V1 | V2 |
|---|---|---|
| parser.c | 106 MB (can't push to GitHub) | 10.6 MB |
| Errors | 14 | 0 |
| Success rate | 99.91% | 100% |
| Symbols | 2,249 | ~740 |
| States | 29,126 | ~5,300 |
| grammar.js | 8,500 lines | ~3,100 lines |
| Tests | 1,225 | 1,404 |
| Keywords | invisible in queries | 82 named nodes |
| Query files | 3 (partial) | 5 (comprehensive) |
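The headline ratios follow directly from the comparison table. This short check uses only the table's figures, plus one labeled assumption: that V1's 14 errors fall in 14 different files of the 15,358-file corpus.

```python
# Figures copied from the V1/V2 table; V2 symbol/state counts are approximate.
V1_PARSER_MB, V2_PARSER_MB = 106, 10.6
V1_STATES, V2_STATES = 29_126, 5_300
V1_SYMBOLS, V2_SYMBOLS = 2_249, 740
CORPUS_FILES = 15_358
# Assumption: each of V1's 14 errors is in a different file.
V1_FAILED_FILES = 14

size_ratio = V1_PARSER_MB / V2_PARSER_MB      # the "10x" in parser.c size
state_ratio = V1_STATES / V2_STATES           # parse-table states
symbol_ratio = V1_SYMBOLS / V2_SYMBOLS        # grammar symbols
v1_success = 1 - V1_FAILED_FILES / CORPUS_FILES

print(f"parser.c: {size_ratio:.1f}x smaller")
print(f"states: {state_ratio:.1f}x fewer, symbols: {symbol_ratio:.1f}x fewer")
print(f"V1 success rate: {v1_success:.2%}")
```

The 10x figure therefore describes the generated parser.c; the parse table shrank by roughly 5.5x in states and 3x in symbols, and 14 failing files out of 15,358 is consistent with the reported 99.91%.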

Key design decisions

  • Stateful external scanner: 8 scanner tokens handle property disambiguation, depth tracking (#if/#endif nesting), named begin/end keywords at depth 0, and split-construct detection via lookahead.
  • Parse structure, don't validate: accept any Name = Value; as a property. Semantic validation belongs in linters and LSP servers, not in the parser.
  • Generic preprocessor: one preproc_conditional rule plus ~15 dedicated rules for genuinely complex split constructs (begin/end, var/begin, and brace-close across #if/#else branches).
  • 82 named keyword nodes: all keywords, including begin/end, are named nodes, enabling proper syntax highlighting and code-navigation queries.
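To make "depth tracking" and "split constructs" concrete, here is a toy model in Python. It is not the logic in src/scanner.c (a tree-sitter external scanner works character by character in C); the helper `preprocessor_depths`, its line-based approach, and the `CLEAN24` symbol are all hypothetical. It pairs each source line with its #if nesting depth, and the sample shows a begin/end split across #if/#else branches.

```python
def preprocessor_depths(source: str) -> list[tuple[int, str]]:
    """Pair every non-directive line of AL source with its #if nesting depth.

    Toy model only (hypothetical helper, not src/scanner.c): directives are
    recognised only at the start of a line, and #elif/#else leave the depth
    unchanged because they continue the same #if block.
    """
    depth = 0
    result = []
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("#endif"):
            depth = max(depth - 1, 0)  # ignore a stray #endif instead of going negative
        elif line.startswith("#if"):
            depth += 1
        elif line.startswith(("#elif", "#else")):
            continue  # still inside the same conditional
        elif line:
            result.append((depth, line))
    return result


# A split construct: each branch opens its own `begin`, and a single
# `end;` after #endif (back at depth 0) closes whichever one was compiled.
SPLIT_BEGIN = """\
procedure Run()
#if CLEAN24
begin
    NewLogic();
#else
begin
    OldLogic();
#endif
end;
"""

for depth, text in preprocessor_depths(SPLIT_BEGIN):
    print(depth, text)
```

Both `begin` lines sit at depth 1 while the shared `end;` is back at depth 0, so a parser that treats each branch in isolation sees unbalanced blocks. This is the kind of case the dedicated split-construct rules exist for.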

See docs/v2-blog-post-notes.md for the full rewrite narrative.
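The "parse structure, don't validate" decision can be illustrated with a deliberately permissive matcher. This regex toy is an assumption for illustration only; the real grammar expresses properties with tree-sitter rules and scanner tokens. The point is that an unknown name such as `NotARealProperty` is accepted exactly like `Caption`, leaving it to a linter or LSP server to report.

```python
import re

# Any identifier, '=', a value, then ';'. There is no list of known
# property names: the shape is checked, the meaning is not.
# (Toy limitation: a ';' inside a quoted value would end the match early.)
PROPERTY = re.compile(
    r"^\s*(?P<name>[A-Za-z_][A-Za-z0-9_]*)\s*=\s*(?P<value>[^;]+?)\s*;\s*$"
)


def parse_property(line: str):
    """Return (name, value) if the line has the shape Name = Value; else None."""
    m = PROPERTY.match(line)
    return (m["name"], m["value"]) if m else None


for line in ["Caption = 'Customer List';",
             "NotARealProperty = 42;",
             "Caption 'missing equals';"]:
    print(repr(line), "->", parse_property(line))
```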

Development

Prerequisites

  • Node.js (v16+)
  • tree-sitter CLI: npm install -g tree-sitter-cli

Building

tree-sitter generate    # Generate parser from grammar.js
tree-sitter test        # Run test suite

Validation

./validate-grammar.sh        # Quick: generation, tests, orphan/duplicate detection
./validate-grammar.sh --full # Full: includes production AL file parsing

Parsing AL files

tree-sitter parse path/to/file.al
tree-sitter parse path/to/file.al -d    # Debug output
tree-sitter parse path/to/file.al -q    # Quiet (errors only)

Key files

| File | Purpose |
|---|---|
| grammar.js | Main grammar definition |
| src/scanner.c | External scanner (8 tokens: property, depth tracking, named begin/end, split detection) |
| test/corpus/ | Test suite (1,404 tests) |
| queries/ | Syntax highlighting, code navigation, folding, indentation |

Contributing

See CLAUDE.md for detailed development guidelines, including architecture, debugging, and conventions.


Author: Torben Leth (sshadows@sshadows.dk)
License: MIT (see LICENSE)