ARCHITECTURE.md
May 1, 2026 ยท View on GitHub
Choices
There are a number of choices when writing a Python type checker. We are taking inspiration from Pyre1, Pyright and MyPy. Some notable choices:
- We infer types in most locations, apart from parameters to functions. We do
infer types of variables and return types. As an example,
def foo(x): return Truewould result in something equivalent to had you writtendef foo(x: Any) -> bool: .... - We attempt to infer the type of
[]to however it is used first, then fix it after. For examplexs = []; xs.append(1); xs.append("")will infer thatxs: List[int]and then error on the final statement. - We use flow types which refine static types, e.g.
x: int = 4will both know thatxhas typeint, but also that the immediately next usage ofxwill be aware the type isLiteral[4]. - We aim for large-scale incrementality (at the module level) and optimized checking with parallelism, aiming to use the advantages of Rust to keep the code a bit simpler.
- We expect large strongly connected components of modules, and do not attempt to take advantage of a DAG-shape in the source code.
Code layout
Pyrefly is split into a number of crates (mostly under crates/):
pyrefly_utilare general purpose utilities, which have nothing to do with Python or type checking. Examples include IO wrappers, locking, command line helpers etc.pyrefly_deriveare proc-macros for deriving traits such asTypeEqandVisit.pyrefly_pythonare Python utilities with no type-checking aspects, such as modelling modules orsys.info.pyrefly_graphprovides utilities for indexing values and caching computations that may depend on each other.pyrefly_bundledare the third-party typeshed stubs.pyrefly_configdefines the Pyrefly configuration, along with support for reading Mypy/Pyright configuration.pyrefly_typesdefines the Pyrefly type along with operations on it.pyrefly_wasmdefines the sandbox code that compiles to WASM.pyreflyitself is the type checker and everything else.
Design
There are many nuances of design that change on a regular basis. But the basic substrate on which the checker is built involves three steps:
- Figure out what each module exports. That requires solving all
import *statements transitively. - For each module in isolation, convert it to bindings, dealing with all statements and scope information (both static and flow).
- Solve those bindings, which may require the solutions of bindings in other modules.
If we encounter unknowable information (e.g. recursion) we use Type::Var to
insert placeholders which are filled in later.
For each module, we solve the steps sequentially and completely. In particular, we do not try and solve a specific identifier first (like Roslyn or TypeScript), and do not use fine-grained incrementality (like Rust Analyzer using Salsa). Instead, we aim for raw performance and a simpler module-centric design - there's no need to solve a single binding in isolation if solving all bindings in a module is fast enough.
Example of bindings
Given the program:
1: x: int = 4
2: print(x)
We might produce the bindings:
define int@0=from builtins import intdefine x@1=4: int@0use x@2=x@1anon @2=print(x@2)export x=x@2
Of note:
- The keys are things like
define(the definition of something),use(a usage of a thing) andanon(a statement we need to type check, but don't care about the result of). - In many cases the value of a key refers to other keys.
- Some keys are imported from other modules, via
exportkeys andimportvalues. - In order to disambiguate identifiers we use the textual position at which they
occur (in the example we've used
@line, but in reality it's the byte offset in the file).
Example of Var
Given the program:
1: x = 1
2: while test():
3: x = x
4: print(x)
We end up with the bindings:
x@1=1x@3=phi(x@1, x@3)x@4=phi(x@1, x@3)
The expression phi is the join point of the two values, e.g. phi(int, str)
would be int | str. We skip the distinction between define and use, since
it is not necessary for this example.
When solving x@3 we encounter recursion. Operationally:
- We start solving
x@3. - That requires us to solve
x@1. - We solve
x@1to beLiteral[1] - We start solving
x@3. But we are currently solvingx@3, so we invent a freshVar(let's call it?1) and return that. - We conclude that
x@3must beLiteral[1] | ?1. - Since
?1was introduced byx@3we record that?1 = Literal[1] | ?1. We can take the upper reachable bound of that and conclude that?1 = Literal[1]. - We simplify
x@3to justLiteral[1].