Performance and Efficiency
June 7, 2026
Performance and efficiency have multiple aspects: ease of learning and usage (developer productivity), compilation speed, startup time, runtime performance, and memory usage.
This language transpiles to C, which has a highly optimized toolchain, and is available for embedded systems, desktops, and servers. Startup time is significantly faster than that of virtual machine-based languages like Java or C#, as there is no VM or runtime to initialize. Runtime performance: this language aims to be in the same category as high-performance languages such as C, Rust, Go, Java, and Swift. To ensure low memory usage and to avoid GC pauses, it does not use tracing garbage collection.
Memory safety results in runtime overhead from reference counting and array bounds checking. However, for performance-critical sections, this overhead can be mitigated: the language supports ownership semantics for references, and range-restricted index variables, so that the compiler can eliminate these checks where applicable. In most code, however, these features are not needed, which keeps the code simple and productivity high.
Benchmarks
| Benchmark | Bau | C | Go | Java | Nim | PyPy | Rust | Swift | Vlang | Zig |
|---|---|---|---|---|---|---|---|---|---|---|
| Binary Trees | 3.5 | 3.3 | 6.8 | 2.0 | 4.0 | 5.5 | 4.1 | 10.8 | 4.5 | 4.5 |
| Fannkuch | 1.5 | 1.5 | 1.5 | 1.8 | 1.5 | 3.7 | 1.4 | 1.7 | 1.5 | 1.6 |
| SpeedTest | 1.3 | 1.2 | 1.9 | 2.7 | 1.6 | 10.2 | 1.2 | 1.4 | 1.3 | 1.2 |
| Pi Digits | 1.3 | 0.4 | 0.6 | 1.9 | 24.0 | 1.4 | 1.0 | 5.0 | 3.2 | 3.1 |
| Mandelbrot | 1.8 | 1.8 | 1.8 | 2.1 | 1.9 | 9.8 | 2.0 | 9.8 | 1.8 | 9.4 |
| NBody | 1.5 | 1.4 | 1.5 | 1.9 | 1.6 | 10.4 | 1.7 | 1.7 | 1.6 | 1.5 |
(Runtime in seconds. Lower is better. Measured on an Apple MacBook Pro M4.)
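As a reading aid, the runtimes can be normalized to the C column. The following is a minimal sketch in Python; the numbers are copied from the Bau and C columns of the table above, while the dictionary layout and the function name `relative_to_c` are only illustrative.

```python
# Runtimes in seconds for Bau and C, copied from the table above.
runtimes = {
    "Binary Trees": {"Bau": 3.5, "C": 3.3},
    "Fannkuch":     {"Bau": 1.5, "C": 1.5},
    "SpeedTest":    {"Bau": 1.3, "C": 1.2},
    "Pi Digits":    {"Bau": 1.3, "C": 0.4},
    "Mandelbrot":   {"Bau": 1.8, "C": 1.8},
    "NBody":        {"Bau": 1.5, "C": 1.4},
}

def relative_to_c(row):
    """Return the Bau runtime divided by the C runtime (1.0 means equal)."""
    return row["Bau"] / row["C"]

ratios = {name: relative_to_c(row) for name, row in runtimes.items()}
for name, ratio in ratios.items():
    print(f"{name:13s} {ratio:.2f}x")
```

With these numbers, Pi Digits is the outlier at about 3.25 times the C runtime; the other five benchmarks are within roughly 10% of C.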
Disclaimer
These benchmarks are not designed to show that one language is "better" than another. Performance depends on many factors such as the algorithm used, the developer, how much time is spent on optimizations, etc. Also, measurements vary with hardware, compiler, and operating system.
Why then publish these benchmarks? Performance is an important aspect when selecting a programming language. It is true that benchmarks are often used to mislead: cherry-picking, comparing against old versions of competitors, or not specifying the details (compiler flags etc.). However, not doing any benchmarks is not a solution either. Computer science papers, for example, are often required to include benchmarks. It is expected that performance is measured. So it is a double-edged sword: many like to see benchmark results, but many will criticize the results, no matter how the benchmarks are done or what they show. In the field of database engines, the "DeWitt Clause" is used, which prevents people (competitors) from publishing benchmark results. These clauses were included in database licenses because DeWitt, a researcher, conducted benchmark studies that showed performance issues in popular databases. For programming languages, however, very few (commercial) languages use such licenses.
Only a small number of benchmarks are implemented so far; most of them are based on the micro-benchmarks from The Computer Language Benchmarks Game.
For all languages, a very simple single-threaded implementation is used (without inline assembly etc.).
Memory usage is not currently measured.
The tests are run 3 times, and the best time is used.
Benchmark results in seconds (lower is better).
For Java, memory is limited to 100 MB by using -Xmx100m.
What this page tries to show is that, for these limited benchmarks, Bau has similar performance to other popular programming languages, especially C. This makes sense, because it is transpiled to C. It is sometimes slower, and sometimes faster, than Java, Go, and Rust.
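The "run 3 times, use the best time" rule can be sketched as a small Python helper. This is only an illustration and not part of the repository: the function name `best_of` is hypothetical, and it measures wall-clock time, which is an assumption about which of the values reported by `time` is used.

```python
import subprocess
import sys
import time

def best_of(cmd, runs=3):
    """Run cmd `runs` times and return the smallest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output; fail loudly if the program fails.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Demo with a trivial command; a real run would pass e.g. ["./fannkuch", "11"].
print(f"best of 3: {best_of([sys.executable, '-c', 'pass']):.3f} s")
```

Taking the minimum rather than the mean reduces the influence of unrelated background activity on the machine.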
Languages
- C: The default C compiler of the environment is used, which is Apple clang version 17.0.0 currently.
- Go: Version 1.25.2 darwin/arm64 is used.
- Java: OpenJDK version 25 2025-09-16 is used.
- Nim: Version 2.2.4 is used.
- Python: PyPy is used here, even though it is not supported by all frameworks; CPython is around 10 times slower. The version used is PyPy 7.3.17 with GCC Apple LLVM 16.0.0.
- Rust: rustc version 1.90.0 is used.
- Swift: Apple Swift version 6.1.2 is used.
- Vlang: Version 0.4.12 is used. Note that V uses the Boehm GC library.
- Zig: Version 0.15.1 is used. Note that Zig, like C, is not a memory-safe language.
Benchmarks
Binary Trees
This test generates binary trees and counts the nodes. The Java version is very fast if given enough memory, because it doesn't collect garbage; when memory is limited to 100 MB, it does collect garbage, but in a different thread. The Go garbage collector is limited to one thread using "runtime.GOMAXPROCS(1)", and so it is slower than Java, which cannot be limited to one thread (which is arguably not fair). For Bau, the ownership variant is used; the reference-counted variant is a bit slower. Bau includes a faster malloc implementation, which would bring performance close to Java. The command line argument 20 is used instead of 21 as in the original test, to speed up running the test; however, the relative performance is unaffected.
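To make the workload concrete, here is a minimal Python sketch of building a perfect binary tree and counting its nodes. It is a simplified illustration, not the benchmark program from the repository; the names `Node`, `make_tree`, and `count_nodes` are chosen for this example.

```python
class Node:
    """A binary tree node; a leaf has no children."""
    __slots__ = ("left", "right")

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def make_tree(depth):
    """Build a perfect binary tree with `depth` levels below the root."""
    if depth == 0:
        return Node()
    return Node(make_tree(depth - 1), make_tree(depth - 1))

def count_nodes(node):
    """Count all nodes in the tree rooted at `node`."""
    if node.left is None:
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)

# A perfect tree of depth d has 2**(d + 1) - 1 nodes.
print(count_nodes(make_tree(10)))  # prints 2047
```

Almost all the time goes into allocating and freeing many small objects, which is why the memory management strategy (tracing GC, reference counting, ownership) dominates the result.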
Fannkuch
This test simulates flipping pancakes and uses many array accesses. For Bau, no attempt was made to eliminate bounds checks. It is unclear why the C version is a little bit slower than the C version created from Bau. The command line argument 11 is used instead of 12 as in the original test, to speed up running the test; however, the relative performance is unaffected.
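The "pancake flipping" can be illustrated with a short Python sketch: take the first element k of a permutation, reverse the first k elements, and repeat until the first element is 1. This is a naive illustration with hypothetical names (`count_flips`, `max_flips`), not the optimized permutation generator used by the benchmark programs.

```python
from itertools import permutations

def count_flips(perm):
    """Return how many prefix reversals happen before the first element is 1."""
    p = list(perm)
    flips = 0
    while p[0] != 1:
        k = p[0]
        p[:k] = p[:k][::-1]  # reverse the first k elements
        flips += 1
    return flips

def max_flips(n):
    """Return the maximum flip count over all permutations of 1..n."""
    return max(count_flips(p) for p in permutations(range(1, n + 1)))

print(count_flips([4, 2, 1, 5, 3]))  # prints 5
```

Every flip reads and writes array elements by index, so a bounds check on each access is directly visible in the runtime.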
SpeedTest
This test is about the Münchausen numbers problem. This is a very fast loop with a lot of array access. (Standard) Python is particularly slow here because it is interpreted and doesn't use a JIT compiler. The same settings are used as in the original benchmark.
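For readers unfamiliar with the problem: a Münchausen number equals the sum of its digits, each raised to the power of itself (for example 3435 = 3^3 + 4^4 + 3^3 + 5^5). The Python sketch below assumes the convention that 0^0 counts as 0, and searches a much smaller range than a real benchmark run would; the names are illustrative.

```python
# Lookup table of d**d for each digit d.
# Assumption: 0**0 is treated as 0, a common convention for this problem.
POWERS = [0] + [d ** d for d in range(1, 10)]

def is_munchausen(n):
    """Return True if n equals the sum of its digits raised to themselves."""
    total = 0
    m = n
    while m > 0:
        total += POWERS[m % 10]  # one array access per digit
        m //= 10
    return total == n

found = [n for n in range(1, 10_000) if is_munchausen(n)]
print(found)  # prints [1, 3435]
```

The inner loop is only a table lookup, an addition, and a division per digit, which is why interpreter overhead shows up so clearly in this test.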
Pi Digits
This uses a big integer library that computes 10'000 digits of Pi. The same settings are used as in the original benchmark.
Performance depends mostly on the big integer library. The big integer library of Go, for example, is highly optimized and uses platform-specific assembly. The Rust library is highly optimized as well, but the C "gmp" library is the fastest. For Swift, the library "attaswift/BigInt" is used. For Nim, nim-lang/bigints is used, where multiplication and division are not optimized for performance. The Bau bigint library is around 400 lines of code, modelled after the Java library, without platform-specific code. Bau, as well as other languages, could easily use the "gmp" library as well.
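To show why the big integer library matters so much, here is a sketch of a streaming spigot algorithm for Pi (Gibbons' formulation) in Python, whose built-in integers have arbitrary precision. This is an assumption-laden illustration: it is not necessarily the algorithm used by the piDigits programs in the repository.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of Pi one by one (Gibbons' streaming spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is certain: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: consume one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )

print("".join(str(d) for d in islice(pi_digits(), 10)))  # prints 3141592653
```

Every step multiplies and divides integers that keep growing, so for thousands of digits nearly all the time is spent inside big integer multiplication and division.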
Mandelbrot
This test computes the Mandelbrot set. Only 8'000 by 8'000 pixels are calculated, versus 16'000 by 16'000 as in the original test, to speed up running the test; however the relative performance is unaffected. It is mostly testing floating point performance.
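The floating point work per pixel is the classic escape-time iteration, sketched below in Python. The iteration limit of 50 and the function name `escape_time` are assumptions for this example; the benchmark programs in the repository may differ in detail.

```python
def escape_time(c, max_iter=50):
    """Iterate z = z*z + c from z = 0.

    Return the step at which |z| first exceeds 2, or max_iter if it never
    does within the limit (the point is then treated as inside the set).
    """
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter

print(escape_time(0j))       # prints 50 (inside the set)
print(escape_time(1 + 0j))   # prints 2 (escapes quickly)
```

For an 8'000 by 8'000 image this loop runs once per pixel, that is, 64 million times, with up to `max_iter` multiplications and additions each.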
NBody
This test models the orbit of planets. Unlike in the original test, the number of planets is dynamic. With a hardcoded array length, Rust is faster than C due to more aggressive loop unrolling (other languages are not affected).
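What "dynamic number of planets" means can be sketched in Python as follows: the pairwise loop bounds come from the length of the list at runtime, rather than from a constant known to the compiler. The data layout (dicts with `pos`, `vel`, `mass`) and the function name `advance` are assumptions for this example, and units are chosen so that the gravitational constant is 1.

```python
import math

def advance(bodies, dt):
    """Advance all bodies by one time step using pairwise gravity."""
    n = len(bodies)  # known only at runtime, so loops cannot be fully unrolled
    for i in range(n):
        a = bodies[i]
        for j in range(i + 1, n):
            b = bodies[j]
            d = [a["pos"][k] - b["pos"][k] for k in range(3)]
            dist2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
            mag = dt / (dist2 * math.sqrt(dist2))
            for k in range(3):
                # Equal and opposite velocity changes, scaled by the other mass.
                a["vel"][k] -= d[k] * b["mass"] * mag
                b["vel"][k] += d[k] * a["mass"] * mag
    for body in bodies:
        for k in range(3):
            body["pos"][k] += dt * body["vel"][k]

sun = {"pos": [0.0, 0.0, 0.0], "vel": [0.0, 0.0, 0.0], "mass": 1.0}
planet = {"pos": [1.0, 0.0, 0.0], "vel": [0.0, 0.0, 0.0], "mass": 1.0}
advance([sun, planet], 0.01)
print(sun["vel"][0], planet["vel"][0])  # the two bodies accelerate toward each other
```

With a fixed-size array, a compiler can unroll both loops completely; with a length read at runtime, as here, it cannot.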
Building and Running the Tests
Download and build the latest version:
git clone git@github.com:thomasmueller/bau-lang.git
cd bau-lang
Using Maven:
mvn -DskipTests clean install
Using Make:
make jar
Compiling and running the benchmarks for all languages:
mkdir -p target
cd target
echo "== Bau ============"
cp ../src/test/resources/org/bau/benchmarks/bau/* .
java -jar bau.jar -O3 -useTmMalloc false *.bau
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./binaryTreesRefCount 20; done
for i in {1..3}; do time ./fannkuch 11; done
for i in {1..3}; do time ./munchausen; done
for i in {1..3}; do time ./piDigits > out.txt; done
for i in {1..3}; do time ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time ./nbody; done
for i in {1..3}; do time ./virtualDispatch; done
for i in {1..3}; do time ./virtualDispatchOwned; done
java -jar bau.jar -useTmMalloc true -O3 *.bau
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./binaryTreesRefCount 20; done
for i in {1..3}; do time ./linkedList; done
echo "== C ============"
cp ../src/test/resources/org/bau/benchmarks/c/* .
gcc -O3 binaryTrees.c -o binaryTrees
gcc -O3 fannkuch.c -o fannkuch
gcc -O3 munchausen.c -o munchausen
gcc -O3 piDigits.c -o piDigits -I/opt/homebrew/include -L/opt/homebrew/lib -lgmp
gcc -O3 mandelbrot.c -o mandelbrot
gcc -O3 nbody.c -o nbody
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./fannkuch 11; done
for i in {1..3}; do time ./munchausen; done
for i in {1..3}; do time ./piDigits 10000 > out.txt; done
for i in {1..3}; do time ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time ./nbody; done
echo "== Go ============"
cp ../src/test/resources/org/bau/benchmarks/go/* .
go build -ldflags="-s -w" binaryTrees.go
go build -ldflags="-s -w" fannkuch.go
go build -ldflags="-s -w" munchausen.go
go build -ldflags="-s -w" piDigits.go
go build -ldflags="-s -w" mandelbrot.go
go build -ldflags="-s -w" nbody.go
go build -ldflags="-s -w" linkedList.go
go build -ldflags="-s -w" virtualDispatch.go
for i in {1..3}; do time GOMAXPROCS=1 ./binaryTrees 20; done
for i in {1..3}; do time GOMAXPROCS=1 ./fannkuch 11; done
for i in {1..3}; do time GOMAXPROCS=1 ./munchausen; done
for i in {1..3}; do time GOMAXPROCS=1 ./piDigits > out.txt; done
for i in {1..3}; do time GOMAXPROCS=1 ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time GOMAXPROCS=1 ./nbody; done
for i in {1..3}; do time GOMAXPROCS=1 ./linkedList; done
for i in {1..3}; do time GOMAXPROCS=1 ./virtualDispatch; done
echo "== Java ============"
javac ../src/test/java/org/bau/benchmarks/*.java -d .
time java -Xmx100m org.bau.benchmarks.Loop org.bau.benchmarks.BinaryTrees 20
time java -Xmx100m org.bau.benchmarks.Loop org.bau.benchmarks.Fannkuch 11
time java -Xmx100m org.bau.benchmarks.Loop org.bau.benchmarks.Munchausen
time java -Xmx100m org.bau.benchmarks.Loop org.bau.benchmarks.PiDigits 10000 | grep Run
time java -Xmx100m org.bau.benchmarks.Loop org.bau.benchmarks.Mandelbrot 8000 | grep -a Run
time java -Xmx100m org.bau.benchmarks.Loop org.bau.benchmarks.NBody | grep -a Run
for i in {1..3}; do time java -Xmx100m org.bau.benchmarks.BinaryTrees 20; done
for i in {1..3}; do time java -Xmx100m org.bau.benchmarks.Fannkuch 11; done
for i in {1..3}; do time java -Xmx100m org.bau.benchmarks.Munchausen; done
for i in {1..3}; do time java -Xmx100m org.bau.benchmarks.PiDigits 10000 > out.txt; done
for i in {1..3}; do time java -Xmx100m org.bau.benchmarks.Mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time java -Xmx100m org.bau.benchmarks.NBody; done
for i in {1..3}; do time java -Xmx2g org.bau.benchmarks.LinkedList; done
echo "== Nim ============"
echo "requires: nimble install https://github.com/nim-lang/bigints"
cp ../src/test/resources/org/bau/benchmarks/nim/* .
nim c -d:release binaryTrees.nim
nim c -d:release fannkuch.nim
nim c -d:release munchausen.nim
nim c -d:release piDigits.nim
nim c -d:release mandelbrot.nim
nim c -d:release nbody.nim
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./fannkuch 11; done
for i in {1..3}; do time ./munchausen; done
for i in {1..3}; do time ./piDigits 10000 > out.txt; done
for i in {1..3}; do time ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time ./nbody; done
echo "== Python via PyPy ============"
cp ../src/test/resources/org/bau/benchmarks/python/* .
for i in {1..3}; do time pypy3.10 binaryTrees.py 20; done
for i in {1..3}; do time pypy3.10 fannkuch.py 11; done
for i in {1..3}; do time pypy3.10 munchausen.py; done
for i in {1..3}; do time pypy3.10 piDigits.py 10000 > out.txt; done
for i in {1..3}; do time pypy3.10 mandelbrot.py 8000 > out.tiff; done
for i in {1..3}; do time pypy3.10 nbody.py; done
for i in {1..3}; do time pypy3.10 linkedList.py; done
echo "== Rust ============"
cp ../src/test/resources/org/bau/benchmarks/rust/*.rs .
rm -rf rust
mkdir -p rust
cp -R ../src/test/resources/org/bau/benchmarks/rust .
cd rust
cargo build --release
cd ..
rustc -C opt-level=3 binaryTrees.rs
rustc -C opt-level=3 fannkuch.rs
rustc -C opt-level=3 munchausen.rs
rustc -C opt-level=3 mandelbrot.rs
rustc -C opt-level=3 nbody.rs
rustc -C opt-level=3 linkedList.rs
rustc -C opt-level=3 virtualDispatch.rs
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./fannkuch 11; done
for i in {1..3}; do time ./munchausen; done
for i in {1..3}; do time ./rust/target/release/pi_digits > out.txt; done
for i in {1..3}; do time ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time ./nbody; done
for i in {1..3}; do time ./linkedList; done
for i in {1..3}; do time ./virtualDispatch; done
echo "== Swift ============"
cp ../src/test/resources/org/bau/benchmarks/swift/*.swift .
mkdir -p swift
cp -R ../src/test/resources/org/bau/benchmarks/swift .
cd swift/piDigits
swift build -c release
cp .build/arm64-apple-macosx/release/piDigits ../..
cd ../..
swiftc -O binaryTrees.swift -o binaryTrees
swiftc -O fannkuch.swift -o fannkuch
swiftc -O munchausen.swift -o munchausen
swiftc -O mandelbrot.swift -o mandelbrot
swiftc -O nbody.swift -o nbody
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./fannkuch 11; done
for i in {1..3}; do time ./munchausen; done
for i in {1..3}; do time ./piDigits 10000 > out.txt; done
for i in {1..3}; do time ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time ./nbody; done
echo "== Vlang ============"
cp ../src/test/resources/org/bau/benchmarks/vlang/* .
v -prod -force-bounds-checking binaryTrees.v
v -prod -force-bounds-checking fannkuch.v
v -prod -force-bounds-checking munchausen.v
v -prod -force-bounds-checking -enable-globals piDigits.v
v -prod -force-bounds-checking mandelbrot.v
v -prod -force-bounds-checking nbody.v
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./fannkuch 11; done
for i in {1..3}; do time ./munchausen; done
for i in {1..3}; do time ./piDigits > out.txt; done
for i in {1..3}; do time ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time ./nbody; done
echo "== Zig ============"
cp ../src/test/resources/org/bau/benchmarks/zig/* .
zig build-exe -O ReleaseSafe binaryTrees.zig
zig build-exe -O ReleaseSafe fannkuch.zig
zig build-exe -O ReleaseSafe munchausen.zig
zig build-exe -O ReleaseSafe piDigits.zig
zig build-exe -O ReleaseSafe mandelbrot.zig
zig build-exe -O ReleaseSafe nbody.zig
for i in {1..3}; do time ./binaryTrees 20; done
for i in {1..3}; do time ./fannkuch 11; done
for i in {1..3}; do time ./munchausen; done
for i in {1..3}; do time ./piDigits > out.txt; done
for i in {1..3}; do time ./mandelbrot 8000 > out.tiff; done
for i in {1..3}; do time ./nbody; done
echo "== Python (CPython) ============"
cp ../src/test/resources/org/bau/benchmarks/python/* .
for i in {1..3}; do time python3 binaryTrees.py 20; done
for i in {1..3}; do time python3 fannkuch.py 11; done
for i in {1..3}; do time python3 piDigits.py 10000 > out.txt; done
for i in {1..3}; do time python3 munchausen.py; done
for i in {1..3}; do time python3 mandelbrot.py 8000 > out.tiff; done
for i in {1..3}; do time python3 nbody.py; done
cd ..