A Microbenchmark Suite for WebAssembly Function Calls

October 24, 2022 ยท View on GitHub

This collection of files constitute the code that was used to evaluate the performance of WebAssembly and related systems as presented in the Oct. 11 '22 WebAssembly CG Meeting so that interested parties may replicate the results on their own machines. They essentially comprise many slight variations on how to implement the same high-level program, varying the backend target, the low-level implementation strategy and degree of optimization, and the compilation and linking configuration. As such, although technically this is a suite of microbenchmarks, it is the performance of these variations relative to each other that is primarily interesting.

WebAssembly (and JavaScript)

The *.wat files provide the hand-written WebAssembly code for the various configurations.

The whole-*.wat files implement the program as a single module.

  • whole-vtable.wat is arguably the "straighforward" implementation of the benchmark. Class methods are implemented by using call_indirect with the index computed by adding the method's offset within the vtable to the object's class's index within the funcref table.
  • whole-switch.wat instead implements class methods by making a direct call to a function that br_tables on the object's class identifier, branching to the respective class's implementation of the method.
  • whole-unguarded.wat exploits whole-program reasoning to eliminate all br_tables. In the case of isNil, the method can instead be implemented by comparing object's class identifier with the identifier for Nil. In the other cases, the respective method is only ever called on instances of Cons.
  • whole-optimal.wat forgoes function calls and branches therein entirely by completely inlining whole-unguarded.wat, providing the "optimal" solution that serves as a baseline that can be used on all WebAssembly engines.

The list-*.wat and sort-*.wat files implement the program as two separate modules (the producer of the list versus the consumer of the list), which are then linked during instantiation.

  • list-direct-switch and sort-direct.wat is the obvious splitting of whole-switch.wat.
  • list-direct-unguarded.wat is sort-direct.wat are the obvious splitting of whole-unguarded.wat.
  • list-vtable.wat and sort-vtable-constant.wat is the obvious splitting of whole-vtable.way.
  • list-vtable.wat and sort-vtable-extern.wat furthermore incorporates the fact that often the v-table offsets of code consuming a class are not known until link time, using imported globals rather than fixed i32.const to compute the offsets for call_indirect.

The WebAssembly numbers presented were generated by opening sort.html in the respective browsers, selecting the corresponding benchmark, clicking Run (reloading the page between runs), and selecting the median value (rounded to 10ms). sort.html was itself generated by executing make html, which inserts the base64 string encodings of the respective wat benchmarks, as well as the JavaScript implementation in Sort.js. One can also execute make wasm to generate the respective wasm files, and/or make b64 to generate the B64 files with their respective base64 string encodings. The Test button in sort.html can be used to check their correctness by comparing the output with expected-output.txt.

Native

The *.c and *.h files provide the native analogs to the above WebAssembly files.

The whole-*.c files are analogous to the corresponding whole-*.wat files. They were compiled with gcc -O1.

The list-direct-*.c/sort-direct.c (and list-direct.h) files are analagous to the corresponding list-direct-*.wat/sort-direct.wat files. The list-vtable-*.c/sort-vtable-*.c (and list-vtable-*.h) files are analagous to the corresponding list-vtable.wat/sort-vtable-*.wat files. They were compiled to separate object files with gcc -O1 -fno-lto and then linked with gcc -fno-lto.

One can also compile them using make native. One can run them without arguments to collect times, or one can run them with --test and compare the output with expected-output.txt to check their correctness. Alternatively, one can use wasm run-native and wasm test-native to respectively (compile and) collect times or check correctness of all the native benchmarks.

Java

The Sort.java file implements the program in Java using interfaces rather than (abstract) classes. One can compile this file using javac Sort.java and then run it (or check its correctness) using java Sort (or java Sort --test). Alternatively, one can use wasm java, wasm run-java, and wasm test-java.

Miscellaneous

Use make all to generate sort.html and all *.wasm, *.B64, *.class, and executable files. Use make clean to remove these files (including the pregenerated sort.html included with this bundle), as well as intermediately generated sort.html.bkp and *.o files that might have left if an earlier make was interrupted.

Acknowledgements

Thanks to Derek Schuff (@dschuff) and Petr Penzin (@penzn) for reviewing the code to confirm that the comparisons are fair!