A Microbenchmark Suite for WebAssembly Function Calls
October 24, 2022 ยท View on GitHub
This collection of files constitute the code that was used to evaluate the performance of WebAssembly and related systems as presented in the Oct. 11 '22 WebAssembly CG Meeting so that interested parties may replicate the results on their own machines. They essentially comprise many slight variations on how to implement the same high-level program, varying the backend target, the low-level implementation strategy and degree of optimization, and the compilation and linking configuration. As such, although technically this is a suite of microbenchmarks, it is the performance of these variations relative to each other that is primarily interesting.
WebAssembly (and JavaScript)
The *.wat files provide the hand-written WebAssembly code for the various configurations.
The whole-*.wat files implement the program as a single module.
whole-vtable.watis arguably the "straighforward" implementation of the benchmark. Class methods are implemented by usingcall_indirectwith the index computed by adding the method's offset within the vtable to the object's class's index within thefuncreftable.whole-switch.watinstead implements class methods by making a direct call to a function thatbr_tables on the object's class identifier, branching to the respective class's implementation of the method.whole-unguarded.watexploits whole-program reasoning to eliminate allbr_tables. In the case ofisNil, the method can instead be implemented by comparing object's class identifier with the identifier forNil. In the other cases, the respective method is only ever called on instances ofCons.whole-optimal.watforgoes function calls and branches therein entirely by completely inliningwhole-unguarded.wat, providing the "optimal" solution that serves as a baseline that can be used on all WebAssembly engines.
The list-*.wat and sort-*.wat files implement the program as two separate modules (the producer of the list versus the consumer of the list), which are then linked during instantiation.
list-direct-switchandsort-direct.watis the obvious splitting ofwhole-switch.wat.list-direct-unguarded.watissort-direct.watare the obvious splitting ofwhole-unguarded.wat.list-vtable.watandsort-vtable-constant.watis the obvious splitting ofwhole-vtable.way.list-vtable.watandsort-vtable-extern.watfurthermore incorporates the fact that often the v-table offsets of code consuming a class are not known until link time, using imported globals rather than fixedi32.constto compute the offsets forcall_indirect.
The WebAssembly numbers presented were generated by opening sort.html in the respective browsers, selecting the corresponding benchmark, clicking Run (reloading the page between runs), and selecting the median value (rounded to 10ms).
sort.html was itself generated by executing make html, which inserts the base64 string encodings of the respective wat benchmarks, as well as the JavaScript implementation in Sort.js.
One can also execute make wasm to generate the respective wasm files, and/or make b64 to generate the B64 files with their respective base64 string encodings.
The Test button in sort.html can be used to check their correctness by comparing the output with expected-output.txt.
Native
The *.c and *.h files provide the native analogs to the above WebAssembly files.
The whole-*.c files are analogous to the corresponding whole-*.wat files.
They were compiled with gcc -O1.
The list-direct-*.c/sort-direct.c (and list-direct.h) files are analagous to the corresponding list-direct-*.wat/sort-direct.wat files.
The list-vtable-*.c/sort-vtable-*.c (and list-vtable-*.h) files are analagous to the corresponding list-vtable.wat/sort-vtable-*.wat files.
They were compiled to separate object files with gcc -O1 -fno-lto and then linked with gcc -fno-lto.
One can also compile them using make native.
One can run them without arguments to collect times, or one can run them with --test and compare the output with expected-output.txt to check their correctness.
Alternatively, one can use wasm run-native and wasm test-native to respectively (compile and) collect times or check correctness of all the native benchmarks.
Java
The Sort.java file implements the program in Java using interfaces rather than (abstract) classes.
One can compile this file using javac Sort.java and then run it (or check its correctness) using java Sort (or java Sort --test).
Alternatively, one can use wasm java, wasm run-java, and wasm test-java.
Miscellaneous
Use make all to generate sort.html and all *.wasm, *.B64, *.class, and executable files.
Use make clean to remove these files (including the pregenerated sort.html included with this bundle), as well as intermediately generated sort.html.bkp and *.o files that might have left if an earlier make was interrupted.
Acknowledgements
Thanks to Derek Schuff (@dschuff) and Petr Penzin (@penzn) for reviewing the code to confirm that the comparisons are fair!