CG-08-01.md

September 13, 2023 · View on GitHub

WebAssembly logo

Agenda for the August 1st video call of WebAssembly's Community Group

  • Where: zoom.us
  • When: August 1st, 4pm-5pm UTC (August 1st, 9am-10am Pacific Time)
  • Location: link on calendar invite

Registration

None required if you've attended before. Send an email to the acting WebAssembly CG chair to sign up if it's your first time. The meeting is open to CG members only.

Logistics

The meeting will be on a zoom.us video conference. Installation is required, see the calendar invite.

Agenda items

  1. Opening, welcome and roll call
    1. Opening of the meeting
    2. Introduction of attendees
  2. Find volunteers for note taking (acting chair to volunteer)
  3. Proposals and discussions
    1. Proposal update: Exception handling (https://github.com/WebAssembly/exception-handling/issues/280), slides [30 mins]
    2. Follow-up on flexible vectors presentation (https://github.com/WebAssembly/flexible-vectors/issues/60), slides [Petr Penzin, 30 mins]
  4. Closure

Agenda items for future meetings

None

Schedule constraints

None

Meeting Notes

Attendees

  • Ashley Nelson
  • Conrad Watt
  • Thomas Lively
  • Ben Titzer
  • Francis McCabe
  • Bruce He
  • Nuno Pereira
  • Deepti Gandluri
  • Ryan Hunt
  • Alon Zakai
  • Bailey
  • Zalim Bashorov
  • Keith Winstein
  • Yuri Iozzelli
  • Tal Garfinkel
  • Chris Woods
  • Paolo Severini
  • Yury Delendik
  • Shravan Narayan
  • Jeff Charles
  • Manos Kouboutos
  • Brendan Dahl
  • Don Gohman
  • Chris Woods
  • Andrew Brown
  • Kenneth Brooks
  • Mingqiu Sun
  • Petr Penzin
  • Nick Fitzgerald
  • Jakob Kummerow
  • Emmanuel Ziegler
  • Luke Wagner
  • Justin Michaud
  • Sean Jensen-Grey
  • Sam Clegg
  • Ioanna Dimitriou
  • Ilya Rezvov
  • Heejin Ahn
  • Emanuel Ziegler
  • Dan Gohman
  • Jonnie Birch
  • Pat Hickey
  • Thomas Steiner
  • Nick Ruff

Meeting notes

Proposal update: Exception handling [30 mins]

BT presenting “Revisiting exnref” slides

DanG (via chat): How did we end up with a phase 3 proposal that we can’t change without breaking the Web?

CW (via chat): Exception handling has been unflagged in (some?) browsers for a while. I agree this isn’t a situation we want to repeat for future proposals.

FM: Helpful if you went over the A&B proposals

BT: Heejin?

CW: Only if comfortable and something prepared

HA: Option A is intended to be similar to current proposal the intention is to minimize the friction of implementation in both the toolchains and engines. And basically it looks like the current proposal the only difference is the catch only pushes instructed values it pushes additionally an exnref. It basically pushes multiple values. For catch_all it only pushes exnref. SO if tag 0 is i32 it pushes the multivalue i32 and the exnref. And if tag consists of two values, like i32 and f32, it pushes i32 and f32 and exnref. Meant to be the minimal delta from the current proposal. The advantage is its minimal delta so we expect less friction or disruption from the toolchain. We care about some amount of disruption but we have to plan accordingly and provide necessary tools for users that don’t want to recompile, we can provide some sort of binary translator. Option B is suggested by AR. The opcode doesn’t have meanings. With the existing proposal we can only rethrow between catch and the end opcode but now we don’t have this lexical rethrow. Catch takes multiple tag label pair so that when the exception is caught by tag 0, you go to label 0 with the exnref value. If exception is caught by tag 1 then you go to label 1 with the value. Block 1 and Block 0, those things should have the return value as exnref. Optionally, only a catch all label. If this exception is not going to be caught by the listed tags, but we still want to catch_all, we should go to this label. This returns exnref for the block, so we need additional br_on_exn. This should be familiar because we have this in the previous proposal. This is for extraction. try/catch is only for jumping. This option is more like the current wasm but it doesn’t have the two-part block structure like try/catch. Does not have an end. May be aesthetically more pleasing. It can involve more code size. In my preliminary experimentation, many applications in the wild, like C++, use very few try/catch and a lot more try/catch_all. The code size is not that different from in the wild applications.

ID (via chat): This ‘catch’ sounds like the way it’s implemented in firefox

BT: In both options, both catch flavors are new opcodes, so they don’t overlap with existing binary space. Engines must support the previous bytecodes, and they don’t have to change new bytecodes. This catch that pushes the values and the extended exnref, it is also helpful to have the old catch that did not catch the exnref. Because we can completely elide the exnref. So you can effectively throw values from one frame to another without having intermediate storage.

CW: Can you talk about what you expect the toolchain migration story to be, not just engines. Things like Binaryen, for example.

HA: We also need to have some phase that we support both options. New implementations might have some bugs, and we need a beta testing phase with partners. After testing one option for some time, we can decide to make a switch to it, and if we agree to deprecate it, we can agree to deprecate the current option.

RH: Note about deprecation because that’s a key point. My perspective representing Firefox/SpiderMonkey is we are open to try and doing the deprecation but the thing we want to make sure is thought through is that we have two kinds of exception handling, because we’re not able to deprecate the old thing, then that is a worse outcome than having just the old one. We can’t predict if we can successfully deprecate everything, because we can’t control the users on the web. From some discussions it sounds like we know enough of the users and can communicate to the users that we can make changes quickly and have deprecation on some timeline. But the key thing is to hear from people, are there any toolchains we don’t know about using the EH feature? What unknown unknowns are out there that could handicap us from deprecating down the road? It’s the biggest concern as a browser vendor.

KW (via chat): (I don't quite follow how Option A would enable an asyncify -- how does having the opaque exnref help rewind the stack back to inside a particular catch block?)

HA: It can’t. I think you’re talking about the current option, so rethrow depends on current surroundings, and there is an opaque storage for exnref. One motivation for reintroducing it is to make the it work. Now rethrow is going to take the exnref as argument and does not depend on the surroudings. Option A & B can support asyncify.

BT: Put it this way, exnref addresses the dataflow part of the asyncify because it has a handle to the exception package you need to be able to store somewhere, whereas Phase 3 there is no dataflow possible, so the control flow rewrite that gets you back in a try/catch gets you a normal CPS transform that you would need to do. It would be like any operand value that is on the stack, do the CPS transform to pass the values forward or store them in a shadow stack.

KW: Rethrow would take an externref from the stack instead?

HA: Should have included in the slide as well, I was summarizing too much.

CW: That’s a change common to both options, rethrow will be changed to take a first-class exception package. How do we decide between option A & B now that they are raised?

RH: Option A or B does nothing cause that’s the other one here.

Chris Woods: What’s this giving us again? Is this just allowing us to receive exceptions from the host environment in an easier way?

BT: This slide simplifies the interaction with the host environment, in particular JS, adn the CPS thing, Asyncify was not possible before so that’s a new capability and generally it simplifies things in both engines and the spec.

CW: My understanding is that both would mean we won’t have to duplicate finally blocks.

BT: Indeed that falls under the unnecessary duplication in that point here.

Chris Woods: In the languages we are looking at (C family), we are implementing phase 3 support in WAMR. The spec is elegant, but the compiler generated code doesn’t use half of it. This is great, but why would we implement it if we only see 1 throw and 1 catch. I don’t have visibility in other languages or toolchains, but I see this is an interesting addition, but I need to think about it, but I don’t know what the impact it is in the embedded C environment.

HA: What toolchain are you developing?

Chris Woods: Runtime support for WAMR for phase 3

HA: Is either Option A or Option B going to be a problem for the implementation changes in WAMR?

Chris Woods: Technically no, because we only just started rolling out phase 3 support this month with a branch. We can change it, but looking at pragmatic, what am I going to get out of it when I’m looking at an environment where the toolchain doesn’t use the full spec right now?

DG: Unfortunately we are out of time for this discussion item. We have capped it at 30 minutes).

CW: Follow up in the offline issue DG posted in chat

DG (via chat): Github issue for offline follow up: https://github.com/WebAssembly/exception-handling/issues/280

TL (via chat): Chris, I assume the toolchain you're talking about is clang? If so, you should chat more with Heejin because she implemented the EH use in LLVM.

ChrisW (via chat): 100% - thank you. I do admit my lack of knowledge here, and I'm keen to understand some more.

HA (via chat):@Chris Woods I'm not sure which toolchain you were referring to, but our current LLVM toolchain uses rethrow. I mean our C++ support. (We use it when, when we have 'catch (int)', but the thrown exception is not an int, to rethrow to it to the next enclosing catch)

Chris Woods (via chat): Then I am corrected. Thank you. We've only profiled simple code generation from clang, and it was some time ago, basically just trying to understand what is generated. We did not look at the toolchain code itself, only what we saw being generated by the tool chain. So a superficial level, admittedly. We saw, throw tag 0, catch tag 0, and some handling... so the full nuanced and elegant instruction set didn't really seem to be completely used... which is a shame, cos it's cool. But also, I wonder if this gives rise to understanding the impact of a spec change on existing languages.

BT (via chat): FWIW, I implemented the option A in wizard’s interpreter, and it was a nice simplification. The lexical rethrow in Phase 3 is kind of a pain for interpreters. Andreas also implemented option A in the reference interpreter.

HA (via chat): For toolchains, I don't have a full knowledge on how many (or if any) existing toolchains have implemented EH out there, so I'd like their inputs as well. I don't think the rethrow change is gonna be a big change; it can boil down to just storing it to a local and retrieve it later for rethrowing. I think try~catch side (especially in case of Option B) can be a bigger change for toolchains, if we decide to go down that path. (I heard Blazor was implementing EH recently)

ID (via chat): It sounds like the 3rd proposal’s rethrow is the big problem for async and CPS transformation, is this right?

BT (via chat): Correct

ID (via chat): Makes sense.

BT: Execution summary, IMO, is that exnref for WAMR means a ref-counted exception package

KW (via chat): Preliminarily, both of these options seem (?) to make things more challenging for wasm2c. Right now it never has to malloc or provide GCed or refcounted data, since the scope of a caught exception is purely lexical. I wonder if there is a possible Option C that keeps the status quo intact, but adds an additional pair of instructions (or standardized imported functions!). One that gets the current caught exception and returns an externref, and one that rethrows that externref.

CW (via chat): does this idea make things easier for Wasm2C? Would you not still need to malloc/collect the externref?

KW (via chat): @Conrad: I think one benefit is that I was thinking "Option C" could be a totally new proposal (enhanced EH) that layers nicely on top of the existing EH, so the impact would be limited to modules that actually use the feature.

HA (via chat): So you mean wasm2c can support only a part of the EH instructions?

KW (via chat): @Heejin: I believe wasm2c currently implements the entire current EH proposal. But adding an exnref with indefinite scope (and referring to data of indefinite size) would make things harder for us and seems to introduce a dependency on some sort of GC.

BT (via chat): Does wasm2c use setjmp/longjmp for EH?

KW (via chat): wasm2c uses goto for EH within the same function, and setjmp/longjmp when crossing a function boundary.

BT (via chat): Are the semantics of setjmp sufficient to guarantee wasm’s semantics?

KW (via chat): Re: semantics of setjmp, I believe so (when the enclosing try block is in a different function). But happy to discuss.

HA (via chat): If GC'ed, of refcounted data is problem for wasm2c, does it have plans to support GC?

BT (via chat): I was recently looking into setjmp more closely and I was a little surprised at how few guarantees it makes. We can maybe followup offline. IMO the issue comes down to lexical rethrow, which is the complicated thing in phase 3 that gives rise to the other problems

KW (via chat): But #2, perhaps to be easiest for wasm2c, the full API would be something like "get_current_caught_exn" [] -> externref, "rethrow_exn" externref -> [], and maybe "free_exnref" which frees it explicitly.

BT (via chat): Is rethrow used a lot in wasm2c use cases? (If not, then keeping the exnref-less catch should serve those use cases).

KW (via chat): I mean, the intention of wasm2c is to transpile any valid Wasm module into a C program that obeys the Wasm semantics. So if you imagine producers are going to use rethrow, we have to implement it.

BT (via chat): Sure, the question is always one of cost of runtime complexity and performance

Follow-up on flexible vectors presentation (https://github.com/WebAssembly/flexible-vectors/issues/60) [30 min]

CW: Our second item is a flexible vectors update given by Petr.

Petr presenting “Flexible vector use cases” slides

JM (via chat): Re: SIMD, I have heard from some users that they won't even consider supporting WASM for ML workloads because WebGPU is 10x faster, to the point where their model only works with WebGPU. My question: what use cases does this have that 1) widely produce a speedup, and 2) would not be even better if served by WebGPU?

SJ-G (via chat): Why increased SIMD with vs a vector ISA like RISC-V Vector extension?

PP: Well, if vector ISAs were commonly available, then we would use that. The proposal was originally modeled on RISC-V, but full flexible vector won't map very well to fixed-width SIMD without native mask support. Maybe the world eventually would move to vector ISAs, but that is not the case at the moment.

TG (via chat): WebGPU is not a solution for many uses of SIMD compression, media decoding, json decoding, etc.

DeeptiG (via chat): Re. Cryptography and Matrix multiply - can you talk about the tradeoffs of exposing a subset of AMX/AES instructions instead of a more general solution that might leave some performance on the table?

PP: WebGPU the problem of communication overhead with the device. For code that ultimately uses GPU it might be possible to do all processing on it, but for code that won't (let's say signal processing), the cost of round-tripping to the divice would be prohibitive. WebNN supports custom operators using Wasm, that would also not run on GPU.

DeeptiG: A big part of this is webgpu just shipped last month so everything using ML on the web right now is using the CPU, ther eare a bunch of operations that work better on the GPU and will shift in the future. But there is also stuff that is better for the CPU, memory intensive and shorter workloads. For Chrome we see some shifting to GPU but still a bunch needs the CPU

CW: If a platform wanted to be conservative, and reported everything as 128, how much of a penalty would it be if you used flexible vectors instead of the baseline SIMD opcodes?

PP: should be the same, I think Wasmtime experimented with this approach

CW: One thing thaht might be good to document is the mapping of instructions to older Wasm SIMD instructions similar to the way relaxed-simd does

PP: That’s a good idea, write it down in the issue.

CW: The most interesting situation is if one of the mappings isn’t obvious.

PP: If you try to go back and forth the mappings, especially for shuffles, and masks, I’m not sure what would be the resulting

Closure