CG-01-30.md

February 27, 2024 · View on GitHub

WebAssembly logo

Agenda for the January 30th video call of WebAssembly's Community Group

Where: zoom.us
When: January 30th, 5pm-6pm UTC (January 30th, 9am-10am Pacific Time)
Location: link on calendar invite

Registration

None required if you've attended before. Send an email to the acting WebAssembly CG chair to sign up if it's your first time. The meeting is open to CG members only.

Logistics

The meeting will be on a zoom.us video conference. Installation is required, see the calendar invite.

Agenda items

Opening, welcome and roll call
1. Opening of the meeting
2. Introduction of attendees
Find volunteers for note taking (acting chair to volunteer)
Proposals and discussions
1. Resume vote for creation of WebAssembly Benchmarking Subgroup (10 mins; Ben Titzer / Peter Penzin)
  1. Proposed charter: https://github.com/WebAssembly/meetings/pull/1455
2. FP16 proposal introduction (30 mins)
3. Correct Compilation to WebAssembly (20 mins; Ross Tate)
Closure

Agenda items for future meetings

None

Meeting Notes

Attendees

Paul Dennis
Yuri Iozzelli
Petr Penzin
Deepti Gandluri
Derek Schuff
Ashley Nelson
Daniel Hillerström
Heejin Ahn
Ilya Rezvov
Robin Freyler
Keith Winstein
Francis McCAbe
Conrad Watt
Paolo Severini
Nuno Pereira
Behnaz Pourmohseni
Fedor Smirnov
Abrown
Ryan Hunt
Zalim Bashorov
Jeff Charles
Paul Shoenfelder
Michael Ficarra
Dan Phillips
Benjamin Titzer
Emily Ruppel
Garret Gu
Jakob Kummerow
Daniel Lehman
Mathew Yacobucci
Justin Michaud
Oscar Spencer
Sean Jensen-Grey
Mingqui Sun
Thomas Lively
Kevin Moore
Ross Tate
Manos Koukoutos
Yury Delendik
Emanuel Ziegler
Chris Woods
Alon Zakai
Johnnie Birch

Proposals and Discussions

Resume vote for creation of WebAssembly Benchmarking Subgroup, Ben Titzer / Petr Penzin

Proposed charter: https://github.com/WebAssembly/meetings/pull/1455

PP presenting

PP: All review comments addressed, how should we move forward with the vote?

CW: We should take a unanimous consensus vote if no one objects

BT: We could go through the charter. there’s a slide deck, presenter couldn’t make it

https://docs.google.com/presentation/d/1sRqdavuuMU5sMwESmcHou4_oZih_SzrIZvhbEhDuWKA/edit#slide=id.p

BT presenting

Deliverables are fully enumerated in the charter in the PR, this slide is condensed.

SJG: this is great, the charter is small enough to make progress and be inclusive of what we want to do. Where do you see this going in the near future?

PP: I think we ‘re just going to start with regular meetings, we’ll be collecting some info on the status of tools & meetings - it’s definitely interesting to track some of the performance data. It’s similar to any subgroup, we have ongoing discussions and some things can come out of it, its not specifically tied to the deliverables

CW: we’re running short of time, if there are no objections, we should take the poll and take extended discussion offline.

SJG: It doesn’t sound like you’ll be working on canonical Wasm benchmarks, it’s not going to be producing “the” standard benchmarks for WebAssembly

BT: right.

CW: Unanimous consent poll for creation of the Benchmarking subgroup Petr Penzin And Saul Cabrera as co-chairs. Please speak up if anyone has concerns.

Consent poll passes.

FP16 proposal introduction, Ilya Rezvov

IR presenting: https://docs.google.com/presentation/d/1XmpxUW8syWQuWU-zvuyBhCZE6fJ1hWCQo_rcEkuQsnQ/edit#slide=id.p

PP: what happens when you have a CPU that only supports load/store? What’s the overhead?

IR: We expect software emulation will be used, so pretty slow - you also have the Nan propagation - we know who uses this - we only have one use case for F32, but does that apply to F16?

PP: wrt NaN propagation: we have the rules in the spec: we know who uses this for f32/f64 scalar, i’m not sure the use case extends to SIMD, we dont know of any.

IR: initially it will be emulated. If someone wants to use F16 number, for now it would be a userland library, and I think it can still perform better than emulation in user space, even for vectors. For NaNs, I agree; there are 2 opposing forces: one is consistency with other parts of the spec, the other is more efficient on existing hardware. But I don’t know of any actual need for this logic for F16, we could drop it,

CW: What is the bit pattern you’re suggesting in terms of relaxing the FP16 semantics

PP: probably just the bit patterns, to have a special canonical number. Probably IEEE.

KW [chat]: Could you speak to the determinism of the proposed semantics? Will it be similar to existing f32 and SIMD ops, where the only nondeterminism is in the binary representation of NaN? Will f16 necessarily mean IEEE binary16 (and not bfloat16 or some other f16 format)?

IR: we can do determinism because there’s no divergence yet between ARM and Intel implementations yet. The only open question is NaNs. I think we can get the nto the same umbrella as other FP numbers, and maybe have 2 profiles for deterministic and nondeterministic representation.

PP[chat]: FP16 is IEEE FP16 :)

CW: we’ll want to get Andreas and others involved, they’ll have opinions

IR: On the list as well, and potentially there will be more proposals to extend BF16. I could imagine it is more controversial. Definitely demand but still need more research to bring to Wasm, but it’s coming.

DG: wanted to mention, FP16 is IEEE FP16, I do prefer to make sure that we preserve those IEEE semantics across the proposal. The other caveat is that we experiment and get data before we make the actual decisions there. Also, when you said this proposal would include some relaxed SIMD ops, is that just FMA, do expect deterministic semantics for that, or relaxed? (e.g. hardware vs software fallback)

FM[chat]: is there also a case for adding i16. This may be important for managed memory languages.

IR: Yeah, as I said we have opportunity to have deterministic semantic for FMA right now. It could be less non-deterministic here.

DG: to clarify, you mean we’d only offer hardware FMA and not the relaxed multiply-addlike the relaxed SIMD proposal?

IR: Yeah

CW: I’m scared of that and want to know better but we don’t need to dig into the details right now.

RF[chat]: Could we have F16 SIMD shapes without having an F16 scalar type? Similar to i8 and i16 SIMD shapes without i8 and i16 scalar types in the spec so far.

IR: chat: do we need F16 scalar: good question. Generally we’ll need all the machinery for scalar values anyway for implementation, I can imagine it will be convenient to have scalar representations for the lanes etc. we could use F32 but there is no hardware analog of i16, but there is F16 hardware support, so it makes sense to have scalar type in wasm as well

CW: IIUC one of the complications is having i16 in the SIMD can be apprixmated by i32 Scalars just by taking the lower half, but fp16 isn’t the lower half of fp32. Is that correct?

IR: yes e.g. you can’t easily extract a lane and the precision isn’t the same is just half an F32

PP: BF16, is half

PP[chat]: bfloat16 can be partially implemented by targeting fp32 ops, at expense of memory usage for situations when it is possible to take half of a regular float, obviously not bfloat16x8

MF[chat]: FYI the JavaScript Float16Array proposal uses IEEE binary16 and it'd be pretty weird for wasm's f16 to not match

CW: Comment in chat from MF “FYI the JavaScript Float16Array proposal uses IEEE binary16 and it'd be pretty weird for wasm's f16 to not match”

IR: yes, the proposal currently matches JS using IEEE754-binary16 (Opcode space and binary format slide)

BT: IS there a minimal subset of just adding a scalar subset, and a load/store conversions between Fp32 & FP16 - we could introduce just the conversions and not the arithmetic ops? Some operations like addition and Subtraction would have rounding that can be done in the user space

IR: Assume any engine would be able to not be able to extend is lazily to f32 and then perform all operations natively, is that correct?

BT: My understanding is some operations like ….? Can be rounded and back Ben Titzer FIX ME. Frightens me that every engine can do that. Wondering if there is a subset of this proposal that gives the engines software emulation above Wasm.

IR: you’re proposing only a scalar representation with no vector operations?

BT: Not sure which 2 I would prefer but seeing if there is some subset of this proposal that makes sense.

IR: Personally I’d prefer the scalar operations for consistency, for when you want to just load/store having promotion and demotion would be sufficient - It’s not much of a burden to implement it especially if we want to add them

CW: i think as the proposal evolves we should have a document saying what the emulation should be, like we did for relaxed SIMD. Not a requirement for phase 1. I think it’s reasonable to do a phase 1 poll (previous convention was we don’t need far advance notice for phase 1). I would propose to do it by unanimous consent. Concretely shoudl we move F16 to phase 1. Are there any objections or does anyone want to move to a full poll?

PD[chat]: the rounding variants proposal is planning to use 0cFC instructions as well: https://github.com/WebAssembly/rounding-mode-control/blob/main/proposals/rounding-mode-control/Overview.md

JK [chat]: the binary encoding is an implementation detail that is highly bikesheddable and can be decided late in the overall process. FWIW, SIMD/relaxed-SIMD already goes all the way up to 0xfd113 (if I didn't miss any), so continuing there (in 0xfd...) with another couple dozen instructions is likely also a viable option.

CW: We can bikeshed on that.

PP: Want to see how slow this would be on emulation without hardware data?

IR: my initial implemetatino will probably be with emulation, so we’ll see.

CW: I don’t think that needs to be a phase 1 requirement, are you still OK with that?

PP: Depends on what we think is Phase 1?

CW: I think we’ve cleared the bar of whether the proposal is in scope and worth investigating.

AZ [chat]: Just a thought, for the few f16 operations that can't be done using f32 promotion/demotion (iirc, sqrt fits that for f32/f64, but not sure about f16/f32), allowing some nondeterminism is another option. That is, to allow the result that is computed in f32 as well, in the spec.

MF [chat]: I think the scalar portion is much less motivated than the vector portion of the proposal, and it probably shouldn't occupy much of the single-byte opcode space, but no objection to phase 1

Correct Compilation to WebAssembly, Ross Tate

RT presenting Slides

CW: at the wasm language level, if something’s not exported it can’t be accessed, but we’re not mitigating logic bugs which might cause unexported memory or functions to be accessed.

RT: I would guess that we agree that you should be able to rely on imports not accessing unexported things.

CW: should only hold so long as linked modules conform to the ABI the program was compiled against.

RT: How would they access memory that they don’t have access to?

CW: if you link a hostile module that could forge pointers in linear memory you’d break the semantics

RT: In this example, the imports/exports don’t have addresses to memory unless they are forged somehow

CW: maybe i’m not fully anticipating your example.

RT: it’s fine these are good questions. I’ve set up tihs example such that there aren’t any of these pointers, so in this setting you should be able to assume there’s no forging

CW: If you have a module that doesn’t export a function that doesn’t do anything weird with exports, that could work, but maybe that is still going to come up

RT: so this program is correct and not doing anything weird, and i’m trying to make it so no other programs can make it do something weird. Here it’s possible to make the program launch missile, not by messing with its memory but with its control

(slide: launching missiles)

CW: You have to be cautious about the analogy here, you’ve added JS here which is in some way forging the pointer,

RT: JS isn’t forging a pointer.

CW: You have compiled Wasm and taking an i32 and Wasm interprets that as a pointer, right?

RT: If the Wasm program was to catch the trap, then it wouldn’t be a problem, if you unwind the stack then it’s no problem. The only privilege that JS has here is that it can catch the trap.

CW: Isn’t the point that you can call bar with an arbitrary input?

RT: bar is callable with arbitrary input in the C program but still won’t have this problem.

BT: The issue is that Wasm allows modules to be reentrant in a way that other embeddings don’t, and because of that if you unwind the Wasm stack, a virtualized thing, if there is state aligned like a shadow stack, because you cant catch traps, you cant fix its state properly. Because the host module can, you’ll be in an invalid state.

RT: yes, thanks. To clarify, this C program can handle reentrancy. The issue that the C runtime relies on certain assumptions that would be tru other places but not in wasm with traps. So we can use these “mismatches” to make wasm programs do things they shouldn’t be able to. It’s hard to directly compile runtimes to wasm because wasm is able to do things these runtimes need to be able to manage.

Not proposing this now, but a way to fix this would be to enable catching traps, or catchines all exceptions and traps, and allow unwinding from both cases. (slide: fixing traps)

AZ [chat]: +1 to Ben's point. In practice in toolchains JS would be responsible for unwinding the shadow stack as well, to avoid this problem. Otherwise it is breaking the wasm program's ABI.

BT [via chat]: Right, I see this as a problem with the host environment reentering Wasm, and this is fixable at the host boundary to reestablish the ABI (e.g. resetting the shadow stack)

PP [chat]: Wouldn't this require standardizing how shadow stack works? +1 on both of these points

DS: Interesting property of the problem is that this is the interaction between the program, and the host environment that causes this problem. Interfacing with JS, you have reentrancy after traps, and maybe other unexpected control flow (JS exceptions unwinding through C frames) that causes this problem. With the exceptions proposals, we’re allowing the importing the JS exception type, which doesn’t completely mitigate the problem (doesn’t deal with traps), but helps somewhat, giving programs a way to mitigate the JS exception part. I’m not sure I’m sold on giving this as a guarantee for every embedding - we can’t guarantee what the embedder provides, they all at least have access to memory

CW: Kind of like the more intricate version of the obvious problem at the source level, even if you look at your program and have certain guarantees, at the Wasm level, maybe the memory is just exported and JS can inspect it. This is playing with the version of maybe you have problems even if you don’t export the memory. We already kind of accept that as something possible to happen.

AZ [chat]: The wasm and the JS need to agree on the shadow stack, and other ABI details, yes. And they do, as documented in the tool-conventions repo and as implemented in LLVM, Emscripten, and others. It is easy to get wrong, though!

RT: High level thing to ponder: You have to trust the embedder, what standards of embedders could we provide so that we could have stronger compilation guarantees. My compiler for example has this guarantee that my compiler won’t launch missiles for example, the embedder should be able to guarantee some things..

CW: interesting, i think it depends on who “you” is in this scenario. If you take someone else’s wasm module and want to mess with it, you have a lot of power when you instantiate it and what you link it with, you could at least get its memory probably. It’s hard to provide guarantees when you have this capability.

PP: The problem is not that someone can maliciously do something, but there is room for accidentally hitting something.

RT: in security there’s the notion of trusted computing base. Ideally for my program, this is the only program i give missile capability to and i want it to be the case that this is the only program I have to trust, I don’t want to rely on a big expanded trusted computing base, i only want this one program that I’ve analyzed thoroughly

BT: I get where you’re going - the program has a source level guarantee - that’s hard top guarantee - instead you want to specify what the reentrancy model of your program is - you have to stick to what you can specify in terms of guarantees.

RT: In order to compile this correctly, I should anticipate that the shadow stack would be exposed…

BT: Yes

CW: On the web, this is the wrong way to think about this - the person who cares about the TCB is the person deploying the final blob of wasm and JS - if a separate person compiles the Wasm module, they can’t protect against the deployer messing with it