GC Implementation Lowering Tips - WebAssembly/binaryen GitHub Wiki

The GC optimization cookbook has useful suggestions for how to optimize WasmGC code effectively using Binaryen, in terms of which passes and optimizations to run. This page focuses on how to emit efficient WasmGC code from a compiler that Binaryen can optimize well.

Note that some of the topics here may be relevant to non-GC languages as well (either because they use GC internally somehow, or if they happen to have patterns that are similar to those common in GC languages).

Class Initializations

In Java and other GC languages it is common to have "class init" code that needs to run exactly once at the time of the first use of a class. Binaryen has several optimizations that aim to remove redundant checks for initialization, the primary one of which is the OnceReduction pass, so it can be useful to emit code that fits into the patterns expected there, which means the following (but see that pass for more details):

Each "once" (executes once the first time it is called) function should have a corresponding global. The global starts at 0 and is set to 1 by the function.
The function checks the global at the top and immediately returns if it is already set.

Basically, that pattern looks like this:

(global $A.initialized (mut i32) (i32.const 0))

(func $A.initialize
  (if
    (global.get $A.initialized)
    (return)
  )
  (global.set $A.initialized
    (i32.const 1)
  )
  ;; .. initialization code ..
)

Types

In general, it makes sense to use more types where possible. For example, if you have two types in your source code, they should generally be lowered to two WasmGC types, as that information may be useful to optimize (perhaps Binaryen will see that one type's field is always written a particular value, which it can then infer). Running Binaryen's --type-merging pass will merge types where possible, which you can do at the very end to save code size, so you don't need to worry much about emitting "too many" types.

It also makes sense to refine types as much as possible. If you know that a field is immutable, for example, it is good to mark that field as such. Likewise, using the most refined type (non-nullable, if possible, and the most refined subtype) can help. The Binaryen optimizer can refine, but there are cases where it cannot prove properties that a source language can easily do so.

Vtables and ITables

The most optimizable form for such data structures is to put them in immutable globals with immutable fields, like this:

(struct $Foo (field $vtable (ref $vtable.Foo)) ..)

(struct $vtable.Foo (field $func1 (ref $funcTypeX)) (field $func2 (ref $funcTypeY)) ..)

(global $vtable.Foo
  (struct.new $vtable.Foo
    (ref.func $A)
    (ref.func $B)
    (ref.func $C)
  )
)

(struct.new $Foo
  (global.get $vtable.Foo)
)

Note how the vtable field on the object is immutable as well. In this form, the optimizer can easily see what is being called when it can infer enough about the type of a reference. That is, if we can infer that a reference points to (ref $Foo) above (and not a subtype or supertype) then if we call the second item from the vtable, we can optimize to a static call to function $B.

The optimizer will infer immutability when it sees no sets, but it is still good to emit fields as immutable yourself - that way you'd get an error if there is a bad set of the field.

The optimizer will refine fields when it can (depending on subtypes etc.), but like immutability it is best to use the most precise types you can. In the example above, the class has its own vtable type, for example, which has specific function types in it.

Boxing

The Binaryen optimizer help remove unnecessarily-boxed values in some situations, such as in Heap2Local which does escape analysis and replaces a GC allocation with locals. However, in general, boxing is something that the compiler to WasmGC needs to do effectively, because later optimizers (Binaryen and VMs) are limited in what they can do. To see why, first consider the simple situation:

(type $Box (struct (field i32)))

(type $Boxer (struct (field (ref $Box))))

;; All sets look like this:
(struct.set $Boxer
  (struct.new $Box
    ..
  )
)

;; All gets look like this:
(struct.get $Box 0
  (struct.get $Boxer 0
    ..
  )
)

Here every time we write to Boxer's field we box an integer. We could instead simply store the integer there:

(type $Boxer (struct (field i32)))

;; All sets look like this:
(struct.set $Boxer
  ..
)

;; All gets look like this:
(struct.get $Boxer 0
  ..
)

In general, however, we cannot do this. First, we'd have to see that only struct.new flows into Boxer's field, because otherwise there can be other references to the Box object, that potentially change the value of the field if it is mutable. But say that the field is immutable - we still may not wish to optimize here, because if we don't see struct.new in all sets then we'd have this situation:

(type $Box (struct (field i32)))

(type $Boxer (struct (field (ref $Box))))

;; All sets look like this:
(struct.set $Boxer
  ..
)

;; All gets look like this:
(struct.get $Box 0
  (struct.get $Boxer 0
    ..
  )
)

;; => optimize that to this: =>

(type $Box (struct (field i32)))

(type $Boxer (struct (field i32)))

;; All sets look like this:
(struct.set $Boxer
  (struct.get $Box 0
    ..
  )
)

;; All gets look like this:
(struct.get $Box 0
  ..
)

What we do here is effectively "pull" the struct.get $Box from the reads of Boxer's fields into the writes. That allows us to store only an i32 instead of a reference, which may be useful in itself, but whether this is actually faster depends on how many reads we have vs. writes, which is something a compile-time optimizer can't see. If we have far more writes than reads then we'd be making the code slower.