Pruning unneeded code in wasm files with wasm-metadce

The wasm-metadce tool can be used to remove parts of wasm files in a flexible way that depends on how the module is used. You can describe entities outside of the module and how they refer to parts of the module, and then wasm-metadce will perform dead code elimination on the entire graph of both the wasm and the outside. By considering the entire graph, it can even clean up cycles that span both the wasm and the outside.

Example: Pruning exports

To define the outside, you write a JSON-like file such as this:

[
  {
    "name": "outside",
    "reaches": ["export-foo"],
    "root": true
  },
  {
    "name": "export-foo",
    "export": "foo"
  }
]

This first declares a node named outside. That node is a root, which means wasm-metadce will never remove it. It refers to export-foo, which is then declared as the export foo of the wasm module. You can then run

wasm-metadce input.wasm --graph-file graph.json -o output.wasm

That will load input.wasm, run DCE, and write the result to output.wasm. During the DCE there is a single root, outside, which refers to the export foo, so that wasm export is kept alive along with anything it can reach. All other exports, and all other code that is not reachable, are removed from the wasm file. For example, if you started with this file:

(module
  (func $foo (export "foo")
    (call $bar)
  )

  (func $bar)

  (func $quux (export "quux")
    (call $later)
  )

  (func $later)
)

Then wasm-metadce will emit this:

(module
  (func $foo (export "foo")
    (call $bar)
  )

  (func $bar)
)

Only the export we wanted remains, and only things it refers to.
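For more than a handful of exports, writing the graph file by hand gets tedious. The following is a minimal Python sketch of generating it, assuming only the keys shown in the example above ("name", "reaches", "root", "export"). The helper name build_graph and the "export-" naming scheme for nodes are illustrative choices, not something wasm-metadce requires.

```python
import json


def build_graph(exports_to_keep):
    """Build a graph with one root node that reaches each export to keep."""
    # Node names such as "export-foo" are labels picked for this sketch,
    # matching the example above; the "export" value is the wasm export name.
    export_nodes = [
        {"name": f"export-{name}", "export": name} for name in exports_to_keep
    ]
    root = {
        "name": "outside",
        "reaches": [node["name"] for node in export_nodes],
        "root": True,
    }
    return [root] + export_nodes


if __name__ == "__main__":
    # Print the graph as strict JSON; redirect the output to graph.json.
    print(json.dumps(build_graph(["foo"]), indent=2))
```

Calling build_graph(["foo"]) produces the same two nodes as the hand-written file above, and the printed output can be saved as graph.json and passed to --graph-file.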

Cycles

As mentioned before, wasm-metadce can even remove cycles between the wasm and the outside, as well as long chains of dependencies that repeatedly enter and exit the wasm.

Emscripten, for example, uses that to shrink its combined JavaScript + WebAssembly output: it represents the entire JS program outside of the wasm in the graph file, then runs wasm-metadce on that graph together with the wasm. wasm-metadce logs the names of the things it inferred are unused, and Emscripten uses that information to do DCE on the JS as well, cleaning those things up from the JS. In the example above, this will be logged:

unused: export$quux$4
unused: func$later$3
unused: func$quux$2

That is the one export and two functions that we could remove. As they are inside the wasm, wasm-metadce already removed them for us. If the list of now-unused things also contains things on the outside, then in Emscripten's case those are JS things that it can optimize away.
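A tool that consumes this log needs to extract the reported names. The sketch below assumes the log has already been captured as text and that every relevant line has the "unused: <name>" form shown above. The function name parse_unused is hypothetical, and how the output is captured is left to the caller.

```python
def parse_unused(log_text):
    """Return the names reported on lines of the form 'unused: <name>'."""
    prefix = "unused: "
    names = []
    for line in log_text.splitlines():
        line = line.strip()
        # Ignore any other lines the tool may print.
        if line.startswith(prefix):
            names.append(line[len(prefix):])
    return names


# The log lines from the example above.
sample = """unused: export$quux$4
unused: func$later$3
unused: func$quux$2"""

print(parse_unused(sample))
```

A consumer such as a JS optimizer would then presumably filter this list down to the node names it wrote into the graph file itself, since those are the outside things it is responsible for removing.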

Similarly, wasm-metadce can be used to DCE across a pair of wasm modules, or a larger set: just create a proper graph JSON file and run wasm-metadce. That will perform DCE on the first module and inform you of what can be removed from the others, for which you can create a new graph file and run wasm-metadce again to DCE them as well. The key idea is that the first invocation of wasm-metadce is given the entire graph of everything, so it can find all the things that can be removed.
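The whole-graph idea can be illustrated with a toy reachability model. This is only a sketch of the concept, not Binaryen's implementation: everything reachable from a root is kept, and everything else is reported as unused, including cycles that cross the boundary between the wasm and the outside. All node names below are hypothetical.

```python
from collections import deque


def find_unused(nodes):
    """nodes maps a name to {"reaches": [...], "root": bool}.

    Returns the sorted names that are not reachable from any root.
    """
    queue = deque(name for name, node in nodes.items() if node.get("root"))
    reachable = set(queue)
    while queue:
        current = queue.popleft()
        for target in nodes[current].get("reaches", []):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return sorted(set(nodes) - reachable)


# Toy graph: the root keeps foo and bar alive, while js-helper and quux
# form a cycle spanning the outside and the wasm that no root reaches.
toy_graph = {
    "outside": {"root": True, "reaches": ["export-foo"]},
    "export-foo": {"reaches": ["func-foo"]},
    "func-foo": {"reaches": ["func-bar"]},
    "func-bar": {},
    "js-helper": {"reaches": ["export-quux"]},
    "export-quux": {"reaches": ["func-quux"]},
    "func-quux": {"reaches": ["import-js-helper"]},
    "import-js-helper": {"reaches": ["js-helper"]},
}

print(find_unused(toy_graph))
# prints ['export-quux', 'func-quux', 'import-js-helper', 'js-helper']
```

Because the traversal starts only from roots, the cycle between js-helper and quux is never visited and all four of its nodes are reported, even though each one is referenced by another node.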