Fuzzing - WebAssembly/binaryen GitHub Wiki

Binaryen has built-in fuzzing and reducing capabilities. They can be used on Binaryen itself or on other compilers, VMs, and toolchains.

Fuzzing

Binaryen's wasm-opt tool has the --translate-to-fuzz / -ttf option. When set, it treats the input as a stream of arbitrary bytes and deterministically converts it into a valid wasm module. That is, the input acts like the seed to a deterministic random number generator, except that instead of numbers we generate wasm modules.

In other words, you can give wasm-opt -ttf any input file with any contents, and it will create a wasm file. You can then save it (using -o) and run that in another tool. For example, you can run a fuzzing script that generates a random string, feeds that to wasm-opt -ttf, and runs a VM on that output.
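The fuzzing script described above could be sketched roughly as follows. This is a minimal illustration, not a script that ships with Binaryen: it assumes wasm-opt is on your PATH, the function names are made up for this example, and the VM command you pass in (for instance a hypothetical my-vm) is whatever tool you want to test.

```python
import os
import subprocess
import sys
import tempfile


def ttf_command(input_path, output_path, wasm_opt="wasm-opt"):
    """Build the wasm-opt command that turns arbitrary bytes into a wasm module."""
    return [wasm_opt, input_path, "-ttf", "-o", output_path]


def write_random_input(path, size=4096):
    """Write `size` random bytes; their contents act as the seed for -ttf."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))


def fuzz_once(vm_command, workdir):
    """Generate one random module, run the VM on it, and return the VM's exit code."""
    input_path = os.path.join(workdir, "input.dat")
    wasm_path = os.path.join(workdir, "out.wasm")
    write_random_input(input_path)
    subprocess.run(ttf_command(input_path, wasm_path), check=True)
    return subprocess.run(vm_command + [wasm_path]).returncode


# Usage (hypothetical VM name): python fuzz_vm.py my-vm [vm args...]
if __name__ == "__main__" and len(sys.argv) > 1:
    with tempfile.TemporaryDirectory() as workdir:
        print("VM exit code:", fuzz_once(sys.argv[1:], workdir))
```

Wrapping fuzz_once in a loop and saving input.dat whenever the VM misbehaves gives a basic fuzzer; because generation is deterministic, the saved input is enough to reproduce the module later.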

Some additional useful options:

--emit-js-wrapper: Emit a JavaScript file that loads the wasm module and runs it, printing out some results from calling its exports. This is helpful for testing a JavaScript/WebAssembly VM: just run the VM on that JavaScript file and pass it the wasm file as a parameter.
--emit-spec-wrapper: Similar, but emits s-expression commands that can be run in the WebAssembly spec interpreter.

For fuzzing of Binaryen itself, the following options are useful:

--fuzz-exec: This runs the generated wasm module in the Binaryen interpreter, printing out results from calling its methods, similar to the JS wrapper from --emit-js-wrapper. This will also do that another time after optimizations, which lets you check if they broke anything.
--fuzz-binary: In addition to the previous option, this will write to binary and read it back before running the second time. This helps find binary format bugs.

These two options are not strictly necessary, but can greatly improve execution times, as a single invocation can do a full random module generation + optimization + binary test. For example,

wasm-opt input.dat -ttf --fuzz-exec --fuzz-binary -O3

Even on a fairly low-powered machine this lets an external fuzzer such as afl-fuzz run hundreds of iterations per second.

Generated Wasm File Properties

The wasm files that -ttf generates are guaranteed not to hang, as they contain built-in hang instrumentation. They may trap, though; the JS wrapper code catches traps and prints them.

Reducing

A complementary feature is reducing: taking an existing interesting testcase and reducing it to as small a testcase as possible while keeping it interesting. Binaryen's wasm-reduce tool can do that, using something like

bin/wasm-reduce start.wasm "--command=checker-command test.wasm" -t test.wasm -w work.wasm

This takes an input wasm and a bunch of options:

The "command": the command to be run. Our goal during reduction is to keep the behavior of the command the same, namely, the same exit code and the same stdout. You should therefore make sure the command emits only the output that is relevant to you (e.g., if the command prints out the wasm binary size, that output changes whenever the file shrinks, so reduction is impossible!).
The "test file": the file that we write each candidate to and then run the command on. The command should run on that file. Note how in the example above we explicitly tell it to (but if it had that name hardcoded inside it, that would be fine as well).
The "work file": the current reduction. You can look at that file while reduction is still going on to track progress. This will also contain the final reduction at the end.
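Since the command's stdout and exit code must stay stable, it often helps to wrap the real tool in a small checker script that prints only a summary. The following is a sketch under stated assumptions: the script name checker.py, the marker text, and the tool being run are all hypothetical placeholders, not part of Binaryen.

```python
import subprocess
import sys


def summarize(output, marker):
    """Collapse noisy tool output into one line that stays stable as the file shrinks."""
    return "bug reproduced" if marker in output else "bug not reproduced"


def check(marker, tool_command):
    """Run the tool, capture stdout and stderr together, and summarize them."""
    proc = subprocess.run(
        tool_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",
    )
    return summarize(proc.stdout, marker)


# Usage: python checker.py MARKER TOOL [ARGS...]
# The script itself always exits normally, so only the printed line
# distinguishes an interesting testcase from an uninteresting one.
if __name__ == "__main__" and len(sys.argv) >= 3:
    print(check(sys.argv[1], sys.argv[2:]))
```

With a hypothetical VM called my-vm and the marker text unreachable, the reduction could then be started with something like

bin/wasm-reduce start.wasm "--command=python checker.py unreachable my-vm test.wasm" -t test.wasm -w work.wasm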

wasm-reduce works by trying all sorts of changes to the file that shrink it, and if a change is valid (keeps it "interesting", i.e., same result on the command) then we keep it and continue from there.
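The keep-it-if-still-interesting loop can be illustrated with a toy reducer over a plain list. This is not wasm-reduce's actual algorithm, which works on the parsed wasm structure; it is only a sketch of the general idea, with is_interesting standing in for running the command and comparing its result.

```python
def reduce_greedily(items, is_interesting):
    """Repeatedly drop single items for as long as the result stays interesting."""
    assert is_interesting(items), "the starting testcase must be interesting"
    current = list(items)
    progress = True
    while progress:
        progress = False
        for i in range(len(current)):
            # Try a smaller candidate with item i removed.
            candidate = current[:i] + current[i + 1:]
            if is_interesting(candidate):
                # The change kept the behavior, so continue from the smaller version.
                current = candidate
                progress = True
                break
    return current


if __name__ == "__main__":
    # The "bug" here needs both 3 and 7 to be present.
    print(reduce_greedily(list(range(10)), lambda xs: 3 in xs and 7 in xs))
```

Every call to is_interesting corresponds to one run of the command, which is why a slow command makes reduction slow.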

Reduction can be a slow process, because we need to check every change by running the command - so if the command takes 5 seconds, it may take that long to shrink by a single byte (!). wasm-reduce tries to get around that by taking advantage of the Binaryen optimizer: it will interleave "destructive reduction" (removing code, breaking code in ways that might alter program behavior) with "pass reduction" (running Binaryen optimizer passes, which should not alter program behavior). For example, destructive removal of a condition to an if might let an optimization pass remove one arm of the if, which can be much faster than removing all the parts of the arm one by one.

Note that wasm-reduce only works on valid wasm files. It takes advantage of the structure of the wasm in order to reduce effectively, which means we parse it and then manipulate it. If we can't even parse it, we can't do anything. In that case you may want to use more general purpose binary testcase reduction tools.

Helper scripts

fuzz_opt.py is a very useful script that runs

various fuzzing modes (--fuzz-exec, specific fuzzing for wasm2js and asyncify, etc.)
various random inputs (random bytes interpreted using -ttf)
various random passes

Just running

$ python scripts/fuzz_opt.py

will run the script, which will continue to run until it finds a possible bug.

For maximum throughput, it is recommended to run scripts/fuzz_opt.py and its related binaries from a fast drive or, alternatively, from a ramdrive. Running on spinning disks is about an order of magnitude slower.

This script will use existing wasm files as the basis for fuzzing (mutating and expanding upon them), which is good if that set of files represents realistic content. By default the script will use all testcases in the test suite as such initial content, with a priority given to files modified in the last 30 days. You can also put wasm files in the ./fuzz/ directory and it will likewise treat them as high priority initial content, which is useful when you have some local files you want to especially fuzz.

Setting up dependencies

third_party/setup.py can automatically install the necessary dependencies, such as the SpiderMonkey JS shell (mozjs), the V8 JS shell (d8), and WABT, in third_party/.

./third_party/setup.py [mozjs|v8|wabt|all]

This also helps when fuzzing on a ramdrive (requires about 300 MB):

./third_party/setup.py all
cp -r build/ scripts/ test/ third_party/ /path/to/ramdrive
cd /path/to/ramdrive
./scripts/fuzz_opt.py --binaryen-bin build/bin