Output Interleaving - aaronriekenberg/rust-parallel GitHub Wiki

No parallel execution command would be complete without a long discussion of output interleaving.

What do other commands do?

xargs -P (allows output interleaving)

From the xargs man page, when run with the -P option to enable parallel processes:

Please note that it is up to the called processes to
properly manage parallel access to shared resources.  For
example, if more than one of them tries to print to
stdout, the output will be produced in an indeterminate
order (and very likely mixed up) unless the processes
collaborate in some way to prevent this.  Using some kind
of locking scheme is one way to prevent such problems.  In
general, using a locking scheme will help ensure correct
output but reduce performance. 

Short summary: with xargs, all child processes inherit the stdout and stderr file descriptors from the parent process. This works perfectly if only one child process is running at a time (-P1). When multiple parallel child processes write to stdout or stderr, there is no protection from interleaved output.
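
To make the failure mode concrete, here is a minimal Rust sketch. It is not xargs itself: threads stand in for child processes, a shared in-memory buffer stands in for the inherited stdout, and a barrier forces the unlucky scheduling so the result is reproducible. The function name `write_uncoordinated` is hypothetical and chosen only for illustration.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical illustration: `workers` threads each write a two-line record
/// to one shared sink, with no coordination between the two writes.
/// The sink stands in for a stdout inherited by parallel child processes.
fn write_uncoordinated(workers: usize) -> String {
    let sink = Arc::new(Mutex::new(String::new()));
    // The barrier makes every worker finish its first write before any
    // worker starts its second one (a worst-case schedule, forced on purpose).
    let barrier = Arc::new(Barrier::new(workers));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let sink = Arc::clone(&sink);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Each individual write lands intact...
                sink.lock().unwrap().push_str(&format!("job {id} begin\n"));
                barrier.wait();
                // ...but nothing keeps the two writes of one job together.
                sink.lock().unwrap().push_str(&format!("job {id} end\n"));
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Copy the text out while the lock guard is still in scope.
    let text = sink.lock().unwrap().clone();
    text
}
```

Calling `write_uncoordinated(4)` yields all four "begin" lines before any "end" line, so no job's record stays contiguous. With real processes the mixing depends on scheduling, which is why the man page calls the order indeterminate.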

GNU parallel (does not interleave output, but is slow)

From the GNU parallel man page for the --group option:

Group output.

Output from each job is grouped together and is only printed when the command is finished. 
Stdout (standard output) first followed by stderr (standard error).

This takes in the order of 0.5ms CPU time per job and depends on the speed of your disk for larger output.

--group is the default.

Short summary: GNU parallel writes temporary command outputs to disk. This prevents interleaved output, but it's slow.
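
The general technique of grouping through temporary files can be sketched in Rust as follows. This is not GNU parallel's implementation. Threads stand in for jobs, the file naming scheme and the name `run_grouped_via_temp_files` are assumptions, and for simplicity the sketch prints groups in job order rather than completion order.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::PathBuf;
use std::thread;

/// Hypothetical illustration of grouping through temporary files: every job
/// writes to its own file, and that file is copied to `out` only after the
/// job has finished.
fn run_grouped_via_temp_files<W: Write>(jobs: usize, out: &mut W) -> io::Result<()> {
    let dir = std::env::temp_dir();

    let handles: Vec<_> = (0..jobs)
        .map(|id| {
            // Assumed naming scheme; the process id keeps concurrent runs apart.
            let path = dir.join(format!("grouped-{}-{id}.out", std::process::id()));
            thread::spawn(move || -> io::Result<PathBuf> {
                let mut file = File::create(&path)?;
                writeln!(file, "job {id} begin")?;
                writeln!(file, "job {id} end")?;
                Ok(path)
            })
        })
        .collect();

    for handle in handles {
        let path = handle.join().expect("job thread panicked")?;
        // The job is done, so its whole output can be emitted as one group.
        out.write_all(&fs::read(&path)?)?;
        fs::remove_file(&path)?;
    }
    Ok(())
}
```

Every job's output costs a file create, write, read, and delete, which is where the per-job overhead and the dependence on disk speed come from.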

What does rust-parallel do? (no interleaving and fast)

There are 2 options:

  1. By default, 2 pipes are opened for each spawned child process to capture its stdout and stderr and send them back to the parent process. A single task in the parent process receives each child's output over a tokio::sync::mpsc::channel and does a non-blocking copy of it to stdout and stderr (see output::task::OutputTask). Because all output from every child process passes through this one task, there is no output interleaving. Temporary output is not written to disk.

  2. The -d, --discard-output option can be used to discard stdout, stderr, or both from spawned child processes. This reduces the number of pipes opened and the amount of data copied, which may make rust-parallel even faster if child processes produce unnecessary output.
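
The single-writer idea behind option 1 can be sketched with the standard library alone. This is not the code of output::task::OutputTask: it swaps in std::sync::mpsc and threads for tokio's channel and async tasks, simulates child processes instead of spawning them, and assumes each child's captured output arrives as one complete message. The names `ChildOutput` and `run_with_single_writer` are hypothetical.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread;

/// Captured output of one (simulated) child process.
struct ChildOutput {
    stdout: Vec<u8>,
    stderr: Vec<u8>,
}

/// Hypothetical illustration: `jobs` workers run in parallel and send their
/// captured output over a channel; only this function writes to `out`/`err`.
fn run_with_single_writer<O: Write, E: Write>(
    jobs: usize,
    out: &mut O,
    err: &mut E,
) -> io::Result<()> {
    let (tx, rx) = mpsc::channel::<ChildOutput>();

    let workers: Vec<_> = (0..jobs)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                // Stand-in for reading a child's stdout and stderr pipes.
                let output = ChildOutput {
                    stdout: format!("job {id} begin\njob {id} end\n").into_bytes(),
                    stderr: format!("job {id} warning\n").into_bytes(),
                };
                tx.send(output).expect("receiver dropped early");
            })
        })
        .collect();

    // Drop the original sender so the loop below ends once all workers finish.
    drop(tx);

    // The only place that writes: one message at a time, so groups stay whole.
    for output in rx {
        out.write_all(&output.stdout)?;
        err.write_all(&output.stderr)?;
    }

    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
    Ok(())
}
```

Passing `io::stdout()` and `io::stderr()` as the two writers prints the result. Messages arrive in completion order, so the order of jobs may vary between runs, but each job's lines always stay together and nothing touches the disk.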

The benchmarks use the default, option 1. rust-parallel is faster than xargs, and much faster than GNU parallel.