bash parallel

  • '&&' - and: the next command runs only if the previous one succeeded (exit code 0)
  • '||' - or: the next command runs only if the previous one failed (non-zero exit code)
command1 && command2 # command2 is not executed when command1 returns a non-zero exit code
command1 ; command2  # both commands execute regardless of exit code
command1 ; command2 & command3 # command2 runs in the background, command3 in the foreground
{ command1 ; command2 ; } & command3 # the whole group runs in the background, command3 in the foreground
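
A minimal demonstration of the exit-code logic (the directory path is just an example):

mkdir -p /tmp/par_demo && echo "directory ready"  # echo runs only if mkdir succeeded
false || echo "previous command failed"           # echo runs because false exits non-zero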

using &

for stuff in things
do
  ( something
    with
    stuff ) &
done
wait # wait for all the background "something with stuff" jobs
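
A concrete sketch of this pattern, assuming a set of *.log files to compress (the glob is just an example):

for f in *.log
do
  ( gzip -9 "$f" ) &
done
wait # returns once every gzip has finished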
#!/bin/bash

echo "Started at `date +%Y-%m-%d\ %H:%M:%S`"
{
    echo "Starting job 1"
    sleep 5
    echo "Finished job 1"
} &
{
    echo "Starting job 2"
    sleep 5
    echo "Finished job 2"
} &
wait
echo "Finished at `date +%Y-%m-%d\ %H:%M:%S`"

xargs

xargs -P <n> runs up to <n> commands in parallel.

time xargs -P 3 -I {} sh -c 'eval "$1"' - {} <<'EOF'
sleep 1; echo 1
sleep 2; echo 2
sleep 3; echo 3
echo 4
EOF
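
With -P 3 all three sleep jobs run concurrently, so the batch finishes in roughly the time of the longest job (about 3 seconds) instead of the 6 seconds a sequential run would take.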
# process a large input in chunks: each invocation of process_lines.sh receives
# 10000 whitespace-separated items as arguments, with 10 jobs running at a time
cat bigfile.txt | xargs -n 10000 -P 10 process_lines.sh
LIST_OF_FILES=$(find . -name "*.txt")
echo "$LIST_OF_FILES" | xargs -P 8 -I{} gzip -9 {}

parallel

Install

sudo apt-get install parallel

Syntax:

parallel ::: prog1 prog2

Example:

# to execute multiple commands in parallel using parallel
parallel ::: 'command1' 'command2' 'command3'
find . -type f -name '*.doc' | parallel gzip --best
cat list.txt | parallel -j 4 wget -q {}
# OR
parallel -j 4 wget -q {} < list.txt
(echo prog1; echo prog2) | parallel
# Or
parallel ::: prog1 prog2
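
::: can be given more than once; parallel then runs the command once per combination of the input sources:

parallel echo {1}-{2} ::: A B ::: 1 2
# prints A-1 A-2 B-1 B-2 (completion order; add --keep-order for input order)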

Process big JSON file

  • parallel can split up files efficiently using the --pipepart option

We want to keep the output in the original input order, so we add the --keep-order argument. The default output mode, --group, buffers each job's output until the job finishes; if a job's output can't fit in main memory, that buffering spills to disk. For most queries the output is small enough that --group is fine. We can do slightly better with --line-buffer, which, combined with --keep-order, starts printing the first job's output immediately and buffers only the later jobs' output. This needs somewhat less disk space or memory, at the cost of some CPU time. Either mode works for typical queries, but benchmark if your query generates large amounts of output.

parallel -a '<file>' --pipepart --keep-order --line-buffer --block 100M --recend '}\n' "jq '<query>'"
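
A concrete instance, assuming a hypothetical records.json made of concatenated JSON objects that each end with '}' on its own line, extracting a name field:

parallel -a records.json --pipepart --keep-order --line-buffer --block 100M --recend '}\n' "jq '.name'"

For newline-delimited JSON (one object per line) the default newline record separator already works, so --recend can be dropped.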

sem - parallel with semaphores
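
sem is GNU parallel's semaphore mode (an alias for parallel --semaphore): it starts each command in the background but blocks when the given number of job slots is already in use. A minimal sketch, assuming *.log files as the workload:

for f in *.log
do
  sem -j 4 gzip -9 "$f" # at most 4 gzip processes run at once
done
sem --wait # block until all semaphore-controlled jobs have finished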
