Examples - aaronriekenberg/rust-parallel GitHub Wiki

  1. Find largest directories
  2. grep in large file
  3. Rename files in directory
  4. Organize files into subdirectories
  5. Computation on list of files
  6. Running diff on all files in 2 directories
  7. Processing CSV inputs using regular expression
  8. Compress files from find command

Find largest directories

Find subdirectories at max depth 1 and display from smallest to largest disk usage with du:

$ find .  -maxdepth 1 -type d | rust-parallel du -sh  | sort -h

This runs the du -sh commands in parallel on all available CPUs, which can be noticeably faster than running them one after another.

grep in large file

Suppose we have a large text file and want to run grep to find the lines matching a regular expression.

This command finds all 3-character words in a text file:

$ cat /usr/share/dict/words | egrep '^...$'

The above runs a single egrep process for the entire file, which might be too slow. To run multiple egrep processes in parallel:

$ cat /usr/share/dict/words | rust-parallel --pipe egrep '^...$'

rust-parallel breaks the input into blocks and sends each block to a separate egrep process, with the processes running in parallel. Blocks are split at newline boundaries by default. For large files this can give a big speed improvement.
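Splitting at newline boundaries is what keeps line-oriented tools like egrep correct, and this can be illustrated with plain shell. The following is a minimal sketch, not how rust-parallel is implemented: the helper name count_in_chunks is hypothetical, it splits by line count rather than by byte size, and it processes the chunks one after another for clarity. Because no line is ever cut in half, the per-chunk match counts add up to the same total as a single pass:

```shell
# Hypothetical helper: count lines matching an extended regex by
# processing the file in chunks of a fixed number of lines.
# Usage: count_in_chunks PATTERN FILE LINES_PER_CHUNK
count_in_chunks() {
  pattern=$1
  file=$2
  lines=$3
  tmpdir=$(mktemp -d)
  # split -l cuts only at line boundaries, so no line is broken in two.
  split -l "$lines" "$file" "$tmpdir/chunk."
  total=0
  for chunk in "$tmpdir"/chunk.*; do
    [ -e "$chunk" ] || continue   # empty input produces no chunks
    # grep -c prints the match count; it exits non-zero when the count is 0.
    n=$(grep -c -E "$pattern" "$chunk" || true)
    total=$((total + n))
  done
  rm -rf "$tmpdir"
  echo "$total"
}

# Demo on a small sample file: two of the five lines have exactly 3 characters.
sample=$(mktemp)
printf 'cat\nhorse\ndog\nab\nzebra\n' > "$sample"
count_in_chunks '^...$' "$sample" 2   # chunked count
grep -c -E '^...$' "$sample"          # single-pass count, same value
rm -f "$sample"
```

If blocks were cut at arbitrary byte offsets instead, a line straddling two blocks could be missed or matched incorrectly.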

The default block size for pipe mode is 1MiB; this can be overridden with the --block-size option:

$ cat /usr/share/dict/words | rust-parallel --block-size 200KiB --pipe egrep '^...$'

Rename files in directory

Rename files in the current directory from *.txt to *.csv.

The {} variable is the entire *.txt file name, and the {1} capture group is the prefix of the file name before .txt:

$ rust-parallel -r '(.*)\.(.*)' mv {} {1}.csv ::: *.txt

Use --dry-run to just log commands that would be executed:

$ rust-parallel --dry-run -r '(.*)\.(.*)' mv {} {1}.csv ::: *.txt
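To see what the '(.*)\.(.*)' pattern captures for a given file name, the same substitution can be tried on its own. This is a rough sketch using sed -E as a stand-in for rust-parallel's regular expression handling; the helper name csv_name is hypothetical. Because the first group is greedy, {1} extends to the last dot in the name:

```shell
# Hypothetical helper: print the target name that mv would receive
# for one input file name, using the same pattern as the example above.
csv_name() {
  printf '%s\n' "$1" | sed -E 's/^(.*)\.(.*)$/\1.csv/'
}

csv_name report.txt        # report.csv
csv_name archive.tar.txt   # archive.tar.csv
```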

Organize files into subdirectories

Suppose we have a directory of files beginning with YYYYMM. The following will create YYYY/MM subdirectories, then move YYYYMM* files into the subdirectories. Here {1} and {2} are automatic variables for all possible year and month combinations:

$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: 2023 2024 ::: 01 02 03 04 05 06 07 08 09 10 11 12

Equivalent command using seq to generate sequences of years and months:

$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: $(seq 2023 2024) ::: $(seq -w 12)
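The set of year and month combinations can be spelled out with nested loops. The sketch below only prints the command line for each combination, sequentially, and says nothing about the order in which rust-parallel actually runs them; the function name list_combinations is hypothetical:

```shell
# Hypothetical helper: print one command line per year/month combination,
# with {1} replaced by the year and {2} by the zero-padded month.
list_combinations() {
  for year in 2023 2024; do
    for month in $(seq -w 12); do
      # The * stays literal here because it is inside double quotes.
      echo "mkdir -p $year/$month && mv -f $year$month* $year/$month"
    done
  done
}

list_combinations | head -n 2
# mkdir -p 2023/01 && mv -f 202301* 2023/01
# mkdir -p 2023/02 && mv -f 202302* 2023/02
```

Two years and twelve months give 24 command lines in total.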

Computation on list of files

Suppose we have a bash function analyze_file and a list of *.txt files to analyze in current directory. This example uses --jobs 4 to control max parallel jobs, --shell to call a bash function, --progress-bar to display a graphical progress bar, and --timeout-seconds to kill each job if not finished after 5 minutes.

$ analyze_file() {
  echo "in analyze_file file = $1"
  # do some expensive analysis of file $1 parameter
}

$ export -f analyze_file

$ rust-parallel --jobs 4 --shell --progress-bar --timeout-seconds $((5*60)) analyze_file ::: *.txt

Running diff on all files in 2 directories

Suppose we have two directories dir1 and dir2 within the current directory.

This command finds all files within dir1 and diffs with the same path in dir2.

Since -s shell mode is used, the entire quoted string is run with bash as a single command. The -r regular expression extracts everything after the first / character into the {1} capture group:

$ find dir1 -type f | rust-parallel -s -r '^[^/]*\/(.*)$' 'echo diffing {1} ; diff --color=always dir1/{1} dir2/{1} ; echo diff {1} returned $? '
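The effect of the regular expression on a single path from find can be checked in isolation. This is a small sketch using sed -E in place of rust-parallel's own regular expression handling; the helper name relative_path is hypothetical, and the sed delimiter is | so the / in the pattern needs no escaping:

```shell
# Hypothetical helper: strip the leading directory component, leaving
# what the example above refers to as {1}.
relative_path() {
  printf '%s\n' "$1" | sed -E 's|^[^/]*/(.*)$|\1|'
}

rel=$(relative_path dir1/sub/a.txt)
echo "$rel"                                      # sub/a.txt
echo "diff --color=always dir1/$rel dir2/$rel"   # the pair of paths compared
```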

Processing CSV inputs using regular expression

Suppose we have an input CSV file of HTTP method, URL, and identifier. The command below could be used to make parallel calls with curl, including a JSON body. Here {method}, {url}, and {id} are named regular expression capture groups, -j3 sets a maximum of 3 parallel jobs, and -t5 sets a 5 second timeout:

$ cat >./test.csv <<EOL
GET,http://example.com/endpoint1,1234
PUT,http://example.com/endpoint2,2345
POST,http://example.com/endpoint3,3456
EOL

$ cat test.csv | rust-parallel --regex '(?P<method>.*),(?P<url>.*),(?P<id>.*)' -j3 -t5 curl -X {method} {url} -d '{"identifier":{id},"operation":"{method}"}'
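To preview the JSON body that one CSV line would produce, the fields can be split in plain shell. This sketch assumes each line has exactly three fields with no embedded commas, in which case splitting on commas yields the same values as the named capture groups; the function name build_body is hypothetical and no request is sent:

```shell
# Hypothetical helper: print the JSON body for one "method,url,id" line.
build_body() {
  # Split the line on commas into three variables; IFS is changed
  # only for this read command.
  IFS=, read -r method url id <<EOF
$1
EOF
  printf '{"identifier":%s,"operation":"%s"}\n' "$id" "$method"
}

build_body 'GET,http://example.com/endpoint1,1234'
# {"identifier":1234,"operation":"GET"}
```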

Compress files from find command

Use find to find all files in current directory and subdirectories. The -0 option works nicely with find -print0 to handle filenames that may have whitespace characters. Call gzip -f -k on each file from find command:

$ find . -type f -print0 | rust-parallel -0 gzip -f -k
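The difference between newline-delimited and NUL-delimited find output shows up with a file name that itself contains a newline. The sketch below uses only find, tr, and wc; the helper names are hypothetical. A consumer that splits on newlines sees two names for the one file, while a consumer that splits on NUL bytes sees one:

```shell
# Hypothetical helpers comparing the two ways of delimiting find output.
count_newline_delimited() {
  find "$1" -type f | wc -l | tr -d ' '
}
count_nul_delimited() {
  # Keep only the NUL bytes and count them: one per file name.
  find "$1" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' '
}

demo=$(mktemp -d)
# One file whose name contains a newline character.
touch "$demo/$(printf 'two\nlines').txt"
count_newline_delimited "$demo"   # 2, although there is only 1 file
count_nul_delimited "$demo"       # 1
rm -rf "$demo"
```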