Examples - aaronriekenberg/rust-parallel GitHub Wiki

Find largest directories
Rename files in directory
Organize files into subdirectories
Computation on list of files
Running diff on all files in 2 directories
Processing CSV inputs using regular expression
Compress files from find command

Find largest directories

Find subdirectories at max depth 1 and display from smallest to largest disk usage with du:

$ find . -maxdepth 1 -type d | rust-parallel du -sh | sort -h
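
The sort -h stage is what puts the du -sh output in size order: it compares human-readable suffixes (K, M, G) rather than plain text. A minimal sketch of that step alone, assuming a sort that supports -h (GNU coreutils does); the sizes and directory names are made up for illustration and are not real du output:

```shell
# Hypothetical du -sh style lines (sizes and names are invented for illustration).
sample_sizes() {
  printf '1.5G\t./videos\n4.0K\t./empty\n12M\t./photos\n'
}

# sort -h orders by human-readable size, smallest first.
sample_sizes | sort -h
```

Appending tail -n 5 to the pipeline would keep only the five largest entries.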

Rename files in directory

Rename files in the current directory from *.txt to *.csv.

The {0} capture group is the entire *.txt file name, and the {1} capture group is the part of the file name before .txt:

$ rust-parallel -r '(.*)\.(.*)' mv {0} {1}.csv ::: *.txt

Use --dry-run to just log commands that would be executed:

$ rust-parallel --dry-run -r '(.*)\.(.*)' mv {0} {1}.csv ::: *.txt
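
To see what the capture groups hold without involving rust-parallel at all, the same pattern can be tried in sed. This is only a sketch: sed's regex engine is not the one rust-parallel uses, the pattern is anchored here for clarity, and the helper name preview_rename and the file names are hypothetical. In the sed replacement, & stands in for {0} and \1 for {1}:

```shell
# Preview the rename regex with sed: & plays the role of {0}, \1 the role of {1}.
# preview_rename and the file names passed to it are hypothetical.
preview_rename() {
  printf '%s\n' "$@" | sed -E 's/^(.*)\.(.*)$/mv & \1.csv/'
}

preview_rename notes.txt report.2024.txt
```

Because the first group is greedy, a name with several dots such as report.2024.txt keeps everything before the last dot as its prefix.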

Organize files into subdirectories

Suppose we have a directory of files beginning with YYYYMM. The following will create YYYY/MM subdirectories, then move YYYYMM* files into the subdirectories. Here {1} and {2} are automatic variables for all possible year and month combinations:

$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: 2023 2024 ::: 01 02 03 04 05 06 07 08 09 10 11 12

Equivalent command using seq to generate sequences of years and months:

$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: $(seq 2023 2024) ::: $(seq -w 12)
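
The year and month combinations can be pictured as a nested loop. The sketch below only prints the 24 command strings and runs nothing; the function name print_commands is hypothetical, and the loop order shown is not a claim about the order in which rust-parallel starts its jobs:

```shell
# Print (without running) one command per year/month pair,
# mirroring the combinations of the two ::: groups.
print_commands() {
  for year in 2023 2024; do
    for month in $(seq -w 12); do
      # The * stays literal here because it is inside double quotes.
      echo "mkdir -p $year/$month && mv -f $year$month* $year/$month"
    done
  done
}

print_commands | head -n 3
```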

Computation on list of files

Suppose we have a bash function analyze_file and a list of *.txt files to analyze in current directory. This example uses --jobs 4 to control max parallel jobs, --shell to call a bash function, --progress-bar to display a graphical progress bar, and --timeout-seconds to kill each job if not finished after 5 minutes.

$ analyze_file() {
  echo "in analyze_file file = $1"
  # do some expensive analysis of file $1 parameter
}

$ export -f analyze_file

$ rust-parallel --jobs 4 --shell --progress-bar --timeout-seconds $((5*60)) analyze_file ::: *.txt
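
Before running a function across many files in parallel, it can help to try it serially on one file. The body below is a hypothetical stand-in for the expensive analysis (it just counts lines), and the temporary file is created only for the trial:

```shell
# Hypothetical body for analyze_file: report the line count of the file in $1.
analyze_file() {
  echo "in analyze_file file = $1"
  line_count=$(wc -l < "$1")
  # $((...)) strips the padding some wc implementations add.
  echo "$1 has $((line_count)) lines"
}

# Try it serially on a small temporary file first.
sample_file=$(mktemp)
printf 'a\nb\nc\n' > "$sample_file"
analyze_file "$sample_file"
rm -f "$sample_file"
```

In bash the function still has to be exported with export -f before a --shell job can call it.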

Running diff on all files in 2 directories

Suppose we have two directories dir1 and dir2 within the current directory.

This command finds all files within dir1 and diffs with the same path in dir2.

Since -s shell mode is used, the entire quoted string is run with bash as a single command. The -r regular expression extracts everything after the first / character into the {1} capture group:

$ find dir1 -type f | rust-parallel -s -r '^[^/]*\/(.*)$' 'echo diffing {1} ; diff --color=always dir1/{1} dir2/{1} ; echo diff {1} returned $? '
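
The path extraction can be checked in isolation with sed. This is a sketch under the assumption that sed's extended regular expressions behave the same way for this simple pattern; the helper name preview_diff and the sample paths are hypothetical, and the function only prints the diff commands rather than running them:

```shell
# Preview which relative path the regex would capture as {1}.
# sed uses | as its delimiter here so that / needs no escaping.
preview_diff() {
  printf '%s\n' "$@" | sed -E 's|^[^/]*/(.*)$|diff dir1/\1 dir2/\1|'
}

preview_diff dir1/a.txt dir1/sub/b.txt
```

Only the leading directory name is dropped, so files in nested subdirectories keep their relative path on both sides of the diff.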

Processing CSV inputs using regular expression

Suppose we have an input CSV file with three columns: HTTP method, URL, and identifier. The command below could be used to make parallel curl calls that include a JSON body. Here {method}, {url}, and {id} are named regular expression capture groups, -j3 sets a maximum of 3 parallel jobs, and -t5 sets a 5 second timeout:

$ cat >./test.csv <<EOL
GET,http://example.com/endpoint1,1234
PUT,http://example.com/endpoint2,2345
POST,http://example.com/endpoint3,3456
EOL

$ cat test.csv | rust-parallel --regex '(?P<method>.*),(?P<url>.*),(?P<id>.*)' -j3 -t5 curl -X {method} {url} -d '{"identifier":{id},"operation":"{method}"}'
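
To check how the three fields land in the command before any request is sent, the lines can be split in plain shell and the resulting curl command printed. This is a sketch only: splitting on commas with read is a simpler stand-in for the regular expression and would behave differently if a field itself contained a comma, and the function name preview_curl is hypothetical:

```shell
# Split each CSV line into method, url, and id, then print the curl command
# that would be built. This only prints; it makes no network calls.
preview_curl() {
  while IFS=, read -r method url id; do
    printf "curl -X %s %s -d '{\"identifier\":%s,\"operation\":\"%s\"}'\n" \
      "$method" "$url" "$id" "$method"
  done
}

printf 'GET,http://example.com/endpoint1,1234\n' | preview_curl
```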

Compress files from find command

Use find to find all files in current directory and subdirectories. The -0 option works nicely with find -print0 to handle filenames that may have whitespace characters. Call gzip -f -k on each file from find command:

$ find . -type f -print0 | rust-parallel -0 gzip -f -k
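
The reason for null delimiters can be shown with find alone. A newline is itself a whitespace character that may appear in a file name, so line-oriented output can split one file into several records, while null-delimited output cannot. The sketch below uses a scratch directory and a deliberately awkward, hypothetical file name, and says nothing about how rust-parallel parses its input beyond what is stated above:

```shell
# Create a scratch directory holding one file whose name contains a newline.
scratch_dir=$(mktemp -d)
touch "$scratch_dir/line one
line two.txt"

# Newline-delimited output: the single file shows up as two records.
line_records=$(find "$scratch_dir" -type f | wc -l | tr -d ' ')

# Null-delimited output: counting NUL bytes gives exactly one record.
null_records=$(find "$scratch_dir" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "newline-delimited: $line_records, null-delimited: $null_records"
rm -rf "$scratch_dir"
```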