Examples - aaronriekenberg/rust-parallel GitHub Wiki
- Find largest directories
- Rename files in directory
- Organize files into subdirectories
- Computation on list of files
- Running diff on all files in 2 directories
- Processing CSV inputs using regular expression
- Compress files from find command
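Find subdirectories at max depth 1 and display them from smallest to largest disk usage with du. The list from find also includes . itself, so the total for the current directory typically appears last: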
$ find . -maxdepth 1 -type d | rust-parallel du -sh | sort -h
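The example pipes its output through sort -h because du -h prints human-readable sizes. sort -h orders those sizes by magnitude, while a plain text sort does not. The du-style sample lines below are made up so that the sketch runs without scanning a real disk:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical du -sh style output: size, tab, directory.
sample_du_output() {
  printf '1.5G\t./big\n12K\t./small\n300M\t./medium\n'
}

echo "byte-order sort:"
sample_du_output | LC_ALL=C sort   # text order: 1.5G, 12K, 300M
echo "sort -h:"
sample_du_output | sort -h         # size order: 12K, 300M, 1.5G
```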
Rename files in the current directory from *.txt to *.csv. The {0} capture group is the entire *.txt file name, and the {1} capture group is the prefix of the file name before .txt:
$ rust-parallel -r '(.*)\.(.*)' mv {0} {1}.csv ::: *.txt
Use --dry-run to log the commands that would be executed without running them:
$ rust-parallel --dry-run -r '(.*)\.(.*)' mv {0} {1}.csv ::: *.txt
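Alongside --dry-run, you can test the same pattern with bash's =~ operator to check what the capture groups hold. This is a sketch with hypothetical file names. bash uses POSIX extended regular expressions rather than rust-parallel's regex engine, but for a simple pattern like this one the groups come out the same. Because {1} is greedy, it stops at the last . in the name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the mv command each name would produce, without renaming anything.
preview_rename() {
  local re='(.*)\.(.*)' name
  for name in "$@"; do
    [[ $name =~ $re ]] || continue
    # BASH_REMATCH[0] plays the role of {0}, BASH_REMATCH[1] of {1}.
    echo "mv ${BASH_REMATCH[0]} ${BASH_REMATCH[1]}.csv"
  done
}

# Hypothetical file names; the second shows the greedy prefix.
preview_rename notes.txt report.v2.txt
```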
Suppose we have a directory of files whose names begin with YYYYMM. The following command creates YYYY/MM subdirectories, then moves the YYYYMM* files into them. Here {1} and {2} are automatic variables that take their values from the first and second ::: lists, so one command runs for every possible year and month combination:
$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: 2023 2024 ::: 01 02 03 04 05 06 07 08 09 10 11 12
Equivalent command using seq to generate the sequences of years and months (seq -w 12 pads the months to equal width, 01 through 12):
$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: $(seq 2023 2024) ::: $(seq -w 12)
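The sketch below does the same job sequentially in plain bash, using hypothetical sample files in a temporary directory. The nested loops visit the same 24 year and month pairs that the two ::: lists produce. It differs from the command above in one way: it enables nullglob and skips months with no matching files. In the original, mv fails for those months because bash passes an unmatched pattern such as 202303* through unchanged:

```bash
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob   # added: an unmatched glob expands to nothing

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample files named YYYYMM...
touch 202301_report.txt 202312_notes.txt 202405_log.txt

# Same pairs as ::: $(seq 2023 2024) ::: $(seq -w 12)
for year in $(seq 2023 2024); do
  for month in $(seq -w 12); do
    files=("$year$month"*)
    # Added guard: skip empty months instead of letting mv fail.
    if (( ${#files[@]} > 0 )); then
      mkdir -p "$year/$month" && mv -f "${files[@]}" "$year/$month"
    fi
  done
done

find . -type f | sort
```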
Suppose we have a bash function analyze_file and a list of *.txt files to analyze in the current directory. This example uses --jobs 4 to limit the number of parallel jobs to 4, --shell to call a bash function, --progress-bar to display a graphical progress bar, and --timeout-seconds to kill each job that has not finished after 5 minutes.
$ analyze_file() {
    echo "in analyze_file file = $1"
    # do some expensive analysis of file $1 parameter
}
$ export -f analyze_file
$ rust-parallel --jobs 4 --shell --progress-bar --timeout-seconds $((5*60)) analyze_file ::: *.txt
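The export -f step is needed because shell mode runs each command in a new bash process, and that process only sees functions exported into its environment. Here is a minimal check that calls bash -c directly instead of going through rust-parallel:

```bash
#!/usr/bin/env bash

analyze_file() {
  echo "in analyze_file file = $1"
}

# Not exported yet: the child bash process cannot see the function.
bash -c 'analyze_file a.txt' 2>/dev/null || echo "child bash: analyze_file not found"

# After export -f, the function is passed through the environment.
export -f analyze_file
bash -c 'analyze_file a.txt'
```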
Suppose we have two directories dir1 and dir2 within the current directory. This command finds all files within dir1 and diffs each one against the file at the same path in dir2.
Since -s (shell mode) is used, the entire quoted string is run with bash as a single command. The -r regular expression captures everything after the first / character into the {1} capture group:
$ find dir1 -type f | rust-parallel -s -r '^[^/]*\/(.*)$' 'echo diffing {1} ; diff --color=always dir1/{1} dir2/{1} ; echo diff {1} returned $? '
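This is a sequential sketch of the same comparison. It uses bash's =~ operator in place of -r, and diff -q plus the exit status in place of the colored output. The directory trees are hypothetical sample data in a temporary directory. The escaped \/ from the original pattern is written as a plain / here, because / is not special in POSIX extended regular expressions:

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample trees: sub/b.txt differs between dir1 and dir2.
mkdir -p dir1/sub dir2/sub
echo same > dir1/a.txt
echo same > dir2/a.txt
echo old > dir1/sub/b.txt
echo new > dir2/sub/b.txt

diff_all() {
  # Like '^[^/]*\/(.*)$': group 1 is everything after the first /.
  local re='^[^/]*/(.*)$' path rel status
  while IFS= read -r path; do
    [[ $path =~ $re ]] || continue
    rel=${BASH_REMATCH[1]}
    diff -q "dir1/$rel" "dir2/$rel" > /dev/null && status=0 || status=$?
    echo "diff $rel returned $status"
  done < <(find dir1 -type f | sort)
}

diff_all
```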
Suppose we have an input CSV file of HTTP method, URL, and identifier. The command below makes parallel calls with curl, including a JSON body. Here {method}, {url}, and {id} are named regular expression capture groups, -j3 allows at most 3 parallel jobs, and -t5 sets a 5 second timeout:
$ cat >./test.csv <<EOL
GET,http://example.com/endpoint1,1234
PUT,http://example.com/endpoint2,2345
POST,http://example.com/endpoint3,3456
EOL
$ cat test.csv | rust-parallel --regex '(?P<method>.*),(?P<url>.*),(?P<id>.*)' -j3 -t5 curl -X {method} {url} -d '{"identifier":{id},"operation":"{method}"}'
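You can reproduce the same split in bash to preview the curl commands without sending any requests. This is an offline sketch. bash's =~ has no named groups, so BASH_REMATCH[1] through [3] stand in for {method}, {url}, and {id}. The command lines are printed rather than run:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Read CSV lines on stdin and print the curl command each would produce.
preview_curl() {
  local re='^(.*),(.*),(.*)$' line method url id
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue
    method=${BASH_REMATCH[1]}   # stands in for {method}
    url=${BASH_REMATCH[2]}      # stands in for {url}
    id=${BASH_REMATCH[3]}       # stands in for {id}
    printf 'curl -X %s %s -d %s\n' "$method" "$url" \
      "{\"identifier\":$id,\"operation\":\"$method\"}"
  done
}

preview_curl <<'EOL'
GET,http://example.com/endpoint1,1234
PUT,http://example.com/endpoint2,2345
POST,http://example.com/endpoint3,3456
EOL
```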
Use find to find all files in the current directory and its subdirectories. The -0 option works well with find -print0 to handle file names that contain whitespace characters. Call gzip -f -k on each file found by the find command (-k keeps each original file alongside its .gz copy, and -f overwrites any existing .gz file):
$ find . -type f -print0 | rust-parallel -0 gzip -f -k
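Here is a sequential, NUL-safe equivalent in plain bash. It uses hypothetical sample files in a temporary directory, including names with a space and a newline. read -d '' splits the find -print0 output on NUL bytes, the same delimiter that -0 tells rust-parallel to expect. The ! -name '*.gz' filter is an addition for this sketch. It keeps find from picking up archives created while the loop is still running:

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample files, including awkward names.
printf 'a\n' > plain.txt
printf 'b\n' > 'with space.txt'
printf 'c\n' > $'with\nnewline.txt'

# NUL-delimited loop: each name arrives intact, whatever it contains.
while IFS= read -r -d '' f; do
  gzip -f -k "$f"
done < <(find . -type f ! -name '*.gz' -print0)

# Count the archives by NUL terminators, not by lines.
echo "archives: $(( $(find . -name '*.gz' -print0 | tr -cd '\0' | wc -c) ))"
```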