Multi Core Processor Support - jonelo/jacksum GitHub Wiki

Jacksum takes advantage of multi-core processors and multi-CPU computers to compute and verify hash values in parallel. This document gives an insight into the various use cases and shows which of them Jacksum v3+ parallelizes and which it does not.

Note: if you run your own tests on multi-core or multi-CPU systems, I recommend running them on bare metal, because virtualized environments can have a negative impact on how the cores are actually used (for any process, not just Jacksum).

Multiple files, one algorithm

Use cases

  • A user would like to compute hash values for later file verification/integrity checks
  • A user wants to check the integrity of their files

Support for multi-core systems

  • Yes
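
This case parallelizes naturally because every file can be hashed independently of the others. The following is a minimal sketch of that idea using only the Java standard library; it is not Jacksum's actual implementation, and the class and method names (ParallelFileHasher, hashAll, hashFile) are made up for illustration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Jacksum.
final class ParallelFileHasher {

    // Hash one file by streaming it through a single MessageDigest.
    static String hashFile(Path file, String algorithm) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Hash all files concurrently, one task per file.
    // The returned map keeps the order of the input list.
    static Map<Path, String> hashAll(List<Path> files, String algorithm) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = Math.max(1, Math.min(files.size(), cores));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Path file : files) {
                futures.add(pool.submit(() -> hashFile(file, algorithm)));
            }
            Map<Path, String> result = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                result.put(files.get(i), futures.get(i).get()); // waits for task i
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

In this sketch the number of worker threads is capped by both the number of files and the number of available processors, so a single file still runs on one thread only.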

Multiple files, multiple algorithms

Use cases

  • Same as above, but for power users or users who don't want to rely on a single algorithm

Support for multi-core systems

  • Yes

One file, multiple algorithms

Use cases

  • Typical use case for a developer who wants to provide many hash values of one file to users

Support for multi-core systems

  • Yes

Concrete example

Let's assume that we want to calculate the hash values of the Ubuntu 20.04.3 ISO image ubuntu-20.04.3-desktop-amd64.iso, which is about 2.9 GiB in size, using the algorithms SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512.

My test machine is an older Mac Pro (Mid 2010) with 2 x 3.06 GHz 6-core Intel Xeon processors, running macOS Mojave and OpenJDK 11.0.10 from AdoptOpenJDK. Since the box also has Hyper-Threading enabled, 24 logical processors are available. By typing

jacksum --info

you see at the end of the output the number of available processors:

...
Available processors: 24
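
For comparison, a Java program can obtain such a count with the standard Runtime.getRuntime().availableProcessors() call, which reports logical processors (hardware threads). That Jacksum derives its figure exactly this way is an assumption here, and the class name ProcessorInfo is hypothetical.

```java
// Hypothetical helper, not part of Jacksum.
final class ProcessorInfo {

    // Number of logical processors available to the JVM.
    // Hardware threads count individually, so 2 CPUs x 6 cores x 2 threads gives 24.
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + availableProcessors());
    }
}
```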

With Jacksum 1.7.0 (from 2006) it was already possible to calculate multiple algorithms in one run. The file was read only once, but the computation was a sequential, single-threaded process.

Johanns-Mac-Pro:~ johann$ java -showversion -jar /Users/johann/Downloads/jacksum-1.7.0/jacksum.jar -a sha1+sha224+sha256+sha384+sha512 -F "#ALGONAME{i}(#FILENAME)=#CHECKSUM{i}" -V summary /Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)
sha1(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=61f0d29ea5c0a414025de67d177e2212987afe7d
sha224(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=15a1d10a91b3cec2182d94f1f94db65c5b9f480dd5489c7295707f2d
sha256(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=5fdebc435ded46ae99136ca875afc6f05bde217be7dd018e1841924f71db46b5
sha384(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=5ce98f3dbfe80f2123b27097ef575943147adea2b4d162240573a8fa9175ac399b00a0d85aacb65fe0724ee52bbf6668
sha512(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=69b7b79d12573254ee49558975567bfa8f1e8c3dacf2877266c13db462196b39baf3d3b8966c6b90b22115d85eb19ee38289929906a27c6248bee71d7b544f84


Jacksum: processed directories: 0
Jacksum: directory read errors: 0
Jacksum: processed files: 1
Jacksum: processed bytes: 3071934464
Jacksum: file read errors: 0
Jacksum: elapsed time: 0 d, 0 h, 1 m, 36 s, 296 ms
Johanns-Mac-Pro:~ johann$ 

The calculation took 1 m, 36 s, 296 ms.

Jacksum 3 calculates the same hash values, but it does the job in parallel.

Johanns-Mac-Pro:~ johann$ java -showversion -jar /Applications/Jacksum/jacksum.jar -a sha1+sha224+sha256+sha384+sha512 -F "#ALGONAME{i}(#FILENAME)=#CHECKSUM{i}" -V summary /Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso 
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)
sha1(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=61f0d29ea5c0a414025de67d177e2212987afe7d
sha224(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=15a1d10a91b3cec2182d94f1f94db65c5b9f480dd5489c7295707f2d
sha256(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=5fdebc435ded46ae99136ca875afc6f05bde217be7dd018e1841924f71db46b5
sha384(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=5ce98f3dbfe80f2123b27097ef575943147adea2b4d162240573a8fa9175ac399b00a0d85aacb65fe0724ee52bbf6668
sha512(/Users/johann/Downloads/ubuntu-20.04.3-desktop-amd64.iso)=69b7b79d12573254ee49558975567bfa8f1e8c3dacf2877266c13db462196b39baf3d3b8966c6b90b22115d85eb19ee38289929906a27c6248bee71d7b544f84


Jacksum: files read successfully: 1
Jacksum: files read with errors: 0
Jacksum: total bytes read: 3071934464
Jacksum: total bytes read (human readable): 2 GiB, 881 MiB, 640 KiB, 0 bytes

Jacksum: elapsed time: 30 s, 330 ms

Since the load is distributed across many cores, it finishes much quicker. On the same box, with the same file on the same SSD and the same JVM, it took just 30 s, 330 ms, so you get the result about 3.2x faster (96.296 s / 30.330 s ≈ 3.2). In other words, we have saved more than one minute in this case!

Notes:

  • The slowest algorithm selected determines the minimum computation time.
  • To determine the slowest algorithm, you can type jacksum -h -a sha1+sha224+sha256+sha384+sha512 --list --info, which prints details including a relative speed rank. In my case SHA-224 was the slowest.
  • We still didn't use the entire power of the box: although 24 processors are available in theory, only 5 threads can run in parallel, because we selected just 5 algorithms for the computation. That changes if we select many more algorithms, if we use more than one file, or both.
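
To make the notes above more tangible, here is a minimal sketch of one way to read a file once and feed every chunk to several digests at the same time. It is an illustration under assumptions, not Jacksum's actual code: the class name MultiAlgorithmHasher is hypothetical, and the algorithm names are those of the standard java.security.MessageDigest API.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of Jacksum.
final class MultiAlgorithmHasher {

    // Reads the file once; each chunk is digested by all algorithms in parallel.
    static Map<String, String> hash(Path file, List<String> algorithms) throws Exception {
        List<MessageDigest> digests = new ArrayList<>();
        for (String name : algorithms) {
            digests.add(MessageDigest.getInstance(name));
        }
        // One thread per algorithm: more threads could not be kept busy.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, digests.size()));
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[1 << 20];
            int n;
            while ((n = in.read(buffer)) != -1) {
                final int length = n;
                List<Callable<Void>> tasks = new ArrayList<>();
                for (MessageDigest md : digests) {
                    tasks.add(() -> {
                        md.update(buffer, 0, length);
                        return null;
                    });
                }
                // Blocks until every digest has consumed the chunk, so each
                // round takes as long as the slowest algorithm needs.
                pool.invokeAll(tasks);
            }
        } finally {
            pool.shutdown();
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < algorithms.size(); i++) {
            result.put(algorithms.get(i), toHex(digests.get(i).digest()));
        }
        return result;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

With five algorithms this sketch keeps at most five threads busy, no matter how many processors the machine has, which mirrors the last note.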

One file, one algorithm

Use cases

  • Typical use case for almost anyone who wants to verify the integrity of a file that has been downloaded from the net.

Support for multi-core systems

  • No

For an algorithm to do its work internally in parallel, its design must allow it. Many algorithms aren't designed that way, and they simply cannot benefit from a multi-core system. There are, however, algorithm designs that are parallelizable, such as BLAKE2sp, BLAKE2bp, BLAKE3, tree hashing algorithms, and most of the proposals from the "NIST SHA-3 competition". Still, implementing a multi-threaded version of an algorithm usually helps that particular algorithm only, and not any other algorithm.
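
The following toy example sketches why a tree design can use several cores for a single input: the data is cut into chunks, each chunk is hashed independently (possibly in parallel), and the chunk digests are hashed again to form a root value. This two-level construction is made up for illustration only. It is not BLAKE3 or any standardized tree hash, its result differs from a plain SHA-256 of the same data, and the class name ToyTreeHash is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.stream.IntStream;

// Illustrative toy construction, not a standardized tree hash and not part of Jacksum.
final class ToyTreeHash {

    static byte[] sha256(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(data, offset, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Leaves: SHA-256 of each chunk. Root: SHA-256 over the concatenated leaf digests.
    static byte[] treeHash(byte[] data, int chunkSize, boolean parallel) {
        int chunks = Math.max(1, (data.length + chunkSize - 1) / chunkSize);
        IntStream indices = IntStream.range(0, chunks);
        if (parallel) {
            indices = indices.parallel(); // leaves are independent of each other
        }
        byte[][] leaves = indices
                .mapToObj(i -> {
                    int offset = i * chunkSize;
                    int length = Math.min(chunkSize, data.length - offset);
                    return sha256(data, offset, length);
                })
                .toArray(byte[][]::new); // toArray keeps the chunk order

        byte[] joined = new byte[leaves.length * 32];
        for (int i = 0; i < leaves.length; i++) {
            System.arraycopy(leaves[i], 0, joined, i * 32, 32);
        }
        return sha256(joined, 0, joined.length);
    }
}
```

A classic sequential design such as SHA-256 itself cannot be split like this, because every block depends on the state left behind by the previous block.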

Jacksum supports multiple threads where the solution is valid for all algorithms, for instance if you have selected multiple algorithms or if you want to hash many files. In those cases parallelism is easy to achieve.

For now, Jacksum doesn't provide a multi-threaded implementation of any of the algorithms it supports.

Finding the algorithm by brute force

Use cases

  • Typical use case for a developer who knows both input and hash value (or CRC or checksum), but does not know the algorithm that was used to produce the value.

Support for multi-core systems

  • No

Concrete example

See also https://github.com/jonelo/jacksum/wiki/Cheat-Sheet#find-the-algorithm-to-a-hash-value

In future releases this may be a good candidate for parallelism if the algorithm width is > 16 bits. For a width of <= 16 bits it does not make sense, because in that case the brute-force process usually finishes in just a few seconds on today's computers.
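
As a rough idea of how such a search could be spread over several cores, the sketch below tries all 65536 candidate polynomials of a 16-bit CRC in a parallel stream. It makes strong simplifying assumptions that are not taken from Jacksum: only the polynomial is unknown, the initial value is given, and there is no bit reflection and no final XOR. The class name CrcPolySearch and its methods are hypothetical. The 16-bit width is used only to keep the example small; the same partitioning of the candidate space would apply to wider algorithms.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.IntStream;

// Hypothetical sketch, not Jacksum's brute-force implementation.
final class CrcPolySearch {

    // Plain bitwise CRC-16, most significant bit first, no reflection, no final XOR.
    static int crc16(byte[] data, int poly, int init) {
        int crc = init;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ poly) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Tests every 16-bit polynomial; the candidates are independent of each
    // other, so the stream can split the range across the available cores.
    static int[] findPolynomials(byte[] data, int init, int target) {
        return IntStream.range(0, 1 << 16)
                .parallel()
                .filter(poly -> crc16(data, poly, init) == target)
                .toArray();
    }

    public static void main(String[] args) {
        // Known input and known CRC value, unknown polynomial.
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        for (int poly : findPolynomials(input, 0xFFFF, 0x29B1)) {
            System.out.printf("candidate polynomial: 0x%04X%n", poly);
        }
    }
}
```

A search like this may report more than one candidate for a short input, so a real tool would typically confirm the match against additional known input and value pairs.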