Project: Final Results and Performance Metrics - tnl3pdx/ece510-HwAIML GitHub Wiki
Metrics to Focus On
I will focus mainly on the signing stage of the SPHINCS+ program, because it is the longest of the 3 stages. Generating the public/private keys and verifying that the signature is valid are both much shorter in comparison, so signing is the main focus of the results breakdown. The following metrics need to be accounted for to determine the results of my implementation:
Software
- Total execution time of the hash function for a single signing (how long the hash function takes during execution)
- Average time for a hash call (on average, how long it takes for a call to finish)
- Number of hash calls per signing (total number of calls to the hash function)
- List of unique message lengths to the hash function (used to determine how many blocks need to be allocated for SHA-256)
- Distribution of call counts by unique message length (how many times a message of each length, such as 64 or 80, is passed to the hash function)
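The software metrics listed above could be collected by wrapping the hash function in a small profiler. This is a minimal Python sketch, not the instrumentation used in this project: `HashProfiler` and its fields are hypothetical names, and `hashlib.sha256` stands in for the hash routine used by SPHINCS+.

```python
import hashlib
import time
from collections import Counter


class HashProfiler:
    """Hypothetical wrapper that records the software metrics listed above."""

    def __init__(self):
        self.total_ns = 0          # total time spent inside the hash function
        self.calls = 0             # number of hash calls
        self.lengths = Counter()   # call count per unique message length

    def sha256(self, message: bytes) -> bytes:
        """Hash the message and update the counters."""
        start = time.perf_counter_ns()
        digest = hashlib.sha256(message).digest()
        self.total_ns += time.perf_counter_ns() - start
        self.calls += 1
        self.lengths[len(message)] += 1
        return digest

    def average_us(self) -> float:
        """Average time per hash call in microseconds."""
        return self.total_ns / self.calls / 1000 if self.calls else 0.0


profiler = HashProfiler()
for length in (64, 64, 80, 208, 608):   # illustrative inputs only
    profiler.sha256(bytes(length))

print(profiler.calls, sorted(profiler.lengths.items()))
```

Keeping the per-length counts in a `Counter` gives both the list of unique message lengths and the call distribution from the same data.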
Hardware (HDL)
- Maximum clock frequency based on synthesis via OpenLane
- Execution times for each unique message length (different numbers of message blocks will affect the execution time of a single call)
- Communication cost between hardware and software layers
Results for Software
Based on my prior profiling of the project, I found that the hash function took around 14 seconds in total to process the test.txt file located in Challenge 9's folder. On average, each call to the hash function takes 8.6 us, and the function was called 1734375 times.
The SPHINCS+ algorithm uses only 4 different message lengths: 64, 80, 208, and 608. The majority of the calls are concentrated in the 64 and 80 message categories, each of which takes up 2 blocks in the SHA-256 computation.
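The block counts follow from the SHA-256 padding rule. The sketch below assumes the message lengths are in bytes and that each call hashes from a fresh state; `sha256_blocks` is a hypothetical helper name.

```python
def sha256_blocks(message_len: int) -> int:
    """Number of 64-byte SHA-256 blocks for a message of message_len bytes.

    Padding appends one 0x80 byte and an 8-byte length field, so 9 extra
    bytes are needed before rounding up to a multiple of 64.
    Assumption: lengths are in bytes and hashing starts from a fresh state.
    """
    return (message_len + 9 + 63) // 64


for n in (64, 80, 208, 608):
    print(n, sha256_blocks(n))
```

Under these assumptions the 64 and 80 lengths both need 2 blocks, while 208 and 608 need 4 and 10 blocks respectively.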
Results for Hardware
Clock Speed of HDL
Using the results from my synthesis via OpenLane in Challenge 21, the minimum clock period of my design is 15 ns, which corresponds to a maximum frequency of about 66.7 MHz.
Execution times for Different Block Sizes
The individual execution times for each message length were simulated using QuestaSim. Since the hardware is deterministic, only one iteration per message length is needed to confirm these findings. For the 64 and 80 character tests, the execution time for each is 5.621 us, which is about 35% lower than the software's average execution time of 8.6 us per call.
Percentage Differences between Iterations
For a 1015 message test, the execution times for each iteration of this project are as follows.
Going from Iteration #1/#2 to #3 reduced execution time by 36.75%. Going from Iteration #3 to #4 reduced it by a further 43.09%. Overall, from Iteration #1/#2 to #4, execution time was reduced by 64.01%.
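As a quick consistency check, successive percentage reductions compose multiplicatively rather than additively. The sketch below applies this to the two step-wise figures above; `combined_reduction` is a hypothetical helper name.

```python
def combined_reduction(*reductions_pct: float) -> float:
    """Overall percentage reduction after applying each reduction in turn."""
    remaining = 1.0
    for r in reductions_pct:
        remaining *= 1.0 - r / 100.0   # fraction of time left after this step
    return (1.0 - remaining) * 100.0


print(round(combined_reduction(36.75, 43.09), 2))
```

The result is approximately 64.00%, which agrees with the reported 64.01% to within the rounding of the two step-wise percentages.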
Communication Time
The communication time between HW and SW was modeled using an SPI bus, which sends 8 bits from the controller to the peripheral and 32 bits from the peripheral to the controller. This was tested using cocotb, which can be found in Challenge 25. The resulting latency and throughput of this SPI setup were 0.705 us and 22695.04 kbps, respectively.
For a rough estimate of the total additional communication cost between SW and HW, two parts of each SHA-256 HDL call are affected: the transmission of the entire message and the latency to start a transaction.
The throughput cost is calculated by dividing the number of bytes processed per signing by the throughput of the SPI bus. This adds a total of 41.37 seconds, which is nearly 3 times the execution time of the hash function in SW.
As for latency, assuming that it takes 1 call to start a transaction and 1 call to receive the output hash, the latency overhead adds up to 165.51 seconds for a single signing.
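The throughput calculation can be sketched as follows. The function names are hypothetical, and the sketch assumes that 1 kbps means 1000 bits per second. Because the total byte count is not listed here, the example works backwards from the reported 41.37 seconds to the byte count it implies.

```python
def transfer_seconds(total_bytes: float, throughput_kbps: float) -> float:
    """Time to move total_bytes over a link (assumes 1 kbps = 1000 bit/s)."""
    return total_bytes * 8 / (throughput_kbps * 1000)


def implied_bytes(seconds: float, throughput_kbps: float) -> float:
    """Inverse of transfer_seconds: bytes moved in the given time."""
    return seconds * throughput_kbps * 1000 / 8


SPI_KBPS = 22695.04       # SPI throughput reported above
HASH_CALLS = 1_734_375    # hash calls per signing from the software results

total = implied_bytes(41.37, SPI_KBPS)
print(round(total / 1e6, 1), round(total / HASH_CALLS, 1))
```

Under these assumptions the reported time corresponds to roughly 117.4 million bytes per signing, or about 67.7 bytes per hash call, which is consistent with most calls using the 64 and 80 message lengths.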
The total communication time per signing is 206.89 seconds.
Final Results
Using the data provided in the previous 2 sections, the speedup between the software and hardware implementations of SHA-256 without considering communication time is:
With communication time included, the speedup is:
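Speedup here is the software time divided by the hardware time, with the communication time added to the hardware side when it is considered. The sketch below is illustrative only: `speedup` is a hypothetical helper, and the two printed values are rough bounds derived from figures quoted earlier on this page (8.6 us and 5.621 us per call, about 14 seconds of software hashing, 206.89 seconds of communication), not the final results of the project.

```python
def speedup(t_software: float, t_hardware: float, t_comm: float = 0.0) -> float:
    """Speedup of hardware over software; above 1 means hardware is faster."""
    return t_software / (t_hardware + t_comm)


SW_CALL_US = 8.6        # average software time per hash call
HW_CALL_US = 5.621      # simulated hardware time for the 64 and 80 tests
SW_TOTAL_S = 14.0       # approximate software hash time per signing
COMM_TOTAL_S = 206.89   # estimated communication time per signing

# Per-call comparison for the 64 and 80 message lengths only.
per_call = speedup(SW_CALL_US, HW_CALL_US)

# Upper bound with communication: even if the hardware compute time were
# zero, the communication time alone limits the achievable speedup.
upper_bound_with_comm = speedup(SW_TOTAL_S, 0.0, COMM_TOTAL_S)

print(round(per_call, 2), round(upper_bound_with_comm, 3))
```

With these inputs the per-call ratio is about 1.53, while the bound with communication is about 0.068, meaning the modeled SPI link would make the hardware path at least roughly 14 times slower than software for a full signing.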