Blockchain: Hashing - blockchainpsu/blockchain-essentials-spring2020 GitHub Wiki

Introduction

To keep everything secret and secure, we need to send our data in a way that can't be easily read if a bad actor gets their hands on our data.

That's where hashing comes in: it converts our data in a one-way fashion so that other observers won't know what the original data was.

Let me emphasize its one-way nature: this is not encryption. When I encrypt something, it comes with the expectation that it will be decrypted. Hashing has no such aspirations—it just wants to make something unreadable.

Basics of Hashing

Hashing functions take in some sort of input, usually a string, and return a sequence of bytes or a hexadecimal string.

Consider the following code chunk:

const sjcl = require('sjcl')  // Stanford JavaScript Crypto Library

data = "hello world"
bytes = sjcl.hash.sha256.hash(data)
hash = sjcl.codec.hex.fromBits(bytes)
// bytes = [ -1186125895, 1823654392, -1523690793, -629298182, 
//           -997920797, 2052292846, -1870071892, -487600663 ]
// hash = b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9

It's a little messy, but my hash function here is the very popular SHA-256, which stands for Secure Hashing Algorithm, and returns a 256-bit value (64 characters, 32 bytes). I take in data and spit out a hexadecimal string, if I so choose, or just the byte array.

Relevant Hash Functions

SHA-256 is really important in blockchain, but we also use RIPEMD-160 to help with creating public addresses, which we'll cover in Transactions. As the name implies, the returned hash should be 160 bits long.

Properties of Hash Functions

To qualify as a cryptographic hash function, a function H(x) must have the following properties:

Input can be any length
Output has a fixed length
H(x) is relatively easy to compute given x
x is relatively difficult to compute given H(x)
H(x) is one-way
H(x) is collision-free (one-to-one)

The first property is fairly easy to understand. The data we want hashed can be any length. Simple. The second property is related in that the output is always the same length; for SHA-256, the length is 256 bits. SHA-512 is 512 bits, et cetera.

The third and fourth properties are also related. Calculating the hash of some input must be computationally simple to do; the result should be returned quickly. However, if you're given the hash of an input, it must be computationally difficult to find the input. Note that it's not impossible, just really, really hard.

The fifth and sixth properties stipulate that the hash function H(x) should be one-way; given an input you can find the output, but the hash function shouldn't be able to process an output to find the input. Finally, and very importantly, the hash function should be collision-free. No two inputs should have the same output hash. In linear algebra/computational math, we call this one-to-one.