Cryptography
Cryptographic primitives
Cryptographic primitives are well-established, low-level cryptographic algorithms that are frequently used to build cryptographic protocols for computer security systems.
These routines include, but are not limited to, one-way hash functions and encryption functions.
- Symmetric and Asymmetric Ciphers
- Random Numbers
- Stream and Block Symmetric Ciphers
- Public Key Infrastructure
- Hash Functions
Terms to remember
- Plain text - Unencrypted data.
- Cipher - Set of steps we use to encrypt our message.
- Cipher text - Encrypted data.
Substitution Cipher - We replace one letter with another letter.
Types of Encryption
- Symmetric
- Asymmetric
- Hashing Function
Encryption - converting information into a code to prevent unauthorized access.
Symmetric encryption uses one key.
Asymmetric encryption uses two keys: one public and one private.
This helps us identify the party sending data to us.
We can also ensure that only the intended receiver can read the data.
It is slower than symmetric encryption.
A hashing function uses a combination of Boolean algebra, bit shifting, modular arithmetic, and compression to return a fixed-length string.
MD5 returns a 128-bit hash.
We cannot decrypt a hash value; it is a one-way function.
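To make this concrete, here's a quick look at hashing with Python's standard hashlib module; the message string is just an example.

```python
import hashlib

message = b"Hi Duck Airlines"
print(hashlib.md5(message).hexdigest())     # 128-bit digest -> 32 hex characters
print(hashlib.sha256(message).hexdigest())  # 256-bit digest -> 64 hex characters

# The digest length is fixed regardless of input length, and there is
# no inverse function that recovers the message from the digest.
```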
Cryptography depends on random data to be secure.
The quality of the random numbers a random number generator can produce depends on the quality of the entropy source available to it, that is, the source of natural randomness from which the generator harvests the random numbers it returns to us.
Now, we're not going to be able to tap into sophisticated sources of entropy, such as photon transmission and reflection through a semitransparent mirror, but we can get good sources of entropy from the hardware that we're running on. Mouse movements, keyboard input, and disk and network I/O are all good sources of entropy that we can use to harvest random data. But it takes time to harvest good random data; if we try to harvest random data from these sources in a short timeframe, patterns can appear, which doesn't give us good-quality random data. So, it's not really practical for us to get large amounts of random numbers on demand.

There are times, though, when we're going to need a certain quantity of random numbers right away. What we do in this case is use a pseudorandom number generator. It takes some high-quality random data that we pass to it, called the seed, and from the seed it will generate as many random numbers as we ask for, on demand. The catch is that these aren't truly random numbers. In fact, if we pass the same seed to the pseudorandom number generator, we get back the same sequence of numbers each time. But this can actually work to our advantage: when we use stream ciphers, we depend on that behavior. It also means that you don't want to use an encryption key generated by a pseudorandom number generator more than once, and you'll see why this matters when we get into stream ciphers.
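Here's a minimal sketch of that seed behavior using Python's random module. Note that random.Random is a pseudorandom generator but not a cryptographically secure one; real key material should come from something like the secrets module, as shown at the end.

```python
import random
import secrets

seed = 42  # stand-in for high-quality random data from an entropy source
gen_a = random.Random(seed)
gen_b = random.Random(seed)

# The same seed yields the same "random" sequence, on demand.
print([gen_a.randint(0, 255) for _ in range(5)])
print([gen_b.randint(0, 255) for _ in range(5)])  # identical to the line above

# For real key material, draw from the OS entropy pool instead:
real_seed = secrets.token_bytes(32)
```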
Logical Operators
Logical operators describe the result of combining two Boolean values, and most of these are really intuitive to us because they're part of our everyday life.
It's the one that we're going to use later on that's not so intuitive.
The two that we know best are "and" and "or".
If we're going through security in an airport, we need a boarding pass and ID to proceed.
If we have both a boarding pass and ID, we can go to the gates.
Without one or both, we can't go to the gates.
We can use either a driver's license or a passport as our ID.
If we have both, great.
But if we don't have either one, then we're stuck.
The nonintuitive operator that we need is exclusive or, usually said as XOR.
Maybe I shouldn't say that it's not intuitive.
It's just not something that we experience on a day‑to‑day basis.
So to picture this, let's say that we have two magnets and we want to stick them together.
If we try to force two north ends or two south ends together, the magnets are going to repel each other.
For the two magnets to stick together, we have to have one north pole and one south pole.
Now, of course, computers are binary, so we're going to be working with 1s and 0s rather than yes/no, true/false or north/south.
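Here's the XOR truth table in Python, along with the property that makes XOR so useful in ciphers; the sample values are arbitrary.

```python
# XOR is 1 only when the two inputs differ -- like the magnet poles.
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {a ^ b}")

# The property ciphers rely on: XORing twice with the same key
# restores the original value.
data, key = 0b01001101, 0b10110101
assert (data ^ key) ^ key == data
```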
Symmetric Key Ciphers
Symmetric cryptography uses a single key to both encrypt and decrypt the message as compared to the public/private key pair that asymmetric cryptography uses.
So, symmetric ciphers are able to move data more quickly through them than asymmetric ciphers can.
Now, having the same key to both encrypt and decrypt the data causes us problems.
We have to control access to the key, which can be difficult because we need to figure out where we're going to keep it, knowing who's allowed to have it, making sure anyone asking for it is who they say they are, and safely and securely getting it to them.
But also, once they have it, we can't take it back.
The key is no longer secure.
There are solutions for these problems, which primarily involve key stores and short lifetimes for the symmetric keys, and I'll be talking about some of these solutions as we go through the course.
There are two styles of symmetric ciphers, which are block ciphers and stream ciphers.
The simplest comparison of these two styles is that a block cipher reads and encrypts the message a fixed number of bytes at a time until the entire message has been processed, whereas a stream cipher processes the message one bit at a time as it flows by.
Stream Ciphers
We're going to cover just enough material about ciphers to help someone who is not familiar with them to understand what they are and the role that they play when we reference them later on.
So, when we think about a stream, we think about something that's flowing, right? Well, in this case, we're talking about a flow of data.
Most programming languages implement input/output data streams in one way or another, like reading data from a file into memory, for example.
This is also the concept behind the stream cipher.
A stream cipher is designed to encrypt data very quickly as it flows by, and the way that it does this is by XORing the data with the key, either one bit at a time or one byte at a time, just depending on the algorithm.
It's very fast, and assuming that you don't know anything about the data or the key, it's also pretty secure.
Encrypting data this way does have a couple of implications, though.
For each data bit, we have to have a key bit, which means that the key has to be the same size as the data, and this is where the pseudorandom number generator that we talked about earlier comes into play.
We generate some high‑quality random numbers and then use them as the seed for the pseudorandom number generator.
We can generate a key that's the same size as the data that we're encrypting on demand.
Also, the fact that a seed will generate the same set of numbers each time means that we don't have to transmit the entire key.
Instead, we just need to transmit the few random numbers that we're using as the seed to the other party, and then the other party can generate the key for themselves.
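To tie those pieces together, here's a toy stream cipher built on a seeded pseudorandom keystream. This is purely illustrative: random.Random is not a cryptographically secure generator, and real stream ciphers use carefully designed keystream generators.

```python
import random

def keystream_xor(data: bytes, seed: int) -> bytes:
    """Toy stream cipher: XOR each byte with a seeded PRNG keystream.
    Illustration only -- random.Random is NOT cryptographically secure."""
    prng = random.Random(seed)
    return bytes(b ^ prng.randint(0, 255) for b in data)

seed = 1337  # the small shared value both parties use to derive the keystream
ciphertext = keystream_xor(b"Hi Duck Airlines", seed)
plaintext = keystream_xor(ciphertext, seed)  # the same operation decrypts
assert plaintext == b"Hi Duck Airlines"
```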
Now, for the best security, we shouldn't use a key more than once.
The more a key is used, the more information a potential adversary has to break the encryption on the data that we use the key for.
There's a lot that we could talk about when it comes to stream ciphers, but this is enough for our purposes.
Block Ciphers
We just looked at how stream ciphers encrypt a message one byte at a time.
Block ciphers encrypt data a fixed number of bytes at a time.
Now, the exact number of bytes a block cipher encrypts at a time is defined by the cipher, but it will usually be either 64 or 128 bits at a time, which would be 8 or 16 bytes at a time.
Now, key sizes will vary for each algorithm, and most algorithms will actually take multiple key sizes.
The algorithms that do provide multiple key sizes do so to allow us to decide how we want to balance our need for security with the amount of time that it takes to process the data.
Every symmetric block cipher's method of encryption differs, of course, but let's take a look at a generic block cipher implementation.
Our block cipher will use 128‑bit block size and a 128‑bit key size.
The 128‑bit block size means that we'll use a 4 x 4 byte array to process the block of data.
We'll be encrypting Hi Duck Airlines, which rather conveniently fits into 128‑bit block size.
Of course, if we had a plaintext message that was bigger than 128 bits, we'd break up the plaintext into multiple blocks to process, but if we didn't have 128 bits worth of data in our block, then we'd simply pad the plaintext with some more bits to make it a 128‑bit block.
The numbers in the walkthrough are the decimal representation of the ASCII values of our plaintext, which we put into our 4 x 4 matrix.
A second set of 16 numbers represents our symmetric key, which we XOR with the plaintext bits.
Most block ciphers have substitution tables called S‑boxes.
The construction of a substitution table is complex because it has to fulfill a number of cryptographic criteria, but using one is pretty simple.
So, let's use 77 as our example here.
The binary representation of 77 is the 8 bits 01001101.
We split it into two 4-bit nibbles: the left 4 bits select the row of the table, and the right 4 bits select the column.
We take the value found there, which in our example table is 38, and back in our block we substitute 38 for 77.
Doing the substitution for each value transforms the whole block.
And the last step is to shuffle around the rows and the columns.
Then we repeat the XOR, substitution, and shuffling of rows and columns another nine times for this block of data.
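Here's a small sketch of that row/column lookup. The table below is just a randomly generated permutation standing in for a real S-box, so its output won't match the 38 from the walkthrough; real S-boxes, like the AES S-box, are carefully constructed to meet cryptographic criteria, not shuffled at random.

```python
import random

# A stand-in S-box: a permutation of 0..255 (illustration only).
rng = random.Random(0)
sbox = list(range(256))
rng.shuffle(sbox)

def substitute(byte: int) -> int:
    row = byte >> 4      # left nibble selects the row of the 16 x 16 table
    col = byte & 0x0F    # right nibble selects the column
    return sbox[row * 16 + col]

print(bin(77))         # 0b1001101 -> nibbles 0100 and 1101 (row 4, column 13)
print(substitute(77))  # the replacement byte for 77 in this table
```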
This is where I'm going to wrap up symmetric cryptography.
And next, we're going to jump into asymmetric cryptography.
Asymmetric Cryptography
Okay, so we're switching gears now and we're going to talk about asymmetric cryptography.
So, asymmetric cryptography uses two keys, one called the public key and the other called the private key.
If you want to learn more about this, then I do have another course that takes a deep dive into the subject matter, but in this course we're just going to focus on what we use asymmetric cryptography for.
The idea behind a pair of keys is that you can send one of your keys out into public, where it's freely available to anyone who needs it.
The other key, though, is private and only you have access to it.
Now, if your private key gets loose and winds up in somebody else's hands, then the key pair is useless.
And it's not because the keys stop working.
They'll still function, but it's rather that they can't be trusted.
Trust is a fundamental tenet of asymmetric cryptography because identity is an innate characteristic of a key pair.
The two keys in the key pair are sort of inverses of one another.
They're linked in such a way that an operation done with one of the keys can only be reversed by the other key.
When we encrypt data, we do so with a specific person's public key, which means that the only person who can decrypt the data is the person with the corresponding private key.
Similarly, a person or a company or some other entity can provide proof that they are the owner or the producer of the data by signing that data with their private key.
Anyone who has the data can then verify the legitimacy of the data by validating the signature with the corresponding public key.
There are a few algorithms within the world of asymmetric cryptography.
The two main ones are RSA and ECC.
The others specialize in digital signatures or key exchange.
The nature of asymmetric keys severely limits them in the amount of data that they can encrypt or decrypt at any one time.
The typical key size used today for asymmetric keys is 2048 bits, which translates into 256 bytes.
With the way the RSA encryption algorithms work, we can only encrypt 256 bytes at a time.
We also lose 42 bytes to overhead, so we're actually only able to encrypt 214 bytes at a time.
So, symmetric cryptography is much faster, and depending on what you read, it's anywhere between 100 and 1000 times faster.
But the two are not mutually exclusive.
In fact, it's a widespread practice to use asymmetric keys to connect to the message recipient and pass a one-time symmetric key for encrypting and decrypting the message.
And, at an extremely high level, this is how SSL works.
The browser and the server will initially connect using asymmetric keys.
The browser validates the identity of the server by validating the server certificate, which we'll talk about in a moment, and then the two mutually agree on a symmetric key to pass further data back and forth.
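Here's a hedged sketch of that hybrid idea using the third-party Python cryptography package: a one-time symmetric key encrypts the message, and the recipient's RSA public key wraps the symmetric key. Real TLS negotiates keys differently (typically with an ephemeral key agreement), so treat this as a concept demo, not how SSL actually exchanges keys.

```python
# pip install cryptography
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The recipient's long-lived key pair; the public half is freely shared.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# A one-time symmetric key does the heavy lifting on the message itself.
sym_key = Fernet.generate_key()
ciphertext = Fernet(sym_key).encrypt(b"a large message goes here")

# Only the small symmetric key is wrapped with slow asymmetric encryption.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(sym_key, oaep)

# The recipient unwraps the symmetric key, then decrypts the message.
recovered_key = private_key.decrypt(wrapped_key, oaep)
assert Fernet(recovered_key).decrypt(ciphertext) == b"a large message goes here"
```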
Getting into SSL takes us into the territory of digital certificates, and digital certificates require some infrastructure in order to work properly.
In fact, it's called the public key infrastructure, and it's what we're going to cover next.
Public Key Infrastructure
I suspect that you're familiar with digital certificates, and you've probably encountered them most often in your web browser when making a secure connection over SSL.
Now, depending on what browser you're using, there will be a place where you can click and it will show you the information about the certificate.
I talked about how trust is a requirement for the asymmetric key ecosystem to work properly.
That requirement is in the spotlight when we get into certificates.
A certificate binds the party's public key to the data about that party.
And, a party here can be a person or a company or some other entity with an identity.
The question here is: how do we know that the information we're seeing is legitimate? How do we know that this isn't just someone who generated a pair of keys and created a certificate with false information? This is where the public key infrastructure comes into play. The way it works is that a party will generate a key pair and then generate a certificate signing request.
The party fills out their identity information and then attaches their public key to the request.
They send the request to a certificate authority, who then verifies the information in the signing request.
Now, depending on the level of trust that they're requesting, the verification might be something as simple as showing ownership of a domain name and providing registration information.
When the certificate authority is satisfied that the information is legitimate, they'll issue a certificate with the information from the request in the new certificate, and then the CA will sign the certificate with their own private key.
This creates a certificate chain that the client can use to validate that the info that they're looking at is legitimate.
And again, this is a very superficial look at the overall subject of public key infrastructure, so if you want to get into depth about it, you can check out my course, Securing Data with Asymmetric Cryptography, and I go into a lot of detail there.
So that's it for asymmetric cryptography.
And let's move on to hashing.
Hash Functions
What is a hash function? Well, a hash function takes data of any length, called a message, and returns a fixed-size byte array, which is the hash value for that data.
Now this is a perfectly accurate definition of a hash function, but it's overlooking some pretty important subtle properties and innate characteristics of a hash function.
A hash value for a given message is unique, meaning that there will never be two different messages that will produce the same hash value.
This isn't completely true, but the likelihood of a collision is, well, vanishingly small.
A hash value is deterministic, and that means that a piece of data will always produce the same hash value, regardless of how many times it's run through the hash function, so long as it's the same hash function.
You know, a hash value from MD5 will be different from a hash value from SHA-512, for example.
And, a hash function is a one-way function, meaning that it's ridiculously expensive to compute the original data from the hash value.
So, having something that can give us a unique value, and give us that unique value over and over again, is incredibly useful.
It makes it easy for us to do things like detect changes to data or find duplicate data quickly and efficiently.
We store hash values of passwords in systems rather than the passwords themselves.
It's a more secure method of storing credentials on a system because it's ridiculously expensive to find the message from the hash value.
So, when does a hash function become a cryptographic hash function? Well, a cryptographic hash function goes to greater lengths to generate hash values that are absolutely unique and make it as difficult as possible to figure out what the message was that created the hash value in the first place.
So, why does unique matter? Why are we willing to expend extra resources on a hash algorithm to ensure that two different messages can't result in the same hash value? Well, let's think about a system where we store the passwords as hash values, and then some reprehensible villain breaks into our system and makes a copy of our password file.
It doesn't do them much good since we're storing the hash value rather than the password itself.
But, if they can find another message that produces the same hash value as the original password does, then they don't need the original password to legitimately get into the system, and this is called a collision attack.
But you know what? What if we just started collecting those messages and hash values that they produced? If we can build a large enough list, all we have to do is look up the hash value and get the message that produced the hash value in the first place.
Well, if we did that, we would have what's called a rainbow table, and the thought of an attacker having a list that they can use to get the original message by simply looking up the hash value is pretty concerning.
It's also easily defeated.
Adding a few extra bytes to the message will produce a completely different hash value than our unaltered message will.
This is called salting your data.
You need to keep that salt value somewhere secure so that you can use it again for validation purposes, but considering the alternative, it's not that big of a price to pay.
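Here's a minimal sketch of salted password storage using Python's standard library; the iteration count and salt size are illustrative choices, not prescriptions.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None):
    if salt is None:
        salt = os.urandom(16)  # random salt, stored alongside the hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

salt, stored = hash_password("correct horse battery staple")

# Validation: re-hash the presented password with the stored salt.
_, candidate = hash_password("correct horse battery staple", salt)
print(hmac.compare_digest(stored, candidate))  # True
```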
Summary
We've covered quite a bit of material here over the last almost half hour, and it's important information because the rest of what we're going to talk about builds on these cryptographic primitives.
But even with everything we covered, we really just scratched the surface, as they say.
Pluralsight has many more courses that go into much more detail about any of the different subjects we've talked about here.
And, remember that you can also come back and watch any part of this module again if you need to when we get into using these primitives later on in the course.
Data Protection
Introduction
If you haven't watched the first module yet, you might want to go and do that now because this module, in particular, builds on the first module.
So, if you're not really familiar with cryptography, you're probably going to get pretty lost here, actually.
But, if you're comfortable with cryptographic primitives, then hey, let's go.
In this module, we're going to talk about how we protect data using cryptography.
Mostly what we're going to be concentrating on here is how we keep data from being read by people that we don't want to read it and what we can do to make sure that the data hasn't been tampered with since it was last written and encrypted.
There are three categories of data when it comes to data protection here.
One is data at rest.
Data at rest means that data has been written to some sort of storage device.
The second category that we're going to be looking at in this module is data in transit, which is data moving over a network.
The third category of data is called data in use, and we're going to be looking at some really amazing cryptography that we can do with data in use later on in the course.
Aside from preventing the data from being read, we also want to ensure that the data is not tampered with.
The way we go about protecting the integrity of the data is the same regardless of the category, and we'll look at how we go about doing that after we cover the methods of encrypting the data.
Data at Rest
Data at rest is data that's living on some sort of storage device.
Now, we don't care what the format of the data is.
It might be sitting in a database or it might be in a file or some other format altogether.
All we really care about for the purposes of our definition is that the data is written on some sort of storage device.
The question now is what's the best approach to encrypt this data? Well, let's go back and take a look at our cryptographic primitives.
We want it to be fast.
Well, I mean, we always want it to be fast, but more specifically, in this case, we don't care about identity.
We also want to be able to decrypt it.
So, the best solution here is symmetric cryptography, and we'll want to use a block cipher rather than a stream cipher since the data is static.
When we were talking about symmetric block ciphers earlier on, one of the things we talked about was how most symmetric block algorithms have multiple key sizes that allow us to balance the time that it takes to do cryptographic operations against the strength of the encryption, and we hear this a lot.
Balance between time and security.
What are we really dealing with? I mean, how much more time is it going to cost us for stronger encryption? Well, there are a lot of factors that affect the answer to this question, but I wanted some real numbers to look at, so I did a little experiment.
Now, this is far from any sort of benchmark or serious test that someone could use to gauge real‑world processing times, but we do get some numbers to look at.
So, the preferred algorithm right now is Rijndael.
It's the algorithm that was selected for the Advanced Encryption Standard here in the United States, and I used AES‑128 and AES‑256 for this experiment.
I created five files, a 1, a 10, and a 100 MB file, as well as a 1 and 2 GB file, and then I filled each of them with random ASCII characters.
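For anyone who wants to try a similar experiment, here's a rough sketch using the third-party cryptography package's AES-GCM implementation; the file size and the timing approach are just illustrative, not the exact setup behind the numbers discussed next.

```python
import os
import time
# pip install cryptography
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

data = os.urandom(100 * 1024 * 1024)  # 100 MB of random bytes
nonce = os.urandom(12)

for bits in (128, 256):
    key = AESGCM.generate_key(bit_length=bits)
    start = time.perf_counter()
    AESGCM(key).encrypt(nonce, data, None)
    print(f"AES-{bits}: {time.perf_counter() - start:.2f} s")
```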
Looking at the results, there's very little difference in the amount of time that it takes to process until we get up to the 2 GB range.
And even then, I mean, we're not really talking about any significant time penalty.
But, let's say that we're going to be processing tens of thousands of files upwards of 2 GB each where the time difference is going to start becoming a factor.
How do we decide which way to go here? Well, we decide where the balance is based on risk.
What would it cost us if the data was exposed? That's really the big factor here. Another factor to consider is how long the data is going to be of value. In other words, would the data be of any value by the time it would realistically take an attacker to expose it, if we encrypted it with a smaller key size? There are certainly other factors that affect you specifically and don't affect anyone else, so it all really comes down to your particular judgment.
Tokenization
Tokenization isn't cryptography per se, but it is a way of protecting data by replacing the data item in the source with a token.
Think key value pair here.
The token and the data are written to a database, which is called a vault.
So, if we want to protect our super secret data with tokenization, we generate a new token, insert the token and the original value into our vault, and then, of course, replace the super secret data in the plaintext with the token.
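A minimal sketch of vault-based tokenization, with a plain dictionary standing in for the vault database; real systems put the vault behind strict access controls.

```python
import secrets

vault = {}  # token -> original value; a hardened database in real systems

def tokenize(value: str) -> str:
    token = secrets.token_hex(16)  # random, so it carries no information
    vault[token] = value
    return token

def detokenize(token: str) -> str:
    return vault[token]  # access to the vault is what's tightly controlled

token = tokenize("4111 1111 1111 1111")
print(token)              # safe to store or pass around in the application
print(detokenize(token))  # the original value, recovered from the vault
```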
Now, there is something called vaultless tokenization.
It creates a token using a method similar to the way that a symmetric cryptography block cipher uses an S‑box.
Instead of creating the token and then storing the token and the value as a key value pair, one or more bytes are used as the lookup value to find the other substitution bytes.
Now, vaultless tokens have some nice advantages.
Vault‑based tokenization grows linearly, so as time goes on, the cost of the vault goes up in terms of storage space needs and execution times of queries.
By comparison, vaultless tokens will maintain a consistent size and execution time because the tokens are calculated rather than stored.
There are significant risks using vaultless tokens as well.
Because vaultless tokens are calculated, introducing any randomness into the calculation is a problem because that randomness interferes with the repeatability of calculating the token.
And at this point, you're basically doing a weird form of encryption rather than tokenization.
Also, depending on how it's implemented, there is a risk that a substitution table could be recreated by someone who knew enough of the plaintext data and the corresponding token values.
If the substitution table is compromised, then all of the tokens are compromised rather than maybe one or two tokens in a traditional vault system.
But, either way you go, tokenization is fast, and it's a good option to hide data.
Data in Transit
Data in transit is data that's moving from one place to another, usually over a network connection.
When we're sending sensitive data to someone, we probably want to know that the person receiving the data is the same person we intend to receive it.
So, if we go back to our definitions here, asymmetric ciphers can be used to prove identity, but it's also slow compared to symmetric encryption.
It's actually slow to the point where it's impractical to use on anything except small amounts of data, which means it's not something we can use to encrypt bulk data going over a network.
We really need to use symmetric encryption for transmitting data, but getting a symmetric key to the other person securely is the hard part.
Symmetric and asymmetric each solve half of our needs for transmitting data, and really, the strength of one compensates for the weakness of the other, and they're related in such a way that it lets us use them in combination to solve the overall problem.
So, this is how it works conceptually.
A calls B and asks for B's digital certificate, which contains B's public key and information about who B is.
A verifies that B is the person A wants to exchange data with.
A will send A's public key to B, and A may also send a digital certificate with A's public key if the situation is such that two‑way identification is required.
A symmetric key is created and both parties have a copy.
A might create a key and send it to B, or B might create the key and send it to A.
In the real world, there are many ways that a symmetric key is agreed upon.
What's important here is that both A and B have a secure copy of the symmetric key.
So, from this point, symmetric encryption is used to transmit the data back and forth.
And again, this is the concept of how it works.
In real‑world implementations, there are different ways that symmetric key exchange is handled.
Something interesting here is that today, block ciphers are used to transmit data in flight rather than stream ciphers, and I think there are a couple of reasons for this.
The first is the concern about the ability of a stream cipher to stand up to cryptanalysis.
In other words, how long it will take to break the cipher.
RC4 was a popular stream cipher for many years because of its speed and simplicity, and it was used as a stream cipher in SSL up until 2015, which is probably 15 years too long since it was broken 20 years ago.
It was developed in 1987, when the internet as we know it didn't exist and email was something that you really found only in big organizations and universities.
Decompiling and analyzing software wasn't practical, or maybe it wasn't even possible, so security by obscurity, or security that came from not knowing how the software worked, was viable.
Now, in 1994, when distributing to a mass audience was possible, RC4 was leaked and weaknesses were found and published.
Today, there are two good stream ciphers called Salsa20 and ChaCha20, which can be used in place of RC4.
Although, interestingly enough, AES is widely used as a symmetric cipher for SSL.
Even though it's a block cipher, it's a very fast block cipher, so it's an attractive option.
Well, that's it for data protection, and the last subject to talk about is data integrity, and we'll do that now.
Data Integrity
So far, we've covered how to protect our data from being known by other people, either by encrypting it or replacing key pieces of the data with the token.
There's something else that we need to consider, though.
What if somebody gets a hold of an encryption key and changes our data? How do we protect ourselves against that scenario? Well, going back to the cryptographic primitives we covered in the Ciphers and Hash module, the hash function actually provides us with a really good solution.
When we run the data through a hash function, we get a hash value that is distinct from the hash value of any other piece of data, and we can recreate that hash value over and over again by running the hash function on the original data.
So, if the data has been changed since the last time it was hashed, the hash value is going to be different when we run that data through the hash function, which will tell us that it's been changed.
Now, the real‑world cryptographic implementation of this is called a message authentication code.
A message authentication code incorporates a symmetric key as part of the algorithm in such a way that the symmetric key is required in order to validate the signature.
The idea here being that the symmetric key provides a way to authenticate that a message came from the stated sender.
If I receive a message authentication code, and the message authentication code I generate is the same as the one that was provided to me, then I accept that the message is authentic, in the sense that I hold the symmetric key and am able to verify that the message and the signature have not been forged or tampered with.
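Here's what that looks like with Python's standard hmac module; the key and message are placeholders, and in practice the key would be exchanged through a secure channel or key management system.

```python
import hashlib
import hmac

key = b"shared-symmetric-key"  # placeholder; exchanged out of band
message = b"transfer $100 to account 42"

# Sender computes the MAC and transmits it along with the message.
tag = hmac.new(key, message, hashlib.sha256).digest()

# Receiver recomputes the MAC with its copy of the key and compares.
expected = hmac.new(key, message, hashlib.sha256).digest()
print(hmac.compare_digest(tag, expected))  # True -> not tampered with
```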
But trust in the identity of the sender goes only as far as I trust the origin of the symmetric key.
Personally, I don't trust message authentication code authentication unless asymmetric cryptography or a key management system is used in the exchange of the symmetric key.
And, this goes back to the symmetric key distribution problem.
Having the key that's been used to sign the hash does not prove the identity of the signer, or at least it's not as trustworthy of an authentication mechanism compared to a public/private key pair.
Summary
We've been looking at how we can use cryptography to protect data; although tokenization isn't really cryptography, it is a useful tool for protecting data.
A subtle, or maybe not so subtle, point that I'd like you to notice is that very often in cryptography, it's not a single method of protection that makes up the solution, but rather a combination of methods that gives us the best security.
For example, encrypting and signing our data secures our data from multiple threats.
We are actively controlling who can access our data by encrypting it, but we also have a way to detect if someone has messed around with our data after the fact.
So with that, we'll wrap up Data Protection.
And next up, we're going to be looking at how cryptography can be used for identity and authentication.
Identity and Authentication
Introduction
Hi everyone.
Thanks for coming back and joining me for this look into some different ways that we can press cryptography into service for the purpose of verifying identity.
Now, before we get too deep into authentication, we're going to separate authentication from authorization because there are technologies that kind of blur the line between the two.
So, we're going to define these guys up front.
Then we're going to jump back into our cryptographic primitives and check out the role that they play in the overall cryptographic solutions.
We're going to take a look into multi‑factor authentication, which doesn't dive deep into cryptography, but it does have a place in a cryptographic system.
And lastly, we'll look at an aspect of digital signatures called nonrepudiation and how it makes a couple of cryptographic applications possible.
Authentication vs. Authorization
I've noticed over the years that there's often a tendency to use the terms authentication and authorization interchangeably, so let's take a closer look at what these two terms actually are.
Authorization is what you're allowed to access or what you're allowed to do on a system, and it's enforced using something like an access control list or a policy system of some sort.
So, access is selective by nature, and what I mean by that is that the purpose of an authorization system is to allow some people to do an action or access some data, but not permit others.
Otherwise, there really isn't any point to authorization.
To authorize someone or some device, we need to know their identity.
Now, there's something else that's inherent to access control, which is that there has to be a value or sensitivity in data or a risk in allowing someone to do some actions on the system.
Otherwise, we wouldn't care about controlling access to it.
Of course, being sure of someone's identity is more difficult in the computing world than it is in the real world.
Imagine a small town where the community was very close to one another, and John, who's one of the citizens, goes to the bank there and makes a withdrawal from his account.
The folks at the bank know John and give John his money.
If, on the other hand, Bob goes to the bank and claims to be John, the folks at the bank know that Bob is not John, and so don't give Bob John's money.
Now, the bank's online banking system only knows the identity of the person that it's interacting with by the username that it is given.
If Bob knows John's username, then it's easy for Bob to pretend to be John.
So, you see where this is going.
We have to have some way of knowing that the user is who they claim to be, hence authentication, and we have a lot of different ways to authenticate someone or some device, depending on the situation.
Cryptographic Authentication
When we're talking about authentication, we're really talking about validating an identity.
If we look back at the set of cryptographic primitives that we have, the primitive that really fits the need of authentication is asymmetric cryptography, and the reason that asymmetric cryptography works as a mechanism of trust is primarily because of the relationship between the two keys of an asymmetric key pair.
Now, we're also assuming that the private key remains private and doesn't fall into the hands of another person.
The relationship between the two keys is such that when encrypting with someone's public key, only the private key is able to decrypt that message.
And, likewise, when someone's private key is used to sign a message, only that person's public key will validate the message as a valid message.
Because of this, we can move data across an untrusted network securely.
Now, if we have a high degree of confidence that we're getting someone's public key, like getting the key directly from the person, then we can consider the key trustworthy.
Most of the time, though, we're not getting the key directly from a person that we know.
Actually, most of the time we're not even communicating with a person.
Usually when we're using asymmetric cryptography for identity validation, we're using a vetted certificate attached to a public key to validate the identity of machines that we're interacting with, like SSL for example.
Or, we use them for trusted distribution of software like with code signing.
And because we aren't receiving the key from somebody that we know and trust, we're back to the problem of having no way to determine the credibility of the information given to us.
So, we need a third party that we can trust to validate the information for us, and this third party is called the certificate authority.
Now, when an entity asks a certificate authority to issue a public key certificate to them, the certificate authority will require documentation from the entity to ensure that the information they are providing is accurate and legitimate.
When I think about authentication though, asymmetric cryptography isn't really the first thing that springs to mind.
In my mind, identity is more about a person using a computer, so let's dive into that aspect of authentication.
Multifactor Authentication
I would say that the most common form of authentication is a username and a password.
Passwords have become problematic, well, they've been problematic for a while, and they're problematic because of the complexity required to withstand the attempts to break them by the many tools made for just that purpose.
On top of all this, many places require that we change our passwords periodically.
We also have so many of them.
At work, we typically have multiple systems that we have to log into, and on top of that every site we interact with, in any meaningful way anyway, requires that we have an account.
We're told that we're not allowed to write our passwords down and that every password has to be different for every single account that we have.
And while there are very good reasons for doing all of these things, if we have to have strong passwords for every account, it would be impossible to remember all of the different passwords we have and which account they belong to.
So out of practicality, most of us tend to use the same password across sites, and when we have to change them, we'll make slight changes that we can remember.
So, to protect their systems and their users' data, you'll find more and more organizations using multi‑factor authentication.
Multi-factor authentication requires at least two of the following: something you know, something you have, or something you are.
The most common combination is something that you know and something that you have.
The something that you know would be your password, and the something that you have is usually a one‑time password generator.
Now up until recently, there was a device, usually called a key fob, that you would have to carry with you that displayed a six‑digit code, and that code would change every so often, usually 30 or 60 seconds.
It's pretty rare to see these today because most everyone has a smartphone that a one‑time password generator can run on.
The dedicated key fobs were also expensive, not only in terms of the price per fob, but also the logistics and the administration of them.
And, when a fob's battery ran out, it had to be replaced with an entirely new one.
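Here's a minimal sketch of the kind of time-based one-time password (TOTP) generator these apps and fobs run, following the RFC 6238 scheme; the base32 secret here is a made-up example value.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, step: int = 30, digits: int = 6) -> str:
    key = base64.b32decode(secret_b32)
    counter = struct.pack(">Q", int(time.time()) // step)  # 30-second window
    mac = hmac.new(key, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Made-up shared secret; the server and the phone app hold the same value.
print(totp("JBSWY3DPEHPK3PXP"))  # e.g. '492039', changes every 30 seconds
```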
But getting back to the point of MFA.
When you log into a system with MFA, you have to give both your password and provide authentication through the multi‑factor authentication system.
And, I say provide authentication because even though the one‑time password generator is the most common form of multi‑factor authentication, some MFA systems will contact an app on the user's mobile device, which will ask them to confirm that they are the ones who are logging in.
So if somebody else gets a hold of your password, they still can't log in without the second authentication factor, for example, the one‑time password generator on your mobile device.
Digital Signatures
The term digital signature is used in several contexts, and we've talked about some of those contexts in this course already, but what I want to focus on here is what's called the nonrepudiation factor of asymmetric cryptography.
The integrity and the authentication aspects of a hash value continue to hold true, of course.
But, nonrepudiation means that a particular entity, whether that entity is a person or an organization, cannot say that they did not sign a particular piece of data, because it was their private key that was used to sign the data.
And in this context, sign means that we generate a hash value from a particular piece of data and encrypt that hash with the private key.
And actually, let's back up a step here and look at what a valid digital signature tells us.
It tells us that the data has not been altered and that the data has not been forged, assuming that the private key has not been compromised, of course, and it protects against the signatory later denying that they signed the data.
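Here's a short sign-and-verify sketch using the third-party cryptography package with Ed25519, which is one of several signature algorithms (the course mentions RSA and ECC); the document bytes are just an example.

```python
# pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
document = b"I agree to the terms."

signature = private_key.sign(document)  # only the private key holder can do this

public_key = private_key.public_key()   # anyone may hold this
try:
    public_key.verify(signature, document)
    print("signature valid: not altered, not forged")
except InvalidSignature:
    print("data altered or signature forged")
```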
So, digital signatures provide some serious protection in our society where more and more transactions are being done in a disconnected way, meaning that the parties in these transactions are not physically in the same place.
There are several different implementations and applications of digital signatures in use, a couple of which we've talked about already, like code signing and SSL digital certificates.
But one implementation is something of a literal digital signature.
If you're familiar with DocuSign or Adobe's electronic signature product, for example, this is the underlying mechanism that's being used as a person's legal, nonphysical signature on a document, and this is because it provides assurances that the document has not been altered or forged and that the parties that have signed the document can't later say that they were not the parties that signed it.
Summary
Something I didn't mention in here were technologies like OAuth and SAML, and this was because they're authorization mechanisms rather than authentication mechanisms.
And granted, there are places where they're used as an authentication mechanism, such as when you go to a site for the first time and choose the log in with Facebook or log in with Twitter or log in with whatever account, you'll get hit with that screen that says this site wants access to your contacts or your photos or some other personal information.
It's because OAuth and SAML or OpenID, or whatever you have, is an authorization mechanism that's being used as an authentication mechanism.
So with that, we're finished with our look at Identity and Authentication, and next up here we have Secure Multi‑party Computing, so I look forward to seeing you there.
Secure Multi-party Computing
Introduction
Hi everyone.
Glad to have you back here with me.
We're going to talk about cryptography where the theory and the design have been around for a long time, but really only now has the computing power become available to actually implement it in any practical sense.
And I think there will be a growing focus on this area of cryptography given the privacy concerns that are prevalent in society because this area of cryptography allows different parties to hold on to, sign, and process data without ever knowing what the data is.
First off, we're going to be looking at how multiple parties can divide up data so that each party can hold on to the data, but cannot access it without consent from the other parties.
We'll look at how data can be safely signed without the signer knowing what the data is, and the last thing that we'll be looking at in this module is arguably the most interesting and just cool.
We'll be looking at homomorphic encryption, which is data that's encrypted in such a way that you can use the data mostly as you would any other data, but while it's still encrypted.
So let's go.
Secret Sharing
Secret sharing is ideal for safely storing data that's sensitive and important, has controlled access, and must persist.
We really can't meet these requirements with any of our primitives because we need to be able to read the original data again and we would have to store our key.
Now, storing one copy of the key increases our ability to keep the key safe, but it puts us at a much greater risk of losing access to the data if the key is erased.
Storing multiple copies of the key puts us at a much greater risk of the key being compromised, but it does reduce the risk of us losing the key altogether.
Now, secret sharing takes a completely different approach to this problem.
It breaks the data down into separate pieces, and then those pieces can be distributed.
To recover the data, those pieces are combined again to reconstruct the original data.
Because no one person has all of the data, the message is secure.
But, if any piece of that data is lost, then we can't recover the original data, and that's a problem.
The solution to this problem is called the threshold value, which specifies the minimum number of pieces that are required to reconstruct the data.
When a threshold value is set, the algorithm will divide the data in such a way that a single shareholder is not able to know the data, while at the same time making it possible for any combination of at least the threshold number of shares to reconstruct the data.
So, let's say that the data is split up into 10 shares with a threshold of 3 shares required to reconstruct the data.
So long as there are 3 shares, it doesn't matter which 3 of those 10 shares are used.
There are a couple of secret sharing algorithms, which are Shamir's Secret Sharing algorithm and Rabin's Information Dispersal algorithm.
Now getting into the details of how these two go about splitting and masking the data is really out of the scope of this course, but there's a lot of information out there about how these two work.
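To give a taste of the idea, here's the simplest possible flavor of secret sharing: XOR-based n-of-n splitting, where every share is required. Shamir's scheme is what generalizes this to a k-of-n threshold like the 3-of-10 example above.

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, n: int) -> list:
    # n-1 purely random shares, plus one that XORs them all back to the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, secret))
    return shares

def combine(shares: list) -> bytes:
    return reduce(xor_bytes, shares)

shares = split(b"launch code", 3)
assert combine(shares) == b"launch code"  # all 3 shares -> secret recovered
# Any 2 of the 3 shares reveal nothing, but lose one share and the secret is
# gone for good -- exactly the problem a threshold scheme like Shamir's solves.
```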
Blind Signatures
A blind signature is a form of a digital signature where the signatory has no knowledge of the data that they're signing.
Now, this is not to say that there isn't any accountability, but the assumption that the data is legitimate is part of a blind signature scheme.
These signatures are used in places where privacy is most important, the two common applications being digital currency and voting systems, which is kind of handy since the most common analogy of how a blind signature scheme works is a voting system.
Let's think about a voting system.
We want to keep the voter's vote anonymous, but we also want to ensure that the vote is legitimate.
Rather than getting into the cryptographic details of how exactly this is computed, I'm going to go over this concept within an analogy.
The math and the protocols involved are complex to the point where it would be difficult to understand the concepts otherwise.
So, the voter casts their ballot and places their ballot in an envelope with their information printed on the outside of the envelope.
This envelope is also lined with a piece of carbon paper.
The sealed envelope is presented to a voting official who validates that the identity information on the envelope is correct, and then signs the outside of the envelope, transferring their signature to the ballot.
The voter's vote, having been signed by a voting official, can now be proven to be legitimate because of the voting official's signature, but at the same time, the voter's vote is not known to the official.
And finally, the voter transfers their signed ballot to a different envelope that does not contain any information about their identity.
Now, when the votes are being counted, our voter's ballot can be validated as legitimate because of the voting official's signature on the ballot, and at the same time the identity of the voter is kept anonymous.
Implementing a blind signature scheme typically involves blinding a message with a blinding factor and signing it with an asymmetric key.
Now, the blinding factor is random, but it's also calculated in a way that is very similar to how an RSA key pair is calculated.
And, by calculating the blinding factor in this way, it's possible to validate the blinding factor itself, along with the signer's public key.
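Here's a toy of the RSA blinding math with small illustrative numbers (textbook RSA only, no padding, nothing here is secure); it shows how the official signs a blinded value yet the voter ends up with an ordinary signature on the original message.

```python
# Textbook RSA with tiny toy numbers -- nothing here is secure.
p, q = 61, 53
n = p * q            # 3233
e, d = 17, 2753      # e * d = 1 mod (p-1)(q-1)

m = 42               # the message (think: a digest of the ballot)
r = 99               # random blinding factor, coprime with n

blinded = (m * pow(r, e, n)) % n        # voter blinds the message
blind_sig = pow(blinded, d, n)          # official signs without seeing m
sig = (blind_sig * pow(r, -1, n)) % n   # voter strips the blinding factor

assert sig == pow(m, d, n)  # an ordinary, verifiable signature on m itself
```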
If you're interested in how those keys are calculated, I go step by step through the process in my Asymmetric Cryptography course.
Next up, we're going to go beyond signatures and get into working with data while it's encrypted.
Homomorphic Encryption
Back in the Data Protection module, I said that there were three categories of data when it came to data protection, data at rest, data in transit, and data in use.
I covered the data at rest and the data in transit and said that we would cover data in use later on.
Well, we've arrived at later on.
And, the cryptography that we're going to be talking about here really brings some substantial promise to solving some problems that we have in a few areas of society right now, and it's called homomorphic encryption.
This concept was first formed back in the late 1970s, but it really hasn't been feasible up until around now, and you'll see why as we get into it here.
Homomorphic encryption is a form of encryption that lets us run computations on encrypted data.
Now that's a huge statement.
We can process data without having to decrypt it first.
We do suffer a performance penalty, of course, because we're computing data that has been encrypted with asymmetric cryptography, meaning that we're dealing with computations on enormous polynomials.
Now, the numbers that I've heard most often have been a 50:1 CPU usage ratio and a 20:1 memory usage ratio.
So, it takes roughly 50 CPU operations with homomorphic encryption for every single operation on unencrypted data, and roughly 20 units of memory for every 1 unit.
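As a tiny taste of the idea, textbook (unpadded) RSA happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. The parameters below are the same toy values as in the blind-signature sketch; real homomorphic encryption schemes are lattice-based and support much richer computation.

```python
# Textbook (unpadded) RSA is multiplicatively homomorphic -- a toy taste
# of computing on encrypted data. Tiny insecure parameters for illustration.
p, q = 61, 53
n = p * q            # 3233
e, d = 17, 2753      # e * d = 1 mod (p-1)(q-1)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

a, b = 4, 6
product_ct = (enc(a) * enc(b)) % n  # multiplied entirely as ciphertexts
print(dec(product_ct))              # 24 -- a * b, computed without decrypting
```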
The benefit to privacy, though, is obviously massive, using a mobile device to find some sort of service near me, for example.
For that request to be processed, you have to hand over at a minimum your location and the type of service that you're looking for.
And, of course, the entity processing the information will most likely record other metadata about you at the time of the request also.
Homomorphic encryption will allow you to get the same information from the entity processing the request in that same scenario without giving any of your data away.
You encrypt the data before sending it off to the processing entity, which has the ability to do the calculations necessary on the encrypted data, and then sends that answer back to you at which point you decrypt the data that contains the answer.
Homomorphic encryption also opens up the ability for datasets of sensitive data to be used without giving away any privacy of anyone who might be part of the dataset, like medical data, for example.
Imagine someone with a new idea in researching the cure for Alzheimer's or anxiety or cancer, and as part of their research, they need to know how many cases share some sort of a common element.
At the moment, privacy considerations aside, to be able to assemble such a dataset, if it's even possible, would probably be prohibitively expensive and perhaps stop a promising approach.
But if that same dataset was encrypted homomorphically and was generally available to allow such research without compromising the privacy of those involved in the dataset, that has some serious potential.
Again, this technology isn't new, it's been around for probably 50 years, but the computing power available now, particularly in the form of cloud‑based computing, makes homomorphic data processing feasible and generally available to most entities.
Summary
Even though most of these areas of encryption have been around for a while, I think they're becoming more relevant with some of the issues that we're facing today. We'll see this again in the Quantum Cryptography module, specifically with Shor's algorithm, a quantum computing algorithm developed back in 1994 that just had to wait 30 years for a quantum computer stable enough to run it.
But, as computing power and cloud infrastructure continue to get more powerful, I anticipate seeing homomorphic cryptography in particular grow with it.
Anyway, that's it for Secure Multi‑party Computing.
And next up, we're going to dive into the lifecycles of cryptographic keys and I'll see you there.
Cryptography Key Lifecycles
Introduction
One of the biggest reasons we use cryptography is to keep valuable and sensitive data safe.
The security of the data most heavily relies on the security of the cryptographic keys, and there are more factors to how secure a key is than the strength, although the strength is an important factor.
The way the keys are used, how the keys are stored, how they are protected and managed all factor into the security of a key.
Getting all of these factors right isn't easy, and the consequence of getting them wrong is potential exposure of your data, which is why we employ key management systems.
And, coincidentally enough, we're going to be looking at key management systems in this module.
Key management systems operate with cryptosystems, so we want to be sure that we have the context of what a cryptosystem is in our minds.
So once we have a good idea of the environment that the key management system will be operating within, we'll then take a look at a key management system and we'll be focusing extra attention on the lifecycles and the lifespan of keys within a key management system.
The Cryptosystem
In our minds, let's picture a space where all cryptography happens.
We're going to call this space the cryptosystem.
And for the next couple of minutes, we're going to watch all the different things that happen with the keys in our cryptosystem.
First, let's encrypt some plaintext.
We need a key to encrypt plaintext, and right now we don't have one.
So we'll create one, and in this case it will be a symmetric key.
We'll feed the key to the cipher and, bada‑bing bada‑boom, you've got yourself some encrypted data.
A while later, we come back and we want to get that data back out again.
Not a problem.
We feed the key to the cipher and run the encrypted text the other way through the cipher to get our plaintext back.
We're done with the data now, and this was the only data that the key was used for.
Now, the longer a key hangs around, the less secure it becomes because there's more material an attacker could use to break the key.
Also, the more data upon which the key is used, the more data is at risk if that key is compromised.
So, we want to destroy it.
Now what I've described here is a simple lifecycle for a cryptographic key.
In this one fairly simple scenario, we created, stored, used, and destroyed a key.
Managing a single key's lifecycle isn't too difficult, but when we start adding keys to the cryptosystem to handle different scenarios, keeping track of the state and the needs of all the keys becomes increasingly difficult. That's why in modern cryptosystems you will usually find a cryptographic key management system, which is responsible for managing the lifecycle of the keys as well as guarding access to them.
The Key Management System
The structure of a key management system isn't sophisticated, but there are a few moving parts, so to speak.
The job of a key management system is to manage keys.
I know I'm stating the obvious here, but hang in here with me.
The keys that we manage are called data encryption keys, and I'm pointing this out because there's another key in the system called the key encryption key, whose job it is to encrypt all of the data encryption keys that are stored in the key management system.
The key management system also has a Key Management API, which is responsible for, among other things, decrypting the key stored in the database before sending it to whomever requested it.
All of this, of course, applies to the symmetric keys handled by the system.
Key management systems also handle asymmetric keys, in a way. The purpose of asymmetric keys in a key management system is to securely exchange symmetric keys with another entity.
You may remember back in the Ciphers and Hash module, I talked about how symmetric keys are significantly faster than asymmetric keys for exchanging data.
And one of the primary purposes of an asymmetric key pair is to establish identity.
Once identity is established, the two sides will agree on a symmetric key, and that is the key that will be used for further encryption between the two sides.
This is what the key management system is doing with respect to using asymmetric keys.
So, that's the structure of a key management system, but one of the main purposes of a key management system is to manage the lifecycles of the keys which it contains, and this is what we'll be focusing on for the rest of this module.
Key Lifecycle
A key's lifecycle is composed of four stages: the pre-operational, operational, post-operational, and deletion stages.
The pre‑operational stage is pretty straightforward.
We're going to generate a new key, encrypt that key with the key management system's key encryption key, and then store the new key in the key management system's database.
Right now we can't use this key, though, because it has not yet been activated, and I'll explain activating a key here in a second.
When we generate a new key, we also want to generate the metadata associated with that key.
Each implementation of a key management system will have its own set of metadata, or properties, that it keeps for a key, but all of them will typically include the creation date of the key and the key activation type, which at the base level can be immediate (the key is activated right when it is created), automatic at a given date, or manual.
The key management system will also record the date that the key was activated, the size of the key, whether or not the key can be deleted, and the access criteria, plus, of course, any other user‑defined metadata that the key owner wants to throw in there.
For the key to move from the pre‑operational stage to the operational stage, the key must be activated, which is just a fancy term that means changing the status of the key to active in the key's metadata.
Now, something that I've said a few times in other places in this course is that the security of a key decreases as it gets older, which means that we're going to want to change keys periodically.
So, after a period of time, we will retire or deactivate the key.
And when we do this, we're simply making it so that the key will not be used for any further encryption.
We cannot get rid of the key altogether yet because there is presumably still data that is encrypted with that key, so the key is archived rather than destroyed.
Now, destruction of a key happens when the key has been compromised or when a set of criteria for key destruction is met.
The criteria might be based on security or business policy, or it might be when there's no longer any data that's encrypted with the key, or it might be some other criteria that makes sense to the key owner.
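As a sketch of these stages, here's what a key record and its state transitions might look like. The field names and rules are illustrative only; every real key management system defines its own.

```python
# Sketch of the four lifecycle stages and typical key metadata.
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class KeyState(Enum):
    PRE_OPERATIONAL = "pre-operational"  # generated and stored, not yet usable
    ACTIVE = "active"                    # operational: may encrypt new data
    DEACTIVATED = "deactivated"          # post-operational: archived, decrypt-only
    DESTROYED = "destroyed"              # deleted

@dataclass
class ManagedKey:
    key_id: str
    size_bits: int
    created: datetime = field(default_factory=datetime.utcnow)
    activated: Optional[datetime] = None
    deletable: bool = True
    state: KeyState = KeyState.PRE_OPERATIONAL

    def activate(self) -> None:
        # "Activation" just means flipping the status in the key's metadata.
        self.state = KeyState.ACTIVE
        self.activated = datetime.utcnow()

    def deactivate(self) -> None:
        # Retire the key: no further encryption, but keep it for old data.
        self.state = KeyState.DEACTIVATED

    def destroy(self) -> None:
        # Only when compromised or when the destruction criteria are met.
        if not self.deletable:
            raise PermissionError(f"{self.key_id} is flagged as non-deletable")
        self.state = KeyState.DESTROYED
```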
So, we've covered all the various states of a key within a key's lifecycle, but how long does that key actually live for? Well, that's what we're going to talk about next.
Key Lifespan
So how do we determine how long a key will remain active? Well, the main criterion we use is how sensitive the data is. The more sensitive the data, the shorter the lifespan of the key.
If the data is highly sensitive, then the active lifespan of the key might be only as long as the time it takes to encrypt the data.
After that, the key is deactivated and archived.
We obviously don't want to delete the key at this point because we wouldn't be able to recover the data down the road when we need it.
The sensitivity of the data is the main driver, but there are also a few other factors that will affect how you design your key strategy: how long will the data be in use, how is the data being used, how much data is there, and how much damage will be done if the data is exposed or the keys are lost? For less sensitive data that's being used frequently and won't do too much damage if exposed, it might make more sense to have a key that we change periodically.
For example, a company might encrypt less sensitive data with a key that is changed every six months.
If we're changing the key on a regular cycle, then we need to decide what we're going to do with the old keys.
Again, we don't want to toss them away because we're most likely going to have data that's encrypted with that key, so we'll put that key in archive for some period of time so that we can recover the data encrypted with that key.
Let's say for the sake of this example that we will keep the deactivated key for another six months before we delete it.
If we add up the active time and the archived time of the key, we get a total time of one year.
This total time is called the crypto period of the key.
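Just to spell out the arithmetic from the example (the day counts are approximate):

```python
# Crypto period = active time + archived time, per the example above.
from datetime import timedelta

active = timedelta(days=183)    # key in service for about six months
archived = timedelta(days=183)  # then kept deactivated for another six

crypto_period = active + archived
print(crypto_period.days)       # 366 days -- about one year
```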
Summary
What we've talked about here certainly doesn't cover all aspects of key management systems, but you're at a good point now to explore the areas of key management systems that are of most interest to you.
The reason I took more time with the lifecycle and the lifespan of keys is that these are the aspects you'll deal with most when you make use of a key management system.
These are the points that are going to be of concern to most folks. Again, I haven't covered every aspect of the lifecycle and lifespan of a key, but you now have the foundation for learning about the aspects that are most important to you.
Our last module is next, and we're going to be looking at the impact that quantum computing will have on current cryptography.
So, I will see you there.
Quantum Cryptography
Introduction
What I'm going to be covering in this module is not so much quantum cryptography, but rather the impact and the threat to current cryptography that quantum computing poses.
Something interesting about this threat is that the threat primarily applies to asymmetric key pairs.
Symmetric keys can, for the most part, withstand a quantum computer's attack and maintain the same level of security by doubling the key size (a quantum computer running Grover's algorithm effectively halves a symmetric key's strength, so doubling the size restores it).
The fact that the threat is limited in scope to asymmetric keys doesn't make it any less dangerous, though, especially considering how important a role asymmetric keys play in current cryptography.
So, we'll be looking at what exactly this threat is, what the current state of quantum computing is, and what we should do about all of this.
The Threat
Asymmetric keys are secure because it's hard to figure out the numbers that were used to make them.
At a very high level, these keys are created using two really, really big prime numbers.
The modulus of an RSA 2048‑bit key is about 617 decimal digits long and is the product of two prime numbers, each roughly 308 digits long.
It's easy to create a key pair because you choose the two prime numbers that will be used, but right now there isn't any way for a person or a computer to efficiently recover the original prime numbers that were used.
And efficient is the keyword here.
Recovering the original numbers is possible, but the most recent estimate that I'm aware of is that it will take a current machine something like 300 trillion years to find the original numbers.
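To see why, here's a toy version of the attacker's problem in Python. Trial division cracks this small modulus instantly, but the work grows exponentially with the number of digits; scaled up to a 617‑digit RSA‑2048 modulus, it becomes hopeless for classical machines.

```python
# Toy illustration: recover the two primes behind a public modulus
# by brute-force trial division. Exponential in the digit count of n.
def factor_semiprime(n: int) -> tuple[int, int]:
    candidate = 2
    while n % candidate != 0:
        candidate += 1
    return candidate, n // candidate

p, q = 104729, 1299709           # two small, well-known primes
n = p * q                        # the "public modulus"
print(factor_semiprime(n))       # (104729, 1299709) -- instant at this size
```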
Now, a perfectly optimized quantum computer can, theoretically anyway, break a 2048‑bit RSA key in roughly 10 seconds.
One of the fundamental differences between current computing and quantum computing is that a current computer's basic unit is a bit, which is 1 or 0, either on or off.
A quantum computer's basic unit of information is called a qubit, and a qubit takes advantage of two quantum physics phenomena, superposition and entanglement, where a qubit can be 0, 1, or, through superposition, both at the same time.
This allows them to solve particular problems a lot more efficiently, and factoring a key's modulus back into its prime numbers is one of those problems that a quantum computer can solve more efficiently using Shor's algorithm.
Shor's algorithm is a quantum computing algorithm, developed in 1994, that tackles the specific problem of factoring numbers into their primes.
The size of the number has only a modest, polynomial effect on the algorithm's running time.
So, unlike with symmetric keys, making the key bigger just won't help.
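The number theory at the heart of Shor's algorithm can actually be demonstrated classically on a tiny modulus. The quantum computer's only job is the period-finding step, which it does exponentially faster; the rest is cheap classical arithmetic, as this sketch shows:

```python
# Shor's reduction, demonstrated classically: find the period r of
# a^x mod N, then extract the factors with two gcd computations.
from math import gcd

N = 15   # the number to factor
a = 7    # any base with gcd(a, N) == 1

r = 1
while pow(a, r, N) != 1:    # period finding -- the quantum-accelerated step
    r += 1                  # (brute force here; fine for N = 15)

# r == 4 is even, so a**(r//2) == 49 hands us the factors:
p = gcd(pow(a, r // 2) - 1, N)   # gcd(48, 15) == 3
q = gcd(pow(a, r // 2) + 1, N)   # gcd(50, 15) == 5
print(p, q)                      # 3 5
```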
That being said, we are not in danger yet, and I'll break this down next.
Current State of Quantum Computing
Boiling down all the math, the theory, and the current functionality of quantum computers, there are two kinds of quantum computers: universal quantum computers (UQCs) and adiabatic quantum computers (AQCs).
AQCs require optimized algorithms because of the way that they operate compared to UQCs.
To break a 2048‑bit asymmetric key, UQCs require 4099 high‑quality qubits that last for at least 10 seconds.
Right now, UQCs are running at about 65 noisy qubits, and those qubits decohere somewhere under 100 ms.
AQCs, on the other hand, are at about 5000 qubits, but at the moment their best algorithms require 20 million qubits to find the prime factors.
Now, looking at these numbers, you might ask yourself, well, what's the problem? Well, the main threat right now is the ongoing optimization of AQC algorithms.
The algorithms to factor keys have gone from needing 1 billion qubits to 20 million qubits in about seven years. That still sounds like we're pretty safe, but the rate at which the optimization is occurring is increasing.
One billion qubits were needed in 2012.
In 2017, the algorithms were optimized to needing 230 million qubits.
And two years later in 2019, we only needed 20 million qubits.
And the optimization is continuing.
Now at the same time hardware is improving.
IBM is shooting for a 1000‑qubit UQC by 2023, and Google is shooting for a 5000‑qubit computer within 10 years.
Now ultimately we don't know when the combination of hardware and optimization will meet to be able to break an RSA key.
But right now, we can expect that it's at least a matter of years away.
So what do we do about this to protect ourselves? Well, we're going to look at that next.
What Do We Do About It?
It's a good idea to start planning for post‑quantum cryptography now, mainly because of the time that it takes for large organizations to transition.
NIST is working on what it calls the Post‑Quantum Cryptography Standardization Process, an ongoing program to develop algorithms that are resistant to cryptanalysis by quantum computers.
The field of submitted algorithms still under consideration has been narrowed down from 65 to 15, and the final selection of an algorithm is expected somewhere between 2022 and 2024.
These algorithms do not require that we abandon our current cryptography systems and keys.
Rather, they're designed to harden our current cryptography against quantum computer attacks.
So, if the algorithms are 2 to 4 years away, what do we do right now? I think the most productive step is to figure out how to ramp up your organization's cryptographic agility, meaning what changes you should make to your systems so that they're able to adapt to security threats quickly and handle swapping out cryptographic algorithms smoothly.
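What cryptographic agility can look like in code: one common pattern (my illustration, not a standard) is to tag every ciphertext with an algorithm version, so a new algorithm, post-quantum or otherwise, can be added without breaking data encrypted under the old one.

```python
# Sketch of crypto agility via versioned ciphertexts. The two Fernet
# instances are placeholders for "current" and "future" algorithms.
from cryptography.fernet import Fernet

ALGORITHMS = {
    b"v1": Fernet(Fernet.generate_key()),  # today's algorithm
    b"v2": Fernet(Fernet.generate_key()),  # tomorrow's post-quantum algorithm
}
CURRENT = b"v2"

def encrypt(plaintext: bytes) -> bytes:
    return CURRENT + b":" + ALGORITHMS[CURRENT].encrypt(plaintext)

def decrypt(blob: bytes) -> bytes:
    version, ciphertext = blob.split(b":", 1)
    return ALGORITHMS[version].decrypt(ciphertext)  # old data stays readable

assert decrypt(encrypt(b"data")) == b"data"
```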
You also want to keep up with the news coming out of NIST.
And, as much as possible, look at incorporating the post‑quantum algorithms that they're publishing into your systems.
We have time to look at and plan out good solutions to this problem.
Summary
At the end of the day, the threat of quantum computing is, at its core, a matter of economics.
Every new technology is expensive initially and has limited abilities.
So, when the hardware becomes stable enough and capable enough to actually factor these gargantuan numbers into their primes, only governments will be able to afford them, and they'll be limited to attacking only the most valuable secrets of other nation states.
As cost drops, organized criminals will start breaking keys, but, again, they will have limited power and will only focus on the targets that have the best return.
Now, if quantum computing ever makes it to consumer‑grade hardware, these problems will already have been solved, so the probability of widespread data theft is pretty low.
And that's it.
I appreciate you spending time with me here and looking into how to get started with cryptography.
If you have any questions or want clarification, please post in the discussion group and I will do my best to quickly respond.