Outline

What is Hashing in Cryptography

Required Properties of Secure Hashes

Bit Security of Hashes

Fundamental Principles in the Design of Secure Hashes

The Birthday Attack and The Rho Method

Classes of Hashes

Message Digests

Extendable-Output Functions

Password Hash Algorithms

Blockchain Algorithm

Introduction to Message Digests

The SHA-2 Family

SHA-256

SHA-512

SHA-384

Length-Extension Attack Against SHA-2 Hash Family

The SHA-3 Family

Summary of SHA-3 Hash Family

Attacks against SHA-3

Introduction to Extendable-Output Functions

The BLAKE Hash Family

Introduction to Password Hash Algorithms

Major Password Hash Algorithms Used in the Industry

Argon2

PBKDF2

Scrypt

Yescrypt

BCrypt

The Best Resources to Learn How to Apply Hashes

What is Hashing in Cryptography?

In digital computing a hash is a mathematical function that accepts a bit sequence and outputs another bit sequence.

Hashes are a fundamental tool in cryptography that can be used to build other cryptographic functions--including but not limited to cryptographically secure pseudo random number generators (CSPRNGs), public-key based digital signature schemes, key derivation functions, and are even used in symmetric-key ciphers for encryption.

In digital cryptography, secure hashes are used to test if a document was received without error while it was being sent. This is important because data sometimes gets corrupted when transmitted from one machine to another.

Hashes are __not and can never be reversed through empirical means--not even with infinite computing power. __*This is because there are infinite messages that map to a hash output of a fixed value.*

Due to the above property of hashing it is important to __not confuse hashing with encryption. Although encryption produces unpredictable, random data just as hashing does encryption is supposed to be reversible whereas hashing is not!__

Required Properties of Cryptographically Secure Hashes

Hashes suitable for cryptography must generate outputs that are unpredictable ahead of time. There are three properties hashes must have to be suitable for cryptography:

Collision Resistance

Preimage Resistance

Second Preimage Resistance

Let's break these down:

Collision Resistance

All hash functions must generate a finite bit sequence of a fixed length. There is a problem with this. A bit sequence of length "n" will only be able to make 2^n distinct hash outputs. If you have more than 2^n documents you want to scan with your chosen hash you will end up with at least two distinct documents that are mapped to the same hash output! This is known as the *pigeonhole principle*.

Secure hashes must be designed to make it * very difficult* for an attacker to identify a pair of distinct documents that are mapped to the same hash output.

Preimage Resistance

Let's say an attacker finds the hash of a document. Remember, hashes help recipients of a message test if they have received a document from another machine without error. It should be impractical for an attacker to figure out a message that maps to the hash of another document. This makes the hash algorithm *preimage-resistant*.

If the attacker succeeds in finding another message that maps to the same hash as the authentic document the attacker can send the other message and trick the recipient into thinking they have received the authentic one since it will pass the hash test!

Earlier I said hashing cannot be reversed even with infinite computing power. You would not be able to determine the exact message that was applied to the hash function since there are technically infinite messages that map to a given hash output of any size.

A hash gives us a reliable assurance that we have received the correct message--though receiving the wrong message is still possible despite the wrong message passing the hash test. Even though it is possible there is a small chance that it happens.

Second-Preimage Resistance

Let's say the attacker knows a message and its corresponding hash output. Second-preimage resistance means it is impractical for the attacker to find a another message that maps to the same hash output. If you pay attention to the previous definition this implies that any hash that is collision resistant *is also second-*preimage resistant too.

If the attacker can do a second-preimage attack the attacker can also conduct a preimage attack. That is because the attacker can first collect a message and compute its hash value. Next the attacker can derive a different message that maps to the same target hash value--which is what the attacker does in a preimage attack. A hash that is second-preimage resistant is thus also first-image resistant.

In summary when designing a secure hash the hash must be designed for collision-resistance. If this is true then the hash also is second-preimage resistant. And if the hash is second-preimage resistant we know the hash is also preimage-resistant as well!

Bit Security of Hashes

When we talk about the bit security of hashes we say such a has has x bits of security against said attack. That means it will take, at the worst, 2^x guesses before identifying a hash input that makes said attack successful. I will talk more about bit security of hashes when reviewing hash algorithms later.

The Birthday Attack and the Rho Method

The attacker can find hash collisions needing only the __square root__ of guesses as the bit size of the hash output, or 2^(n/2) guesses instead of 2^n. This is known as the birthday paradox.

A simple computer program can be made to take advantage of the birthday paradox. The attacker can choose a target hash. The attacker can compute the hash of the target hash itself. The attacker can next compute the double-hash of the target hash ( the attacker hashes the output of hashing said hash once a second time in a row ). If they do not match the attacker tries again with the output of applying the hash to the previous target hash used until there is a match. This is known as the __Rho Method__ for finding collisions and only takes 2^(n/2 + 1) guesses instead of 2^n, where n is the size of the hash output in bits.

Classes of Hashes

Now that I have described the expected properties of secure hashes and the general attacks that can be done against them I'll review the three main purposes of hashing in cryptography:

Classes of Hashes

Message Digests

Password Hashing: Each time you enter a password through a login interface your password is applied to a password hash. Unlike the other classes of hashes--password hashing is

*supposed*to be slow! That's what makes logins safer! A standard hash should take microseconds for a machine to compute. A password hash should take aThis slows down the attacker's ability to crack your password!__full second.__Proof-of-Work: A client making a request to a server is required to solve a puzzle using a hash algorithm before the server honors the request. This is a technique to limit the requests the client can make to the server. There are two famous

Introduction to Message Digests

A message digest computes a checksum--or an identification number--that is supposed to be unique to the document the message digest was applied to. This helps recipients verify that a message was not corrupted during transmission.

There are several secure message digests used in the industry and that compete for a cryptographic engineer's attention. Let me introduce them to you:

SHA-2 Family

The National Institute of Standards and Technology has approved of the use of these message digests for non-military use. They are used worldwide. Although they are used throughout the world there is no mathematical proof of their security known!

There are three message digests that are members of the SHA-2 Family:

SHA-256

SHA-384

SHA-512

The numbers for each represent the bit size length of the hash output.

SHA-256

The de-facto standard for secure message digests. By default most cryptographic protocols choose SHA-256 as a message digest instead of any of the others. This is because it is a US Federal Government approved message digest that has been used the longest. SHA-256 outputs a consecutive sequence of 8 32-bit words, making a the final hash size 256 bits. SHA-256 accepts 512 bits of data from the message at a time, which are called blocks. SHA-256 offers 256 bits of security against a collision--meaning it would take at the worst 2^256 guesses before the attack identifies some input that results in a hash collision against a document. SHA-256 also offers 256-bit preimage resistance. SHA-256 offers 128-bits of security against __Grover's Algorithm__--an algorithm that can be done by a quantum supercomputer.

When you take a look at the above block diagram of SHA-256 we see that SHA-256 works with 32-bit words.

SHA-512

SHA-512 works the same as SHA-256 except it accepts blocks of 1024 bits of data at a time as a consecutive sequence of 16 64-bit words and finally outputs 8 64-bit words. SHA-512 also does 80 rounds instead of just 64 rounds as SHA-256 does. Despite its output length, SHA-512 still only gurantees 256 bits of collision resistance. It does offer 256 bits of security against Grover's Algorithm.

SHA-512 offers little more practical security than SHA-256 or SHA-384. SHA-384 should be enough even against Quantum Computing Attacks.

SHA-384

SHA-384 is recommended by the NIST as a hash safe against quantum computing attacks targeting hashes. The __BHT algorithm__ would reduce the collision bit security down to 128 bits of security--but this is enough for NIST standards. Even SHA-256 would be reduced down to 85.333 bits of security--which is not good enough. The NIST expects quantum-safe algorithms to have 128 bits of security against relevant quantum computing attacks.

SHA-384 works the same as SHA-512 except it accepts a different initialization vector and truncates the output to 384 bits.

Length-Extension Attack

All hash functions of the SHA-2 family __are vulnerable to this attack__. In this attack the attacker does not know the original message tied to a known SHA-2 hash output. The attacker only knows the length of the original message instead. The attacker only needs to know the length of the original message and the message's hash output to deduce the hash output of a third message. This third message appends a second message appended to the original.

The SHA-3 Family

__Keccak__ was chosen by the NIST to succeed the SHA-2 family. The NIST chose it because it was designed...differently than SHA-2. The NIST's logic was that if, for whatever reason, someone discovered a serious flaw in the SHA-2 family they would not be able to apply the same attack against SHA-3. Of course the new Keccak design will later have security flaws anyway. They just do not exist yet since they have not been discovered ;) Just as the SHA-2 family SHA-3 offers hash outputs of sizes 224, 256, 384, or 512 bits.

Hundreds of cryptanalysts have tried and failed to break SHA-3. It seems SHA-3 is safe to use in general for now.

Keccak consists of four mathematical constructions that work in sequence named: Theta, Rho Pi, Chi, and Iota.

Unlike the SHA-2 family Keccak carries a "sponge" construction that allows it to serve as more than just a message digest. It can also be used to generate a bit sequence of an __arbitrary length.__

There are two algorithms based on Keccak that do this: SHAKE128 and SHAKE256. Their numbers mean their bit security strength. The word SHAKE means Secure Hash Algorithm with Keccak. They are classified as __extendable-output functions____ (XOF)__.

Introduction to Extendable-Output Functions

Extended-Output functions generate bits that are __indistinguishable from random__ without knowledge of the secret used to generate them. They can be used __generate cryptographically secure pseudo random data__ such as Initialization Vectors and symmetric keys. As the Reddit article makes clear it is more convenient to use a XOF for such purposes instead of a regular Key Derivation Function. __https://www.comparitech.com/blog/information-security/key-derivation-function-kdf/__

The BLAKE Hash Family

Just as SHA-2 the BLAKE family offers hashes of sizes 224, 256, 384, and 512 bits. BLAKE was submitted to the NIST as a contestant in the SHA-3 Hashing Competition in 2007. BLAKE lost to Keccak since Keccak was faster when it was programmed into dedicated hardware. Keccak's design was much different than SHA-2. The importance of this, according to the NIST, is that in case a serious exploit in the SHA-2 family is discovered there is little reason to believe SHA-3 will be affected. Keccak also could be used to make extended-output functions whereas BLAKE could not.

Despite this, BLAKE has an advantage as a software-pure implementation when a hardware-accelerated implementation of SHA-3 is not suitable for one's use. BLAKE also has a built-in mechanism for keyed hashing--this is an important feature of using __Hash-based Message Authentication Codes__ (HMACs) that are built using hash functions from the SHA-2 family.

Keyed hashing allows a recipient of a message to test if the message was sent without corruption and that the person sending it is who they say they are by testing the number generated by keyed hashing using a shared symmetric key and message. Whereas BLAKE has built-in support for optional keyed hashing a MAC construct built on top of Keccak has to be made. This construct is known as __Keccak-based Message Authentication Code__ (KMAC).

Unlike the SHA-2 series, BLAKE is resistant to the Length-Extension Attack.

Introduction to Password Hashing Algorithms

Unlike all hash algorithms or hash algorithm constructions mentioned so far--password hashing algorithms all have one ironic trait in common: they are __supposed to be slow!__ This is what stalls the attacker from eventually guessing your password. Whenever you type in your password in a login portal the login portal wastes time hashing your password before comparing the final output hash to the one in the database.

There is a reason why login systems hash your password. In case an attacker steals the hash database the attacker will be stuck with the password hash--not your actual password! If the attacker submits the password hash a different hash will be generated and the attacker will be denied the ability to change the password on the system.

If hashes make it unfeasible to guess which plaintext is associated with a hash a good question is why design a hash that is *slow *instead of just using a message digest such as BLAKE3? The problem is--distinct people use the * same* password.

To make this less obvious random numbers are appended to the password called a salt. This will ensure two distinct users that use the same password have different hash outputs--their salts differ! If we can just append a salt to a hash to make the final hash unique from others even if others use the same password then why go through the Attackers would be able to figure this out

There are five major password hash algorithms used in the industry. Let me break them down:

Argon2

In 2015, Argon2 became the official winner of the Password Hashing Competition. Today, security professionel recommend Argon2 as a password hashing algorithm. Unlike all the other password hashing algorithms later mentioned Argon2 is resistant to advanced GPU and even ASIC cracking attacks.

It does this with a property known as memory-hardness--meaning the speed of execution is limited by the physical speed limit of your volatile memory and not CPU. RAM, a common form of volatile memory, do not vary much in speed in the market--hence where teh value of memory-hardness comes from. An attacker may be able to afford powerful GPUs and ASICs--yet their RAM may be not much faster than ours on our machines.

It is also resistant to attacks that determine the password by paying attention to patterns in program execution during hashing--a technique known as a timing side-channel attack.

No other password hashing algorithm offers such protections--and that's why it won the competition! It is best to use Argon2id--which offers resistance against cracking attacks by powerful machines and side-channel resistance.

PBKDF2

The second most-used password hashing algorithm is Password-Based Key Derivation Function version 2. It offers resistance against rainbow table and dictionary attacks. Unlike Argon2 it is not resistant to GPU cracking attacks.

SCrypt

Scrypt was also designed to be resistant to GPU, ASIC, and FPGA attacks. SCrypt is most often used in blockchain applications including LiteCoin, Dogecoin, Monacoin, and more! You can check for a full list __here__.

You can learn more on how SCrypt builds upon pre-existing cryptosystems by reading __Request for Comments 7914__.

BCrypt

BCrypt is a password hash function that was made earlier than SCrypt. It is not as resistant to GPU and ASIC attacks. This was a standard password hashing function in the past for Unix/Linux systems. Today it has been replaced by yescrypt, a finalist in the Password Hashing Competition that lost to Argon2. It was the first major password hashing function that allowed the security engineer to adjust the speed of the algorithm. This will help BCrypt adapt to Moore's Law. Still, it is now outdated compared to modern password hashing algorithms such as Argon2 since it lacks the memory-hardness feature Argon2 has.

The Best Resources to Learn How to Apply Hashes

For more information on secure hashing I recommend the following books:

Serious Cryptography by No Starch Press

Practical Cryptography for Developers by Vladimir Nabokov

Are you interested in learning how to program hash algorithms just as myself. I got so tired of finding good books on it I decided to start writing my own--and I would be happy if you read the beta draft!

My beta draft reviews how to program the math you need to program cryptosystems such as hash algorithms

If you're interested you can check out my __beta draft here__.

Thanks for reading my blog!

Front blog cover image provided by __macrovector posted on Freepik__.

## コメント