PBKDF2 Hashing Algorithm
Before moving into the PBKDF2 hashing algorithm, have a look at this article to have an understanding of Password Hashing which is a mandatory prerequisite.
PBKDF2 is a Password-Based Key Derivation Function in which a key is generated from the Password. The generated key can be used as an encryption key or as a hash value that needs to be stored in the database. Here, we are going to look at how PBKDF2 is being used as a hashing algorithm to hash passwords.
PBKDF2 is a modern hashing algorithm that is being used nowadays since it has a considerable computational cost which reduces the vulnerabilities of brute force attacks.
Key Derivation Process
The PBKDF2 key derivation function has 5 input parameters.
DK = PBKDF2(Password, Salt, PRF, c, dkLen)
The output of the PBKDF2 function is the Derived Key. (In our case DK is the hashed value)
- Password:- The master password from which the derived key is generated. (In our case, this is the password that needs to be hashed)
2. Salt:- Sequence of bits known as cryptographic salt.
3. PRF:- Pseudo-Random Function is the basic building block of PBKDF2 in constructing the key derivation function.
PBKDF2 applies the PRF many times to the password. This means that an attacker who tries to crack the password needs to apply the function many times.
4. c:- The number of iterations that need to be performed.
The choice of no.of iterations can be varied according to the environmental conditions. Iteration count can be increased to increase the security.
Note:- Increasing the iteration count will not be a significant burden for legitimate parties, but it would be a significant burden for opponent parties.
5. dkLen:- Generated derived key bit length.
The derived key (hashed value) length can be specified according to the requirements.
PBKDF2 as Key Derivation Function (KDF)
A key derivation function is a hash function that derives one or more secret keys from a secret value such as a password using a pseudo-random function. This function introduces CPU-intensive operations and increases the cost of an exhaustive search. This not only slows down brute force attacks but also increases the size of the cryptographic key.
Derived Key (DK) = T1 + T2 +....+ TdkLen/hlendkLen -> Derived key length in bits
hlen -> PRF output length in bits
Here, the derived key is calculated by concatenating sub-keys that were produced by the PRF.
Output length of PRF = Length of one sub-key in bits = hlen
Length of the derived key value (hashed value) = dkLen
Therefore, no.of sub-keys generated = dkLen/hlen
T1 = F(Password, Salt, c, i) = U1 ^ U2 ^....^ Uc
U1 = PRF(Password, Salt+INT_32_BE(i))
U2 = PRF(Password, U1)
U3 = PRF(Password, U2)
.
.
.
Uc = PRF(Password, Uc-1)
U1, U2, …., Uc were performed an XOR operation to produce a sub-key. But each U will be the output of the PRF as shown above.
INT_32_BE(i) -> 32 bit big-endian representation of i
Note:- This will be concatenated with the password in calculating each U1 of each sub-key to differentiate each sub-key.
Pseudo-Random Function (PRF)
- Informally, a pseudo-random function “looks like a random function”. But as it is calculated from the computer, then it cannot be truly random. But can be nearly random.
- HMAC-SHA-1 is the default pseudo-random function used in PBKDF2.
- But, Pseudo-Random Functions can be decided according to the requirements.
- Few examples of Pseudo-Random Function were HMAC-SHA-256, HMAC-SHA-512, etc.
HMAC (Hash-based Message Authentication Code)
Hash-based Message Authentication Code acts as the Pseudo-Random Function in PBKDF2. HMAC is also a kind of hashing algorithm that has slight modifications. For example, HMAC-SHA1 is a kind of hashing algorithm that has SHA1 as the main hashing algorithm with slight modifications.
Similarly, HMAC-SHA256 and HMAC-SHA512 are also having SHA256 and SHA512 as the main hashing algorithm functioning respectively.
Most importantly, HMAC has a great resistance towards crypto attacks since it is using the hashing concept twice. So that, it is more secure than any other authentication code.
Have a look at https://www.geeksforgeeks.org/what-is-hmachash-based-message-authentication-code/ to have a brief understanding of HMAC.
Note:- Even though SHA1 is prone to length extension attacks, HMAC-SHA1 seems still secure because of the modifications it went through.
Analysis on Average Hashing time varies with the configurable parameters
I have analysed how the length of the parameters derived key and iteration count affect the average hashing time. The tests have been done locally. (Depends on my specifications and performance of the PC)
According to the tests,
- Average hashing time 𝝰 Derived key length
- Average hashing time 𝝰 Iteration count
Note:- The security of the hash of the passwords mainly depend on the time it takes to hash a password. So increasing the average hashing time increases the security of the hashed password. But, it is not a good practice to increase the derived key length to increase the hashing time because it takes more memory without a particular requirement. So, to increase the hashing time, it is a good practice to increase the iteration count.
The minimum iteration count recommended by NIST is 1000.
Pros and Cons of PBKDF2
Pros:-
- Hash function PBKDF2 is purposely designed to be slow by a large iteration count which makes the brute force attack on the created key with PBKDF2 much harder.
- The length of the derived key (output hash) can be configurable.
Cons:-
- Since PBKDF2 uses relatively small amounts of RAM and can be efficiently implemented on GPU, it makes it very easy to design custom hardware to significantly speed up password cracking.
Industry Applications of PBKDF2 Password Hashing
- Microsoft Windows Data Protection API (DPAPI)
- Keeper (for password hashing)
- LastPass (for password hashing)
- 1Password (for password hashing)
- Enpass (for password hashing)
- Dashlane (for password hashing)
- Bitwarden (for password hashing)
- Standard Notes (for password hashing)
- Mac OS X Mountain Lion (for user passwords)
- Apple’s iOS mobile operating system (for protecting user passcodes and passwords)
- WinZip (AES Encryption Scheme)
- Django (web framework, as of release 1.4)
The idea for Increasing the performance of PBKDF2
I have researched PBKDF2 for implementing PBKDF2 hashing support for a Server. Selecting the parameters for the PBKDF2 hashing algorithm is the key part of the implementation of the algorithm. The parameters need to be selected in such a way that the security needs to be increased without reducing the user experience below a particular level. So, I have found a simple technique to increase security by maintaining the user experience at the same level. Let’s dive in. (The average hashing time has been calculated under local specifications of the PC)
- Instance 1:- When we use HMAC-SHA1 as the PRF, it has only 160 bits as output length. Suppose an instance where iteration count is 1000 and dklen is 256 bits. Here the average time hashing time to hash a single password is 2.24 ms. Since the dklen is 256 bits and the output of the PRF has 160 bits as the length, there will be 2 subkeys produced. Those 2 subkeys will be concatenated to produce the hashed value. Even though it takes 2.24 ms to compute all the 2 subkeys, 1 subkey is enough to compare the hash value of the password while brute-forcing. So, 1.12ms is enough for an attacker to compute the 1st subkey of the password he has, and compare it with the hash value which needs to be cracked.
- Instance 2:- If we use HMAC-SHA256 as PRF which has an output length of 256 bits for the same instance above. The average hashing time has been 1.7 ms. But here, there would be only one subkey produced since the output length of the PRF and the dklen are equal. So that the attacker should have to spend the whole 1.7 ms to check the password match while brute-forcing.
- As explained above, even though the average hashing time is reduced in instance 2, it is better than instance 1 in the case of brute-forcing.
The configurable parameters of the PBKDF2 hashing algorithm need to be chosen in such a way as to improve the security without affecting the user experience much.