The memory-hard Argon2 password hash and proof-of-work function
University of Luxembourg, alex.biryukov@uni.lu
University of Luxembourg, dumitru-daniel.dinu@uni.lu
University of Luxembourg, dmitry.khovratovich@uni.lu
SJD AB, simon@josefsson.org, http://josefsson.org/

This document describes the Argon2 memory-hard function for
password hashing and proof-of-work applications. We provide an
implementer-oriented description together with sample code and
test vectors. The purpose is to simplify adoption of Argon2 for
Internet protocols.

This document corresponds to version 1.3 of the Argon2 hash
function.

Argon2 summarizes the state of the art in the design of
memory-hard functions. It is a streamlined and simple design.
It aims at the highest memory filling rate and effective use of
multiple computing units, while still providing defense against
tradeoff attacks. Argon2 is optimized for the x86 architecture
and exploits the cache and memory organization of recent
Intel and AMD processors.

Argon2 has one primary variant, Argon2id, and two supplementary variants, Argon2d and
Argon2i. Argon2d uses data-dependent memory
access, which makes it suitable for cryptocurrencies and
proof-of-work applications with no threats from side-channel
timing attacks. Argon2i uses data-independent memory access,
which is preferred for password hashing and password-based key
derivation. Argon2id works as Argon2i for the first half of the first iteration over the
memory, and as Argon2d for the rest, thus providing both side-channel attack protection and
brute-force cost savings due to time-memory tradeoffs. Argon2i makes more passes over the
memory to protect from tradeoff attacks.

For further background and discussion, see the Argon2 paper.

The following notation is used:

   x**y --- x multiplied by itself y times
   a*b --- multiplication of a and b
   c-d --- subtraction of d from c
   E_f --- variable E with subscript index f
   g / h --- g divided by h
   I(j) --- function I evaluated on parameter j
   K || L --- string K concatenated with string L
   a ^ b --- bitwise exclusive-or between a and b
   a mod b --- remainder of a modulo b, always in range [0, b-1]
   a >>> n --- rotation of a to the right by n bits
   trunc(a) --- the 64-bit value a truncated to the 32 least
      significant bits
   extract(a, i) --- the i-th set of 32 bits from a, counting from 0
   |A| --- the number of elements in set A

Argon2 has the following input parameters:
   o  Message string P, which is a password for password hashing
      applications. It may have any length from 0 to 2**32 - 1 bytes.

   o  Nonce S, which is a salt for password hashing applications.
      It may have any length from 8 to 2**32 - 1 bytes. 16 bytes is
      recommended for password hashing. The salt must be unique for
      each password.

   o  Degree of parallelism p determines how many independent
      (but synchronizing) computational chains (lanes) can be
      run. It may take any integer value from 1 to 2**24 - 1.

   o  Tag length T may be any integer number of bytes from 4 to
      2**32 - 1.

   o  Memory size m can be any integer number of kibibytes from
      8*p to 2**32 - 1. The actual number of blocks is m', which is
      m rounded down to the nearest multiple of 4*p.

   o  Number of iterations t (used to tune the running time
      independently of the memory size) can be any integer number
      from 1 to 2**32 - 1.

   o  Version number v is one byte 0x13.

   o  Secret value K (serves as key if necessary, but we do not
      assume any key use by default) may have any length from 0 to
      2**32 - 1 bytes.

   o  Associated data X may have any length from 0 to 2**32 - 1
      bytes.

   o  Type y of Argon2: 0 for Argon2d, 1 for Argon2i, 2 for Argon2id.

The Argon2 output is a T-byte string.

Argon2 uses an internal compression function G with two
1024-byte inputs and a 1024-byte output, and an internal hash
function H. Here H is the BLAKE2b hash function, and
the compression function G is based on its internal
permutation. A variable-length hash function H' built upon H
is also used. G and H' are described in later sections.

The Argon2 operation is as follows.

Establish H_0 as the 64-byte value shown below. H is BLAKE2b
with a 64-byte output, and the non-strings p, T, m, t, v, y,
length(P), length(S), length(K), and length(X) are treated
as a 32-bit little-endian encoding of the integer.

      H_0 = H(p || T || m || t || v || y || length(P) || P ||
              length(S) || S || length(K) || K || length(X) || X)
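A minimal Python sketch of the H_0 computation follows, using the standard library's `hashlib.blake2b` for H. The helper names `le32` and `compute_h0` and the example inputs are illustrative assumptions; the version v = 0x13 is encoded as a 32-bit integer like the other non-string values.

```python
import hashlib
import struct


def le32(n: int) -> bytes:
    """Encode a non-negative integer as 4 little-endian bytes."""
    return struct.pack("<I", n)


def compute_h0(p: int, tag_len: int, m: int, t: int, y: int,
               password: bytes, salt: bytes,
               secret: bytes = b"", associated: bytes = b"",
               version: int = 0x13) -> bytes:
    """Hypothetical helper returning the 64-byte pre-hash H_0."""
    data = b"".join([
        le32(p), le32(tag_len), le32(m), le32(t), le32(version), le32(y),
        le32(len(password)), password,
        le32(len(salt)), salt,
        le32(len(secret)), secret,
        le32(len(associated)), associated,
    ])
    # H is BLAKE2b with a 64-byte digest.
    return hashlib.blake2b(data, digest_size=64).digest()


h0 = compute_h0(p=4, tag_len=32, m=32, t=3, y=2,
                password=b"password", salt=b"somesalt")
print(h0.hex())
```

Every length field precedes its string, so distinct (P, S, K, X) tuples cannot produce the same hash input by shifting bytes between fields.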
Allocate the memory as m' 1024-byte blocks, where m' is
derived as:

      m' = 4 * p * floor(m / (4 * p))

For p lanes, the memory is
organized in a matrix B[i][j] of blocks with p rows (lanes)
and q = m' / p columns.

Compute B[i][0] for all i ranging from (and including) 0
to (not including) p:

      B[i][0] = H'(H_0 || 0 || i)

Here integers are padded to 4 bytes and encoded in little endian,
and H' produces a 1024-byte output.

Compute B[i][1] for all i ranging from (and including) 0
to (not including) p:

      B[i][1] = H'(H_0 || 1 || i)

Here integers are padded to 4 bytes and encoded in little endian.

Compute B[i][j] for all i ranging from (and including) 0
to (not including) p, and for all j ranging from (and
including) 2 to (not including) q:

      B[i][j] = G(B[i][j-1], B[i'][j'])

The block indices i' and j' are determined differently for Argon2d, Argon2i, and Argon2id.
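The allocation arithmetic can be sketched as follows, assuming the S = 4 slices that this document introduces for parallel computation. `memory_layout` and `first_block_seeds` are hypothetical helpers; the seeds are the inputs that would be passed to H' (with a 1024-byte output) for the first two columns.

```python
import struct
from typing import Tuple

SLICES = 4  # S, the number of vertical slices (segments per lane)


def memory_layout(m_kib: int, p: int) -> Tuple[int, int, int]:
    """Return (m', q, segment length) for memory size m in KiB and p lanes."""
    if m_kib < 8 * p:
        raise ValueError("m must be at least 8*p kibibytes")
    m_blocks = 4 * p * (m_kib // (4 * p))  # m rounded down to a multiple of 4*p
    q = m_blocks // p                      # columns per lane
    return m_blocks, q, q // SLICES


def first_block_seeds(h0: bytes, lane: int) -> Tuple[bytes, bytes]:
    """H' inputs for B[lane][0] and B[lane][1]: H_0 || LE32(j) || LE32(i)."""
    return (h0 + struct.pack("<II", 0, lane),
            h0 + struct.pack("<II", 1, lane))


print(memory_layout(4100, 4))  # -> (4096, 1024, 256)
```

For example, m = 4100 KiB with p = 4 lanes gives m' = 4096 blocks, q = 1024 columns, and segments of 256 blocks; the 4 KiB remainder is simply not used.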
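If the number of iterations t is larger than 1, we repeat
the block computation steps, however replacing the computations
with the following expressions, where the last operand of each
XOR is the value of the same block from the previous pass:

      B[i][0] = G(B[i][q-1], B[i'][j']) XOR B[i][0]
      B[i][j] = G(B[i][j-1], B[i'][j']) XOR B[i][j],  for 0 < j < q

After t passes have been iterated, the final block C is computed as
the XOR of the last column:

      C = B[0][q-1] XOR B[1][q-1] XOR ... XOR B[p-1][q-1]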
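The update rule for later passes and the final XOR of the last column can be sketched as below. This is an illustration under simplifying assumptions, not reference code: blocks are plain byte strings, the memory is a list of lanes, and `g` is a parameter standing in for the compression function G.

```python
from typing import Callable, List

Block = bytes
CompressFn = Callable[[Block, Block], Block]


def xor_blocks(a: Block, b: Block) -> Block:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def later_pass_block(old: Block, prev: Block, ref: Block,
                     g: CompressFn) -> Block:
    """New value of a block in pass 2 or later: G(prev, ref) XOR old."""
    return xor_blocks(g(prev, ref), old)


def final_block(memory: List[List[Block]]) -> Block:
    """C = B[0][q-1] XOR B[1][q-1] XOR ... XOR B[p-1][q-1]."""
    c = memory[0][-1]
    for lane in memory[1:]:
        c = xor_blocks(c, lane[-1])
    return c
```

XORing in the previous value (rather than overwriting it) means every pass keeps a dependency on all earlier passes.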
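The output tag is computed as H'(C), where H' produces a T-byte output.

Let H_x be a hash function with x-byte output (in our case
H_x is BLAKE2b, which supports x between 1 and 64 inclusive).
Let V_i be a 64-byte block, A_i be its first 32 bytes, and
T < 2**32 be the tag length in bytes, encoded in little-endian
as a 32-bit integer. Then we define:

      if T <= 64
          H'(X) = H_T(T || X)
      else
          r = ceil(T / 32) - 2
          V_1 = H_64(T || X)
          V_2 = H_64(V_1)
          ...
          V_r = H_64(V_{r-1})
          V_{r+1} = H_{T-32*r}(V_r)
          H'(X) = A_1 || A_2 || ... || A_r || V_{r+1}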
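A minimal Python sketch of H' follows, using `hashlib.blake2b` for H_x; the names `blake2b` and `h_prime` are assumptions. A first memory block would then be, for example, `h_prime(h0 + LE32(0) + LE32(i), 1024)`.

```python
import hashlib
import struct


def blake2b(data: bytes, out_len: int) -> bytes:
    """H_x: BLAKE2b with an x-byte digest (1 <= x <= 64)."""
    return hashlib.blake2b(data, digest_size=out_len).digest()


def h_prime(data: bytes, tag_len: int) -> bytes:
    """Variable-length hash H' producing tag_len bytes."""
    t = struct.pack("<I", tag_len)          # T as a 32-bit little-endian integer
    if tag_len <= 64:
        return blake2b(t + data, tag_len)
    r = -(-tag_len // 32) - 2               # ceil(T / 32) - 2
    v = blake2b(t + data, 64)               # V_1
    out = [v[:32]]                          # A_1
    for _ in range(r - 1):                  # V_2 .. V_r, keeping their first halves
        v = blake2b(v, 64)
        out.append(v[:32])
    out.append(blake2b(v, tag_len - 32 * r))  # V_{r+1}: between 33 and 64 bytes
    return b"".join(out)
```

Because T - 32*r always lies between 33 and 64, the last call stays within BLAKE2b's maximum digest size, and the output length is exactly 32*r + (T - 32*r) = T.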
To enable parallel block computation, we further partition the
memory matrix into S = 4 vertical slices. The intersection of a
slice and a lane is a segment of length q/S. Segments of the
same slice are computed in parallel and may not reference blocks
from each other. All other blocks can be referenced.

For Argon2d, J_1 is given by the first 32 bits of block B[i][j-1],
while J_2 is given by the next 32 bits of block B[i][j-1]:

      J_1 = extract(B[i][j-1], 0)
      J_2 = extract(B[i][j-1], 1)
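For Argon2d, reading J_1 and J_2 from the previous block amounts to decoding its first two 32-bit words. The sketch below assumes blocks are serialized as little-endian words, as the reference implementation does, and also shows how a column j maps to its slice and to its index within the segment with S = 4. Both function names are hypothetical.

```python
import struct
from typing import Tuple


def data_dependent_j(prev_block: bytes) -> Tuple[int, int]:
    """Argon2d: J_1 and J_2 are the first two 32-bit words of B[i][j-1]."""
    j1, j2 = struct.unpack_from("<II", prev_block, 0)
    return j1, j2


def position(j: int, q: int, slices: int = 4) -> Tuple[int, int]:
    """Return (slice number, index within the segment) of column j."""
    seg_len = q // slices
    return j // seg_len, j % seg_len
```

Since J_1 and J_2 depend on the freshly computed block, the memory access pattern of Argon2d depends on the password, which is exactly what makes it unsuitable when timing side channels are a concern.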
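For Argon2i, J_1 and J_2 are generated independently of the data.
Each application of the compression function G twice, in counter
mode, gives 128 64-bit values J_1 || J_2. In both applications of
G, the first input is the all-zero block, and the inner application
takes as its second input the block Z constructed as follows,
zero-padded to 1024 bytes:

      Z = ( r || l || s || m' || t || x || i || 0 || ... || 0 )

      J_1 || J_2 values = G(0, G(0, Z))

Here r is the pass number, l is the lane number, s is the slice
number, m' is the total number of memory blocks, t is the total
number of passes, x is the Argon2 type, and i is a counter that
starts at 1 in each segment and is incremented each time 128 new
values are needed.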
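The Argon2i input block and the splitting of G(0, G(0, Z)) into 128 pairs (J_1, J_2) can be sketched as below. This assumes each 64-bit value is decoded as little-endian, with J_1 taken from the low 32 bits and J_2 from the high 32 bits as in the reference implementation. `g` is a parameter standing in for the compression function G, and the helper names are hypothetical.

```python
import struct
from typing import Callable, List, Tuple

ZERO_BLOCK = bytes(1024)


def address_input(r: int, lane: int, s: int, m_blocks: int,
                  t: int, y: int, counter: int) -> bytes:
    """Z = r || l || s || m' || t || x || i, 8 bytes LE each, zero-padded."""
    header = struct.pack("<7Q", r, lane, s, m_blocks, t, y, counter)
    return header + bytes(1024 - len(header))


def address_pairs(z: bytes,
                  g: Callable[[bytes, bytes], bytes]) -> List[Tuple[int, int]]:
    """Apply G(0, G(0, Z)) and split the result into 128 (J_1, J_2) pairs."""
    block = g(ZERO_BLOCK, g(ZERO_BLOCK, z))
    words = struct.unpack("<128Q", block)
    # J_1 is the low 32 bits and J_2 the high 32 bits of each 64-bit value.
    return [(w & 0xFFFFFFFF, w >> 32) for w in words]
```

Nothing in Z depends on the password, so the sequence of referenced blocks is the same for every input with the same parameters; this is the source of Argon2i's resistance to timing side channels.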
The values r, l, s, m', t, x, i are represented on 8 bytes in
little-endian.

For Argon2id, if the pass number is 0 and the slice number is 0 or 1,
then compute J_1 and J_2 as for Argon2i; otherwise compute J_1 and
J_2 as for Argon2d.

The value of l = J_2 mod p gives the index of the lane from
which the block will be taken. For the first pass (r=0) and
the first slice (s=0), the block is taken from the current lane.

The set R contains the indices that can be referenced
according to the following rules:

   o  If l is the current lane, then R includes the indices of
      all blocks in the last S - 1 = 3 segments computed and
      finished, as well as the blocks computed in the current
      segment in the current pass, excluding B[i][j-1].

   o  If l is not the current lane, then R includes the indices of
      all blocks in the last S - 1 = 3 segments computed and
      finished in lane l. If B[i][j] is the first block of a
      segment, then the very last index from R is excluded.

A block is taken from R with a non-uniform distribution over
[0, |R|):

      J_1 -> |R| * (1 - J_1**2 / 2**64)
To avoid floating-point computation, the following integer
approximation is used, where / denotes integer division:

      x = J_1**2 / 2**32
      y = (|R| * x) / 2**32
      z = |R| - 1 - y
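The reference-selection rules above can be combined into the following Python sketch; all function names are hypothetical. Two details follow the Argon2 reference implementation rather than the text above and should be checked against it: |R| in the first pass, where fewer than three segments are finished, and the ordering of R, which starts right after the current slice in later passes.

```python
def data_independent(y: int, r: int, s: int) -> bool:
    """True if J_1, J_2 come from the Argon2i address blocks."""
    if y == 1:                # Argon2i
        return True
    if y == 0:                # Argon2d
        return False
    return r == 0 and s < 2   # Argon2id: slices 0 and 1 of pass 0


def reference_lane(j2: int, p: int, lane: int, r: int, s: int) -> int:
    """l = J_2 mod p, except in the first slice of the first pass."""
    if r == 0 and s == 0:
        return lane
    return j2 % p


def reference_set_size(r: int, s: int, index: int, seg_len: int,
                       same_lane: bool) -> int:
    """|R| for the block at position `index` within its segment."""
    # Finished segments available: s of them in pass 0, otherwise S - 1 = 3.
    finished = s * seg_len if r == 0 else 3 * seg_len
    if same_lane:
        # Add the current segment's blocks, excluding B[i][j-1].
        return finished + index - 1
    # Other lane: drop the very last index at the start of a segment.
    return finished - (1 if index == 0 else 0)


def map_j1(j1: int, size: int) -> int:
    """Integer approximation of |R| * (1 - J_1**2 / 2**64), giving z."""
    x = (j1 * j1) >> 32
    y = (size * x) >> 32
    return size - 1 - y


def reference_column(z: int, r: int, s: int, seg_len: int, q: int) -> int:
    """Column of the z-th element of R, with R starting at its oldest block."""
    start = 0 if r == 0 else ((s + 1) * seg_len) % q
    return (start + z) % q


# Example: pass 1, slice 2, index 3, other lane, q = 32 (segments of 8 blocks).
size = reference_set_size(1, 2, 3, 8, same_lane=False)
z = map_j1(123456789, size)
print(size, z, reference_column(z, 1, 2, 8, 32))
```

The squaring of J_1 biases x toward small values, so z tends to land near |R| - 1, that is, on recently computed blocks.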
The value of z gives the reference block index in R.

Compression function G is built upon the BLAKE2b round
function P. P operates on the 128-byte input, which can be
viewed as 8 16-byte registers:

      P(A_0, A_1, ... , A_7) = (B_0, B_1, ... , B_7)
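Compression function G(X, Y) operates on two 1024-byte
blocks X and Y. It first computes R = X XOR Y. Then R is
viewed as an 8x8 matrix of 16-byte registers R_0, R_1, ... ,
R_63. Then P is first applied rowwise, and then columnwise to
get Z:

      ( Q_0,  Q_1,  Q_2, ... ,  Q_7) <- P( R_0,  R_1,  R_2, ... ,  R_7)
      ( Q_8,  Q_9, Q_10, ... , Q_15) <- P( R_8,  R_9, R_10, ... , R_15)
      ...
      (Q_56, Q_57, Q_58, ... , Q_63) <- P(R_56, R_57, R_58, ... , R_63)

      ( Z_0,  Z_8, Z_16, ... , Z_56) <- P( Q_0,  Q_8, Q_16, ... , Q_56)
      ( Z_1,  Z_9, Z_17, ... , Z_57) <- P( Q_1,  Q_9, Q_17, ... , Q_57)
      ...
      ( Z_7, Z_15, Z_23, ... , Z_63) <- P( Q_7, Q_15, Q_23, ... , Q_63)

Finally, G outputs Z XOR R:

      G: (X, Y) -> R -> Q -> Z -> Z XOR R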
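Permutation P is based on the round function of BLAKE2b. The 8
16-byte inputs S_0, S_1, ... , S_7 are viewed as a 4x4 matrix of
64-bit words, where S_i = (v_{2*i+1} || v_{2*i}):

       v_0  v_1  v_2  v_3
       v_4  v_5  v_6  v_7
       v_8  v_9 v_10 v_11
      v_12 v_13 v_14 v_15

It works as follows:

      G(v_0, v_4,  v_8, v_12)    G(v_1, v_5,  v_9, v_13)
      G(v_2, v_6, v_10, v_14)    G(v_3, v_7, v_11, v_15)
      G(v_0, v_5, v_10, v_15)    G(v_1, v_6, v_11, v_12)
      G(v_2, v_7,  v_8, v_13)    G(v_3, v_4,  v_9, v_14)

G(a, b, c, d), which operates on 64-bit words and is distinct from
the compression function G, is defined as follows:

      a = (a + b + 2 * trunc(a) * trunc(b)) mod 2**64
      d = (d XOR a) >>> 32
      c = (c + d + 2 * trunc(c) * trunc(d)) mod 2**64
      b = (b XOR c) >>> 24
      a = (a + b + 2 * trunc(a) * trunc(b)) mod 2**64
      d = (d XOR a) >>> 16
      c = (c + d + 2 * trunc(c) * trunc(d)) mod 2**64
      b = (b XOR c) >>> 63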
The modular additions in G are combined with 64-bit multiplications.
Multiplications are the only difference from the original BLAKE2b design.
This choice was made to increase the circuit depth and thus the running
time of ASIC implementations, while having roughly the same running
time on CPUs thanks to parallelism and pipelining.
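The word function G(a, b, c, d), the permutation P, and the compression function G can be put together in a compact, unoptimized Python sketch. The names `fblamka`, `gb`, `permute`, and `compress` are assumptions. A 1024-byte block is treated as 128 little-endian 64-bit words, so register R_k holds words 2k and 2k+1.

```python
import struct

MASK64 = (1 << 64) - 1


def rotr64(x: int, n: int) -> int:
    """Rotate a 64-bit word right by n bits (a >>> n)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def fblamka(a: int, b: int) -> int:
    """(a + b + 2 * trunc(a) * trunc(b)) mod 2**64."""
    return (a + b + 2 * (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF)) & MASK64


def gb(v: list, a: int, b: int, c: int, d: int) -> None:
    """The word function G(a, b, c, d), applied in place to list v."""
    v[a] = fblamka(v[a], v[b])
    v[d] = rotr64(v[d] ^ v[a], 32)
    v[c] = fblamka(v[c], v[d])
    v[b] = rotr64(v[b] ^ v[c], 24)
    v[a] = fblamka(v[a], v[b])
    v[d] = rotr64(v[d] ^ v[a], 16)
    v[c] = fblamka(v[c], v[d])
    v[b] = rotr64(v[b] ^ v[c], 63)


def permute(words: list) -> list:
    """P on 16 words v_0..v_15: columns of the 4x4 matrix, then diagonals."""
    v = list(words)
    for a, b, c, d in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14),
                       (3, 7, 11, 15), (0, 5, 10, 15), (1, 6, 11, 12),
                       (2, 7, 8, 13), (3, 4, 9, 14)):
        gb(v, a, b, c, d)
    return v


def compress(x: bytes, y: bytes) -> bytes:
    """Compression G(X, Y): R = X XOR Y, P rowwise then columnwise, Z XOR R."""
    r = [a ^ b for a, b in zip(struct.unpack("<128Q", x),
                               struct.unpack("<128Q", y))]
    q = list(r)
    for row in range(8):  # registers R_{8k} .. R_{8k+7} = words 16k .. 16k+15
        q[16 * row:16 * row + 16] = permute(q[16 * row:16 * row + 16])
    for col in range(8):  # registers R_k, R_{k+8}, ..., R_{k+56}
        idx = [16 * row + 2 * col + h for row in range(8) for h in (0, 1)]
        out = permute([q[k] for k in idx])
        for k, val in zip(idx, out):
            q[k] = val
    return struct.pack("<128Q", *(zq ^ rq for zq, rq in zip(q, r)))
```

A production implementation would work on machine words in place and use SIMD registers for the rowwise and columnwise passes; this version favors a direct correspondence with the equations.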
Argon2d is optimized for settings where the adversary does
not get regular access to system memory or CPU, i.e., they cannot
run side-channel attacks based on timing information, nor
can they recover the password much faster using garbage
collection. These settings are more typical for backend servers
and cryptocurrency mining. In practice, we suggest the
following settings:

   o  Cryptocurrency mining that takes 0.1 seconds on a 2 GHz
      CPU using 1 core: Argon2d with 2 lanes and 250 MB of RAM.

Argon2id is optimized for more realistic settings, where the
adversary can possibly access the same machine, use its CPU, or
mount cold-boot attacks. We suggest the following
settings:

   o  Backend server authentication that takes 0.5 seconds on a
      2 GHz CPU using 4 cores: Argon2id with 8 lanes and 4 GB of
      RAM.

   o  Key derivation for hard-drive encryption that takes 3
      seconds on a 2 GHz CPU using 2 cores: Argon2id with 4 lanes
      and 6 GB of RAM.

   o  Frontend server authentication that takes 0.5 seconds on a
      2 GHz CPU using 2 cores: Argon2id with 4 lanes and 1 GB of
      RAM.

We recommend the following procedure to select the type and
the parameters for practical use of Argon2.
   1.  Select the type y. If you do not know the difference
       between them or you consider side-channel attacks a viable
       threat, choose Argon2id.

   2.  Figure out the maximum number h of threads that can be
       initiated by each call to Argon2.

   3.  Figure out the maximum amount m of memory that each call
       can afford.

   4.  Figure out the maximum amount x of time (in seconds) that
       each call can afford.

   5.  Select the salt length. 128 bits is sufficient for all
       applications, but can be reduced to 64 bits in the case of
       space constraints.

   6.  Select the tag length. 128 bits is sufficient for most
       applications, including key derivation. If longer keys are
       needed, select longer tags.

   7.  If side-channel attacks are a viable threat, enable the
       memory wiping option in the library call.

   8.  Run the scheme of type y, memory m, and h lanes and threads,
       using different numbers of passes t. Figure out the maximum t
       such that the running time does not exceed x. If it exceeds x
       even for t = 1, reduce m accordingly.

   9.  Hash all the passwords with the just determined values m,
       h, and t.

This section contains test vectors for Argon2.

TBA

None.

The collision and preimage resistance levels of Argon2 are equivalent to those of the underlying BLAKE2b hash function.
To produce a collision, 2**256 inputs are needed. To find a preimage, 2**512 inputs must be tried.

The KDF security is determined by the key length
and the size of the internal state of hash function H'.
To distinguish the output of keyed Argon2 from random, a minimum of min(2**128, 2**length(K)) calls to BLAKE2b is needed.

Time-space tradeoffs allow computing a memory-hard function while storing fewer memory blocks, at the cost of more calls to
the internal compression function. The advantage of tradeoff attacks is measured by the reduction factor of the time-area
product, where memory and extra compression function cores contribute to the area, and time is increased to accommodate the recomputation
of missed blocks. A high reduction factor may potentially speed up preimage search.
The best attack on 1-pass and 2-pass Argon2i is the low-storage
attack, which reduces the
time-area product (using the peak memory value) by a factor of 5.
The best attack on Argon2i with 3 or more passes has a reduction factor that is a function of
memory size and the number of passes. For 1 GiB of memory, the factor is 3 for 3 passes, 2.5 for 4 passes, and 2 for 6 passes. The reduction
factor grows by about 0.5 with every doubling of the memory size.
To completely prevent these time-space tradeoffs, the
number t of passes must exceed the binary logarithm of the memory size minus 26.
The best tradeoff attack on t-pass Argon2d is the ranking tradeoff attack,
which reduces the time-area product by the factor of 1.33.
The best tradeoff attack on 1-pass Argon2id is the combined low-storage attack (for the first half of the memory) and
the ranking attack (for the second half), which together yield a factor of about 2.1. The best tradeoff attack on
t-pass Argon2id is the ranking tradeoff attack,
which reduces the time-area product by a factor of 1.33.
A bottleneck in a system employing the password-hashing function
is often the function latency rather than memory costs. A rational
defender would then maximize the bruteforce costs for the attacker equipped
with a list of hashes, salts, and timing information, for fixed computing time
on the defender’s machine. The attack cost estimates
imply that for Argon2i, 3 passes are almost optimal for most reasonable memory sizes,
and that for Argon2d and Argon2id, 1 pass maximizes the attack costs for a constant defender time.
The Argon2id variant with t=1 and maximum available memory is recommended
as a default setting for all environments. This setting is secure against side-channel attacks
and maximizes adversarial costs on dedicated bruteforce hardware.
The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC), RFC 7693.

Argon2: the memory-hard function for password hashing and other applications.

Balloon Hashing: Provably Space-Hard Hash Functions with Data-Independent Access Patterns.

Efficiently Computing Data-Independent Memory-Hard Functions.

Tradeoff Cryptanalysis of Memory-Hard Functions.