Curve4Q
UC Berkeley, watsonbladd@gmail.com
Microsoft Research, plonga@microsoft.com
Mozilla, rlb@ipv.sx
Security
Internet-Draft

This document specifies Curve4Q, a twisted Edwards curve, proposed in , that takes advantage of arithmetic over the field GF(2^127-1) and two endomorphisms
to achieve the fastest Diffie-Hellman key agreements over a group of order
approximately 2^246, which provides around 128 bits of security.
Curve4Q implementations are more than two times faster than those of Curve25519
and, when not using endomorphisms, are between 1.2 and 1.6 times faster.

Public key cryptography continues to be computationally expensive, particularly
on less powerful devices. While recent advances in efficient formulas for
addition and doubling have substantially reduced the cost of elliptic curve
operations in terms of field operations, the number of group operations involved
in scalar multiplication has not been reduced in the curves considered for IETF
use. Using curves with efficiently computable endomorphisms can reduce the
number of group operations by turning one long scalar multiplication into the
sum of several multiplications by smaller scalars, which can be evaluated more
efficiently.

For curves over quadratic extension fields, there are more endomorphism
families to choose from, and field operations are often more efficient
than in prime fields of the same size. The ideal case is a curve
equipped with two distinct endomorphisms, which makes it possible to split
scalars into four parts. We focus on curves defined over the field GF(p^2)
for the Mersenne prime p = 2^127 - 1, which offers extremely efficient
arithmetic. Together, these improvements substantially reduce computation
time compared to other proposed Diffie-Hellman key exchange and digital
signature schemes. However, requiring all of these features at once
severely restricts the curves that can be used for cryptographic applications.

As described in , the elliptic curve “Curve4Q” defined in this
document is a special instance of the recent endomorphism-based constructions
from and , and is the only known elliptic curve that
(1) permits a four-dimensional decomposition (using two endomorphisms) over
GF(p^2) and (2) has a large prime order subgroup. The order of this subgroup
is approximately 2^246, which provides around 128 bits of security. No other
known elliptic curve with such a decomposition has a larger prime order subgroup
over this field. This “uniqueness” allays concerns about selecting curves
vulnerable to undisclosed attacks.

Curve4Q can be used to implement Diffie-Hellman key exchange, as described
below. It is also possible to use Curve4Q as the basis for a digital signature
scheme (e.g., ).

Curve4Q is defined over the finite field GF(p^2), where p is the Mersenne prime
2^127 - 1. Elements of this finite field have the form (a + b * i), where a and
b are elements of the finite field GF(p) (i.e., integers mod p) and i^2 = -1.

Let A = a0 + a1*i and B = b0 + b1*i be two elements of GF(p^2). Below we
present formulas for computing addition, subtraction, multiplication, squaring,
conjugation and inversion.

   A + B   = (a0 + b0) + (a1 + b1)*i
   A - B   = (a0 - b0) + (a1 - b1)*i
   A * B   = (a0*b0 - a1*b1) + (a0*b1 + a1*b0)*i
   A^2     = (a0 + a1)*(a0 - a1) + (2*a0*a1)*i
   conj(A) = a0 - a1*i
   1/A     = conj(A) / (a0^2 + a1^2)
           = a0/(a0^2 + a1^2) - (a1/(a0^2 + a1^2))*i

The GF(p) division in the formula for 1/A can be computed using an exponentiation
via Fermat’s little theorem: 1/a = a^(p - 2) = a^(2^127 - 3) for any nonzero element a of GF(p).
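The GF(p^2) formulas above can be written as a short Python sketch. It represents a0 + a1*i as a tuple (a0, a1) of integers in [0, p). It computes GF(p) inverses with Fermat's little theorem through Python's built-in three-argument pow() instead of a fixed addition chain. The function names are hypothetical. Python integer arithmetic is not constant time, so the sketch only illustrates the arithmetic.

```python
# Sketch of GF(p^2) arithmetic for p = 2^127 - 1, with i^2 = -1.
# Elements a0 + a1*i are tuples (a0, a1) with 0 <= a0, a1 < p.
# Illustrative only: Python integers are not constant time.

P = 2**127 - 1

def fp2_add(a, b):
    return ((a[0] + b[0]) % P, (a[1] + b[1]) % P)

def fp2_sub(a, b):
    return ((a[0] - b[0]) % P, (a[1] - b[1]) % P)

def fp2_mul(a, b):
    # (a0 + a1*i)(b0 + b1*i) = (a0*b0 - a1*b1) + (a0*b1 + a1*b0)*i
    return ((a[0] * b[0] - a[1] * b[1]) % P,
            (a[0] * b[1] + a[1] * b[0]) % P)

def fp2_sqr(a):
    # (a0 + a1*i)^2 = (a0 + a1)(a0 - a1) + 2*a0*a1*i
    return ((a[0] + a[1]) * (a[0] - a[1]) % P, 2 * a[0] * a[1] % P)

def fp2_conj(a):
    return (a[0], -a[1] % P)

def fp_inv(x):
    # Fermat's little theorem: x^(p-2) = 1/x for nonzero x in GF(p).
    if x % P == 0:
        raise ZeroDivisionError("0 has no inverse in GF(p)")
    return pow(x, P - 2, P)

def fp2_inv(a):
    # 1/A = conj(A) / (a0^2 + a1^2). The norm is nonzero for A != 0
    # because -1 is not a square in GF(p).
    n_inv = fp_inv(a[0] * a[0] + a[1] * a[1])
    return (a[0] * n_inv % P, -a[1] * n_inv % P)
```

For example, fp2_mul((0, 1), (0, 1)) returns (P - 1, 0), which is i^2 = -1.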
One can use a fixed addition chain to compute a^(2^127 - 3) (e.g., see ).

Curve4Q is the twisted Edwards curve E over GF(p^2) defined by the
following curve equation, where d is a fixed curve constant in GF(p^2):

   -x^2 + y^2 = 1 + d * x^2 * y^2

Let E(GF(p^2)) be the set of pairs (x, y) of elements of GF(p^2) satisfying this
equation. This set forms a group with the addition operation (x1, y1) + (x2, y2)
= (x3, y3), where:

   x3 = (x1 * y2 + y1 * x2) / (1 + d * x1 * x2 * y1 * y2)
   y3 = (y1 * y2 + x1 * x2) / (1 - d * x1 * x2 * y1 * y2)

As d is not a square in GF(p^2), and -1 is, these formulas never involve a
division by zero when applied to points on the curve. That is, the addition law is
complete and works without exceptions for any input in E(GF(p^2)). The
identity element is (0, 1), and the inverse of (x, y) is (-x, y). The order of
this group is #E = 2^3 · 7^2 · N, where N is the following 246-bit prime:

Points P on E such that [N]*P = (0, 1) are N-torsion points. Given two
N-torsion points P and Q, it is computationally hard to find m such that Q = [m]*P.
This is the elliptic curve discrete logarithm problem, which is closely related
to the security of Diffie-Hellman key exchange, as the best known attacks on the
Diffie-Hellman problem involve solving the discrete logarithm problem. The best
known algorithms take approximately 2^123 group operations.

This group has two different efficiently computable endomorphisms, as described
in . As discussed in and , these endomorphisms
allow a multiplication by a large scalar to be computed using multiple multiplications
by smaller scalars, which can be evaluated in much less time overall.

Each element a of GF(p) is represented by its integer representative in the range
[0, p), encoded as a 16-byte little-endian integer. The 16 bytes b[0], b[1], …, b[15] represent
b[0] + 256*b[1] + 256^2*b[2] + … + 256^15*b[15]. Since we are representing numbers
in the range [0, 2^127 - 1), the top bit of b[15] is always zero.

An element x0 + x1*i of GF(p^2) is represented on the wire by the concatenation
of the encodings for x0 and x1. A point (x, y) on Curve4Q is serialized in a
compressed form as the 32-byte representation of y with a modified top bit (the
top bit of the last byte, which is always zero in the encoding of y1). This top bit
is used to disambiguate between x and -x during decoding.

To carry out this disambiguation, we use the lexicographic order of elements
in GF(p^2): given two elements a = a0 + a1*i and b = b0 + b1*i with all their
coordinates in [0, p), a is greater than b if a0 is greater than b0; if a0 and
b0 are equal, a is greater than b if a1 is greater than b1.

Given the x-coordinate x of a point and its negative -x, the top bit of the
compressed point is 0 if x is smaller than -x. Otherwise, the top bit is 1.

To decode an encoded point from a 32-byte sequence B:

1. Parse out the encoded value y = y0 + y1*i and the bit s, where s is the top
   bit of B and y1 is read with that bit cleared.
2. Check that y0 and y1 are both less than p.
3. Solve x^2 = (y^2 - 1) / (d * y^2 + 1) for x.
4. If s is 0, take the smaller of x and -x (in the lexicographic ordering);
   if s is 1, take the larger of x and -x.
5. Check that (x, y) is a valid point on the curve, and return it.

Decoding fails if any check fails or if the equation in step 3 has no solution.
The appendix details an algorithm for decoding a point following the steps above.

We call the operation of compressing a point P into 32 bytes Compress(P),
and the decompression operation Expand(S). Expand(Compress(P)) = P for all points P on the curve,
and Compress(Expand(S)) = S if and only if S is a valid representation of a point.

Not all 32-byte strings represent valid points. Implementations MUST reject
invalid strings and check that decompression is successful. Strings are invalid
if they are not possible outputs of the compression operator. In particular, the
values of y0 and y1 MUST be less than p.

Below, we present two algorithms for scalar multiplication on Curve4Q. Each
algorithm takes as input a 256-bit unsigned integer m and an N-torsion point P
and computes the product [m]*P.

The first algorithm uses a simple fixed-window method without exploiting
endomorphisms. The second algorithm uses endomorphisms to accelerate
computation. Both algorithms execute their operations in a regular
pattern in order to enable constant-time implementations and protect against
timing and simple side-channel attacks. Both algorithms use the same addition
and doubling formulas.

First, we discuss explicit formulas and efficient projective coordinate
representations.

We use coordinates based on the extended twisted Edwards coordinates introduced in
: the tuple (X, Y, Z, T) with Z nonzero and Z * T = X * Y
corresponds to a point (x, y) satisfying x = X/Z and y = Y/Z. The neutral point
in this representation is (0, 1, 1, 0). The following slight variants are used in
the optimized scalar multiplication algorithm in order to save computations:
point representation R1 is given by (X, Y, Z, Ta, Tb), where T = Ta * Tb;
representation R2 is (N, D, E, F) = (X + Y, Y - X, 2Z, 2dT); representation R3 is
(N, D, Z, T) = (X + Y, Y - X, Z, T); and representation R4 is (X, Y, Z). Similar “caching”
techniques were discussed in to accelerate repeated
additions of the same point. Converting between these representations is
straightforward.

   R1: (X, Y, Z, Ta, Tb), Ta * Tb = T, Z * T = X * Y
   R2: (N, D, E, F) = (X + Y, Y - X, 2 * Z, 2 * d * T)
   R3: (N, D, Z, T) = (X + Y, Y - X, Z, T)
   R4: (X, Y, Z)

A point doubling (DBL) takes an R4 point and produces an R1 point. For addition,
we first define an operation ADD_core that takes an R2 and an R3 point and
produces an R1 point. This can be used to implement an operation ADD, which takes
an R1 and an R2 point as inputs and produces an R1 point, by first converting the
R1 point to R3 and then executing ADD_core. Exposing these operations and the
multiple representations helps save time by avoiding redundant computations: the
conversion of the first argument to ADD can be done once if the argument will be
used in multiple additions.

Below, we list the explicit formulas for the required point operations. These
formulas, which are adapted from and , are
complete: they have no exceptional cases, and therefore can be used in any
algorithm for computing scalar multiples without worrying about exceptional
procedure attacks . Note that we do not explicitly state the point
format every time an addition or doubling is used, and assume that conversions
are done when required. DBL and ADD_core are computed as follows:

We begin by taking the input point P and computing a table of points containing
T[0] = [1]P, T[1] = [3]P, …, T[7] = [15]P as follows:

Next, take m and reduce it modulo N. Then, add N if necessary to ensure that m
is odd. At this point, we recode m into a signed-digit representation consisting
of 63 signed, odd digits d[i] in base 16. The following algorithm accomplishes
this task.

Finally, the computation of the multiplication is as follows.

As sign is either -1 or 1, the multiplication sign * T[ind] is simply a
conditional negation. To negate a point (N, D, E, F) in R2 form, one computes
(D, N, E, -F). The table lookups and conditional negations must be carefully
implemented as described in “Security Considerations” to avoid side-channel
attacks. This algorithm MUST NOT be applied to points that are not N-torsion
points; it will produce the wrong answer.

This algorithm makes use of the identity [m]*P = [a_1]*P + [a_2]*phi(P) +
[a_3]*psi(P) + [a_4]*psi(phi(P)), where a_1, a_2, a_3, and a_4 are 64-bit
scalars that depend on m. The overall product can then be computed using a
small table of 8 precomputed points and 64 doublings and additions. This
requires considerably fewer operations than the algorithm above, at the cost
of a more complicated implementation.

We describe each phase of the computation separately: the computation of the
endomorphisms, the scalar decomposition and recoding, the creation of the table
of precomputed points and, lastly, the computation of the final result. Each
section refers to constants listed in an appendix in order of appearance.

The two endomorphisms phi and psi used to accelerate multiplication are computed
as phi(Q) = tau_dual(upsilon(tau(Q))) and psi(Q) = tau_dual(chi(tau(Q))). Below,
we present procedures for tau, tau_dual, upsilon and chi, adapted from
. Tau_dual produces an R1 point, while the other procedures produce
R4 points.

Note: Tau produces points on a different curve, while upsilon and chi are
endomorphisms of that different curve. Tau and tau_dual are the isogenies
mentioned in the mathematical background above. As a result, the intermediate
results do not satisfy the equation of the curve E. Implementers who wish to
check the correctness of these intermediate results are referred to .

The scalar decomposition and recoding stage has two parts. The first consists of
decomposing the scalar into four 64-bit integers, and the second consists of recoding
these integers into a form that can be used to efficiently and securely compute the
scalar multiplication.

The decomposition step uses four fixed vectors called b1, b2, b3, b4, with four
64-bit entries each. In addition, we have integer constants L1, L2, L3, L4,
which are used to implement rounding. All these values are listed in
. In addition, we use two constant vectors derived from these
inputs:

   c  = 5 * b2 - 3 * b3 + 2 * b4
   c’ = 5 * b2 - 3 * b3 + 3 * b4 = c + b4

Given m, first compute t[i] = floor(L[i] * m / 2^256) for i between 1 and 4.
Then compute the vector sum a = (m, 0, 0, 0) - t[1] * b1 - t[2] * b2 - t[3] * b3 - t[4] * b4.
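The following Python sketch shows this decomposition, including the parity-based choice between a + c and a + c’ that the next sentences describe. It assumes the caller supplies the vectors b1..b4 and the rounding constants L1..L4; the real values are those listed in the appendix. The sketch computes exact integers instead of truncating to 64 bits. It selects the vector with an ordinary branch, which a constant-time implementation would replace with a masked selection. The function name decompose is hypothetical.

```python
def decompose(m, b, L):
    """Sketch: map scalar m to a 4-vector v whose first entry is odd.

    b: the four vectors b1..b4, each a 4-tuple of integers (caller-supplied)
    L: the four rounding constants L1..L4 (caller-supplied)
    Assumes m >= 0 and that b4 has an odd first entry, so exactly one of
    a + c and a + c' = a + c + b4 has an odd first coordinate.
    """
    b1, b2, b3, b4 = b
    # t[i] = floor(L[i] * m / 2^256): a right shift is floor division
    # for nonnegative integers.
    t = [(Li * m) >> 256 for Li in L]
    # a = (m, 0, 0, 0) - t1*b1 - t2*b2 - t3*b3 - t4*b4
    a = [m, 0, 0, 0]
    for ti, bi in zip(t, b):
        a = [aj - ti * bij for aj, bij in zip(a, bi)]
    # c = 5*b2 - 3*b3 + 2*b4
    c = [5 * x - 3 * y + 2 * z for x, y, z in zip(b2, b3, b4)]
    v = [aj + cj for aj, cj in zip(a, c)]
    if v[0] % 2 == 0:
        # Use c' = c + b4 instead, which flips the parity of v[0].
        v = [vj + wj for vj, wj in zip(v, b4)]
    return v
```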
Precisely one of a + c and a + c’ has an odd first coordinate: this is the
vector v that is fed into the scalar recoding step. Note that the entries of
this vector fit in 64 bits, so intermediate values in the calculation above can be
truncated to this width (i.e., computed modulo 2^64).

The recoding step takes the vector v = (v1, v2, v3, v4) from the previous step and
outputs two arrays m[0]..m[64] and d[0]..d[64]. Each entry of d is between 0 and
7, and each entry of m is -1 or 0. The recoding algorithm is detailed below.
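The following Python sketch shows one recoding that produces outputs of this form. It is a sign-aligned recoding and assumes 0 <= vi < 2^64 with v1 odd. The sign array m comes from the bits of v1, and the bits of v2, v3 and v4 are packed into bits 0, 1 and 2 of each d[i]. Let s[i] = 1 when m[i] = -1 and s[i] = -1 when m[i] = 0. Then the output satisfies v1 = sum(s[i] * 2^i) and vj = sum(s[i] * 2^i * bit(d[i], j - 2)) for j = 2, 3, 4. This bit assignment and the name recode are assumptions for illustration. The code is not constant time.

```python
def recode(v):
    """Sketch of a sign-aligned recoding of v = (v1, v2, v3, v4).

    Returns (m, d) with m[i] in {-1, 0} and d[i] in 0..7 for i = 0..64,
    such that, with s[i] = 1 if m[i] == -1 else -1,
        v1 = sum(s[i] * 2^i)
        vj = sum(s[i] * 2^i * bit(d[i], j - 2))   for j = 2, 3, 4.
    Assumes 0 <= vi < 2^64 and v1 odd.
    """
    a1, *rest = v
    assert a1 % 2 == 1 and all(0 <= x < 2**64 for x in v)
    rest = list(rest)
    m = [0] * 65
    d = [0] * 65
    m[64] = -1                      # the top sign s[64] is always +1
    for i in range(64):
        b = (a1 >> (i + 1)) & 1     # bit(v1, i + 1) fixes the sign s[i]
        m[i] = -b
        digit = 0
        for j in range(3):
            lsb = rest[j] & 1
            digit |= lsb << j
            # If s[i] = -1 and lsb = 1, add 1 to compensate for the
            # negatively weighted digit; otherwise just shift right.
            rest[j] = (rest[j] >> 1) + ((b | lsb) ^ b)
        d[i] = digit
    # After 64 such halvings each remaining value is 0 or 1.
    d[64] = rest[0] + 2 * rest[1] + 4 * rest[2]
    return m, d
```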
bit(x, n) denotes the nth bit of x, counting from least significant to most significant,
starting with 0.

We now describe the last step in the endomorphism-based algorithm for computing
scalar multiplication. On inputs m and P, the algorithm first precomputes a
table of images of P under the endomorphisms, then recodes m, and then uses these
intermediate artifacts to compute the scalar product.

First, compute a table T of 8 points in representation R2 as shown below.
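The following Python sketch shows how the table and the recoded digits fit together. It builds T and evaluates sum(s[i] * 2^i * T[d[i]]) with a double-and-add loop. The sketch works over an abstract additive group, supplied through the hypothetical callbacks add, dbl and neg. It assumes T[u] = P + u0*R + u1*Q + u2*S for u = u0 + 2*u1 + 4*u2, where Q = psi(P), R = phi(P) and S = psi(phi(P)). This matches the identity [m]*P = [a_1]*P + [a_2]*phi(P) + [a_3]*psi(P) + [a_4]*psi(phi(P)) when bits 0, 1 and 2 of d[i] carry a_2, a_3 and a_4, respectively. The table lookups and negations here are ordinary Python operations, not the constant-time ones this section requires.

```python
def build_table(P, Q, R, S, add):
    """T[u] = P + u0*R + u1*Q + u2*S for u = u0 + 2*u1 + 4*u2 (assumed order)."""
    T = [P] * 8
    T[1] = add(T[0], R)   # P + R
    T[2] = add(T[0], Q)   # P + Q
    T[3] = add(T[1], Q)   # P + R + Q
    T[4] = add(T[0], S)   # P + S
    T[5] = add(T[1], S)   # P + R + S
    T[6] = add(T[2], S)   # P + Q + S
    T[7] = add(T[3], S)   # P + R + Q + S
    return T

def evaluate(T, m, d, add, dbl, neg):
    """Return sum(s[i] * 2^i * T[d[i]]), where s[i] = 1 if m[i] == -1 else -1."""
    def signed(i):
        pt = T[d[i]]
        return pt if m[i] == -1 else neg(pt)   # conditional negation
    acc = signed(64)
    for i in range(63, -1, -1):
        acc = add(dbl(acc), signed(i))         # one doubling, one addition
    return acc
```

You can check the loop structure before plugging in real point operations. Use integers modulo a prime n as a stand-in group, with add(x, y) = (x + y) mod n, dbl(x) = 2x mod n and neg(x) = -x mod n. With that group, evaluate returns the same value as summing the terms directly.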
The points Q = psi(P), R = phi(P) and S = psi(phi(P)) are computed using the
formulas from .

Second, apply the scalar decomposition and recoding algorithm from
to m to produce the two arrays
m[0]..m[64] and d[0]..d[64].

Define s[i] to be 1 if m[i] is -1, and -1 if m[i] is 0. Then the multiplication
is completed as follows:

Multiplication by s[i] is simply a conditional negation. To negate an R2 point
(N, D, E, F), one computes (D, N, E, -F). It is important to do this (as well as
the table lookup) in constant time, i.e., the execution of branches and memory
accesses MUST NOT depend on secret values (see “Security Considerations” for
more details).

The optimized multiplication algorithm above only works properly for N-torsion
points. Implementations MUST NOT use this algorithm on anything that is not
known to be an N-torsion point. Otherwise, it will produce the wrong answer,
with severe consequences for security.

The above scalar multiplication algorithms can be used to implement
Diffie-Hellman with cofactor.

The role of the multiplication by 392 = 2^3 · 7^2, the cofactor of the curve, is
to ensure that Q is an N-torsion point, so that the subsequent scalar
multiplication by m in the DH function above can be used safely and produces
correct results. In other words, as the cofactor is greater than one,
Diffie-Hellman computations using Curve4Q MUST always clear the cofactor
(i.e., multiply by 392, as explained above).

The base point G for Diffie-Hellman operations has the following affine
coordinates:

The tables used in multiplications of this generator (small multiples of G for
the multiplication without endomorphisms, or endomorphism images for the optimized
multiplication with endomorphisms) can be pre-generated to speed up the first,
fixed-base DH computation.

Two users, Alice and Bob, can carry out the following steps to derive a shared
key: each picks a random string of 32 bytes, mA and mB, respectively. Alice
computes the public key A = Compress([mA]*G), and Bob computes the public key
B = Compress([mB]*G). They exchange A and B, and then Alice computes KAB =
DH(mA, Expand(B)) while Bob computes KBA = DH(mB, Expand(A)), which produces the
shared point K = KAB = KBA. The y-coordinate of K, represented as a 32-byte
string as detailed in , is the shared secret.

Before decompressing A and B using the function Expand(), each user SHOULD verify
that the 128th bit of the received public key is zero (i.e., the real part of the
corresponding y-coordinate should be less than 2^127).

If the received strings are not valid points, the DH function has failed to
compute an answer. Implementations SHOULD return a random 32-byte string as well
as an error, to prevent bugs when applications ignore return codes. They
MUST signal an error when decompression fails.

Implementations MAY use any method to carry out these calculations, provided
that it agrees with the above function on all inputs and failure cases, and does
not leak information about secret keys. For example, refer to the constant-time
fixed-base scalar multiplication algorithm implemented in to
accelerate the computation of multiplications by the generator G.

[RFC Editor: please remove this section prior to publication]
This document has no IANA actions.

The best known algorithms for the computation of discrete logarithms on Curve4Q
are parallel versions of the Pollard rho algorithm in . On
Curve4Q, these attacks take on the order of 2^123 group operations to compute a
single discrete logarithm. The additional endomorphisms have large order, and so
cannot be used to accelerate generic attacks. Quadratic extension fields are not
affected by any of the index calculus attacks used over larger extension fields.

Implementations MUST check that input points properly decompress to points on
the curve. Removing such checks may result in extremely effective attacks. The
curve is not twist-secure: implementations using single-coordinate ladders MUST
validate points before operating on them. In the case of protocols that require
contributory behavior, when the identity is the output of the DH primitive, it
MUST be rejected and failure signaled to higher levels. Notoriously, TLS 1.2
without the Extended Master Secret extension is such a protocol.

Implementations MUST ensure that the branches taken and the memory addresses
accessed do not depend on secret data. The time variability introduced by
secret-dependent operations has been exploited in the past via timing and cache
attacks to break implementations. Side-channel analysis is a constantly moving
field, and implementers must be extremely careful to ensure that operations do
not leak any secret information. Using ephemeral private scalars for each
operation (ideally, limiting the use of each private scalar to a single
operation) can reduce the impact of side-channel attacks. However, this might
not be possible for many applications of Diffie-Hellman key agreement.

In the future, quantum computers may render the discrete logarithm problem easy
on all abelian groups through Shor’s algorithm. Data intended to remain
confidential for significantly extended periods of time SHOULD NOT be protected
with any primitive based on the hardness of factoring or the discrete log
problem (elliptic curve or finite field).

FourQ: four-dimensional decompositions on a Q-curve over the Mersenne prime
The Q-curve Construction for Endomorphism-Accelerated Elliptic Curves
Four-dimensional GLV via the Weil restriction
Parallel Collision Search with Cryptanalytic Applications
Exceptional procedure attack on elliptic curve cryptosystems
Fast and compact elliptic-curve cryptography
FourQlib
Faster Point Multiplication on Elliptic Curves with Efficient Endomorphisms
Endomorphisms for Faster Elliptic Curve Cryptography on a Large Class of Curves
SchnorrQ: Schnorr Signatures on FourQ
Twisted Edwards Curves Revisited
Twisted Edwards Curves
The Transport Layer Security (TLS) Protocol Version 1.2
Transport Layer Security (TLS) Session Hash and Extended Master Secret Extension

The following algorithm is an adaptation of the decompression algorithm
from . It decodes a 32-byte string B, formatted as
detailed in . The result is either a valid
point P = (x, y) that satisfies the curve equation, or the value FAILED
if decoding fails.