Modular Exponentiation - View of Exponentiation, Modular Multiplication and VLSI Implementation

The exponentiation methods described in the previous sections only assume a single property about the multiplication composition: It must be associative.

In this section the attention will be focussed on a speciﬁc multiplication composition: Modular multiplication as used in the RSA crypto system. It will be discussed how the algebraic properties of modular multiplication and of the exponents used in RSA cryptography can be utilised to improve the computing time for modular exponentiation.

In the RSA crypto system (see Section 1.1.3) the functions used for en-cryption and deen-cryption have the formb^e modm, where b is a block of data that is encrypted or decrypted and the pair (e, m) is a key. Modulus m is a composite of two prime numbers p and q, i.e. m = pq. To be useful as key, the exponent e must obey the restriction gcd(e, ϕ(m)) = 1 where

2.4. MODULAR EXPONENTIATION 49 ϕ(m) = (p−1)(q−1). Furthermore, the primes pand q must be about the same bit-length [Nec92, p. 207]. The bit-length of modulus m and exponent e determines the security of the RSA crypto system. At present, bit-lengths in the range from 512 to 1024 are adequate. Since the number of multiplica-tions needed for performing an exponentiation is about proportional to the exponent bit-length, it has been suggested to use small exponent values. It is possible to use a small public exponent value without compromising the security, but the length of the private exponent must be about the same length as modulus. Knuth [Knu81, pp. 386-389] suggests a public exponent of value 3. However, as observed by Hastad [Has88] small public exponent values may be vulnerable to crypto analytical attacks. So, some precautions should be taken when small public keys are used.

If small public exponents are used, the computing time for exponentiation with public exponents is signiﬁcant shorter than the computing time for ex-ponentiation with private exponents. Quisquater and Couvreur [QC82] show how the composite property of m, and the knowledge of the prime factors p and q, can be utilised to speed up exponentiation with private exponents.

The trick is to apply the Chinese Remainder Theorem for dividing the com-putation of b^e mod m into two faster computations, b^e ^mod(p⁻¹⁾ mod p and b^e ^mod(q⁻¹⁾ mod q. The results of these computations can then be combined to the value of b^e mod m by means of two multiplications. The following formulation of the Chinese Remainder Theorem is found in [Knu81, pp. 268-276], where Knuth discusses modular arithmetic,

Theorem 2.4-1 (Chinese Remainder Theorem) Let m1, m2, . . . , mr be positive integers that are relatively prime in pairs, i.e.,

gcd(mj, mk) = 1 when j =k. (2.26) Let m = m1, m2, . . . , mr, and let a, u, u1, u2, . . . , ur be integers. Then there is exactly one integer u that satisﬁes the conditions

a≤u < a+m, and u≡uj(mod mj) for 1≤j ≤r. (2.27) A proof of Theorem 2.4-1 and some methods for computing a solution u to (2.27) are included in Knuth’s book. To see how the theorem can be applied for computing a solution u =b^e mod pq observe that if up = b^e mod p and

uq=b^e mod q then,

up =b^e modp= (b^e modpq) mod p ≡u(mod p), and

uq =b^e modq= (b^e modpq) mod q ≡u(mod q). (2.28) So, by computing up and uq and combining these, a unique solution u =b^e modpqcan be found. The advantage of splitting the computation ofb^e mod pq into the computations of b^e mod pand b^e mod q becomes clear when the following rewriting is done: According to Fermat’s theorem (e.g. [Knu68, p.

69] b^p⁻¹ ≡ 1 (mod p) whenever p is a prime and b is not a multiple of p.

Hence, if it is assumed thatb modp= 0, up = b^e mod p

≡ b^(p⁻^1)(e ^{div (p}⁻^1))+e^{mod (p}⁻¹⁾ (mod p)

≡ (b^(p⁻¹⁾)^e ^{div (p}⁻¹⁾·b^e^{mod (p}⁻¹⁾(mod p) (2.29)

≡ 1^e ^{div (p}⁻¹⁾·b^e ^{mod (p}⁻¹⁾(mod p)

≡ b^e ^{mod (p}⁻¹⁾(mod p)

If b mod p = 0 Fermat’s theorem cannot be used. However, for this special case the congruence be b^e ≡ b^e^{mod (p}⁻¹⁾ ≡ 0 (mod p) is valid when e mod (p−1) = 0. It should be noted that the congruence is not valid if e > 0, e mod (p−1) = 0 and b mod p = 0; by insertion into the deﬁnition of an exponential in (2.1) it follows that,

b^e≡b^e⁻¹·b ≡0^e⁻¹·0≡0 (mod p), and b^e ^{mod (p}⁻¹⁾ ≡b⁰ ≡1(mod p).

A similar discussion leads touq ≡b^e ^{mod (q}⁻¹⁾ (modq) whenemod (q−1)= 0. The restrictions on the exponent values do not imply diﬃculties for an application of the Chinese Remainder Theorem in the RSA crypto system.

Indeed, since the exponent values are already restricted by the condition gcd(e,(p−1)(q −1)) = 1 it is seen that e mod (p−1) = 0 and e mod (q−1)= 0.

Now, to obtain the solution u = b^e mod pq, the intermediate results up

and uq are combined by the computation,

u = (((up−uq)·q⁻¹) mod p)·q+uq (2.30) where q⁻¹ is a precomputed integer satisfying q·q⁻¹ ≡ 1 (mod p), i.e. q⁻¹ is q’s multiplicative inverse modulo p. According to Fermat’s theorem, q⁻¹

2.4. MODULAR EXPONENTIATION 51 can be computed as q^p⁻² mod p. Equation (2.30) was proposed by Garner in 1959 [Gar59]. Opposed to other ways of computing u from up and uq, Equation (2.30) does not require multiplication modulo pq. This is a nice property when dedicated hardware is build for the computation.

The computing time for the exponentiation method based on the Chinese Remainder Theorem is smaller than the time for previous described meth-ods. First, since the length of exponent e mod (p−1) ande mod (q−1) is approximate half the length of e, the number of modular multiplications for computing each of up and uq is halved. Ifup anduq are computed in parallel the time for computing both values will be reduced by a factor two due to the smaller exponent values. Second, the lengths of moduli p and q are half the length of m. According to Chapter 3 the computing time for modular mul-tiplication is approximate proportional the operand length when dedicated hardware is used for the computation. This implies a reduction in computing time by a factor of two. If a standard micro-processor is used the computing time for multiplication is proportional to the square of the operand length [SV93], i.e. a reduction by a factor of four can be expected. The overhead for combining up and uq is a multiplication module p, an ordinary integer multiplication, a subtraction and an addition. This overhead is negligible when p and q have lengths of more than hundred bits. The values e mod (p−1),e mod (q−1) andq⁻¹ are also needed in the computation. However, these values can be thought of being a part of the private key, and the values can be precomputed once for all when the private key is generated. In total, the computing time can be reduced by a factor of four by using dedicated hardware to compute up and uq in parallel. Such a dedicated hardware cir-cuitry will contain two units for performing modular multiplication. But, since each unit operates on word sizes that are half the length of m = pq, the circuit size is expected to be equal to the size of a single unit capable of performing multiplication modulo m. If a single standard micro-processor is used the reduction factor is four, and if two processors are used for a parallel computation of up and uq the reduction factor is eight.

It has been mentioned that dedicated hardware do not need a modulo m multiplication unit. This is true when private exponents are used. If a public exponent is used, the user is not expected to know the prime factori-sation of m. Hence, in this case the Chinese Remainder Theorem cannot be applied, and a modulo m multiplication unit is needed. If a single dedicated hardware device shall be able to perform exponentiations with both private

and public exponents, and be able to explore a parallel computation by the Chinese Remainder Theorem, it must be reconﬁgurable: Two “half-length”

modular multiplication units should be conﬁgured into a single “full-length”

modular multiplication unit. Of course, it would also suﬃce to include both kinds of modular multiplication units, but this would imply a large hardware consumption. Shand and Vuillemin describe a highly conﬁgurable implemen-tation that is based onﬁeld programmable gate arrays in [SV93].

In document View of Exponentiation, Modular Multiplication and VLSI Implementation of High-Speed RSA Cryptography (Sider 62-66)