The m-ary Method - Description of Papers - View of Exponentiation, Modular Multiplication and V

1.5 Description of Papers

2.1.1 The m-ary Method

Knuth generalises the left-to-right binary method to a left-to-right m-ary method. For m > 2 the exponent is m-ary encoded as a string of n digits, such thate is expressed as

2.1. FEWER MULTIPLICATIONS 25 which for m = 2 is identical to Equation (2.3). The unary method is also a specialisation of the m-ary method: Let m = 1 and let all digits ei in the unary encoding be 1. This means that the unary encoding is a string of e 1’s. Hence, the following m-ary computation method applies to all m >1:

b^e = b⁽⁽^···^((eⁿ⁻¹^)m+eⁿ⁻²^)m+^···^)m+e¹^)m+e⁰ (2.7)

= ((· · ·((b^eⁿ⁻¹)^m∗b^eⁿ⁻²)^m∗ · · ·)^m∗b^e¹)^m∗b^e⁰.

When m > 2 the exponent digits take values from the set {0,1,2, . . . , m− 1}. Consequently, the powers b^eⁱ are no longer the simple values 1 and b.

Instead of computing b^eⁱ for each i, implying that the same power of b may be recomputed several times, they can be precomputed and stored into a table of size m −1. Stinson [Sti90] describes exponentiation methods for an algebraic system where the time for squaring is negligible compared to multiplication. One of Stinson’s results is an algorithm that computes all powers b, b², b³, . . . , b^m⁻¹ in not more than ¹₂m−1 multiplications, where m is assumed to be a power of two. The algorithm starts from b and proceeds to the next even power by using the rule b^e = (b^e²)². This costs a squaring.

Then the next odd power is computed by the rule b^e = b^e⁻¹∗b at a cost of a multiplication. By alternating application of these two rules the complete table is builded up. If Stinson’s algorithm is applied to general values of m, i.e. values that are not restricted to powers of two, and if the squarings are brought into the analysis a total of m−2 operations is used: ¹₂(m −2) multiplications and ¹₂(m−1) squarings. The unary method also requires a total of m−2 operations to compute the table. However, only one of these operations is a squaring.

After the table is precomputed the remaining computation usesνm(e)−1 multiplications andn−1 = logm eexponentiations with the (small) expo-nent m. The function νm(e)∈ {1,2, . . . ,log_m e+ 1} denotes the number of non-zero digits in the m-ary encoding of e. It is convenient to choose m to be a power of two, i.e. m = 2^k for some k. Then the exponentiations with exponent m can be performed by k = log₂ m squarings. This gives a total of (log₂ m)log_m e+¹₂(m−1)squarings andνm(e)−1 +¹₂(m−2) multiplications. Besides storage for the table a register for the intermediate results is needed, a total of m registers. When m = 2^k the digits ei of an exponent e can be interpreted as groups of k bits in a binary encoding of the exponent. This is the reason that the m-ary method is also named the k-window method. It should be mentioned that an exponentiation method

identical to thek-window method was described by Brauer [Bra39] in 1939.

Brauer developed the method to improve an upper bound on the number of multiplications needed for computing exponentials.

Figure 2.1: The total number of operations for the m-ary method in the worst, average and best case. Also shown is the number of squarings.

Figure 2.1 shows the computing time for diﬀerent sizes of the window k= log₂mwheneis a 512 bit number. The curves are derived from the above expressions, which are based on the assumption that m is a power of two.

So for non-integer window sizes the curves are approximations. The worst case time is for all digitsei non-zero. The best case time is when all but the most signiﬁcant digit are zero. The best case time does not represent a lower bound for the computing time for exponentiation but illustrates the interval of computing time for various exponent values when the m-ary method is applied. Indeed, when only the most signiﬁcant digit is non-zero there is no reason to precompute a table that is never used. Also shown in the ﬁgure is the average time: If the digits are random in the set {0,1, . . . , m−1} the

2.1. FEWER MULTIPLICATIONS 27 log₂ e+ 1 Window size Squarings Multiplications Total Registers

256 4 259 70 329 16

512 5 525 117 642 32

1024 6 1051 201 1252 64

Table 2.1: Minimal, worst case, computing time for m-ary method.

probability for a non-zero digit is ^m_m⁻¹. Recalling that en−1 >0, this implies that νm(e) = ^m_m⁻¹log_me+ 1 in the average case. The number of squaring operations is also depicted in the ﬁgure. If similar curves for 256 bit and 1024 bit exponents are plotted these will be shaped very much like the curves for 512 bit exponents.

In Table 2.1 the minimal total number of operations, in the worst case, is listed for 256, 512 and 1024 bit exponents. With respect to the worst case computing time a window size of 5 is optimal for 512 bit exponents. Com-pared to the binary method (window size 1) the worst case total number of operations is reduced by 37 percents for 512 bit exponents when a window size of 5 is chosen. Although the optimal window size decreases with decreas-ing exponent bit-lengths and increases for increasdecreas-ing exponent bit-lengths a window size of 5 gives a worst case total number of operations that is fairly close to minimum for exponents with bit-lengths from 256 bits to 1024 bits.

In the m-ary method the reduction in computing time time is achieved at the expense of additional registers for holding a table of precomputed values. The main contribution to the reduction is the reduced number of multiplications performed in the last phase of the computation, i.e. in the phase after the precomputation. In this phase the number of multiplications is reduced by approximately a factor ofk = log₂ mfromν(e)−1 toνm(e)−1.

But also the number of squarings performed in this phase is (sometimes) slightly reduced from log₂ e to (log₂ m)log_m e . Since log₂ e equals (log₂ m)logm e +log2 en−1 the reduction in the number of squarings is log₂ en−1 ∈ {0,1, . . . , k −1}. This means that the number of bits, log₂ en−1+ 1, used in the binary encoding of the most signiﬁcant digit en−1 determines the reduction of squarings. Since the value b^eⁿ⁻¹ has been precomputed and stored into a table the savings in squarings corresponds to a reuse of squarings already performed during the precomputation. It is possible to get full advantage of these squarings already performed and always achieve a total reduction of k−1 squarings, independent of the number of

bits used for en−1. The trick is to allocate windows to the binary encoding of exponent e from left to right instead of from right to left. The following illustration shows how windows are allocated from right to left in accordance with Equation 2.6.

The bits denoted byxsymbolise an arbitrary bit value. Since all windows have the same window size k the most signiﬁcant digit en−1 is padded with j zero-bits, so that the most signiﬁcant window has k =j+log₂ en−1+ 1 bits in total. Now, if the windows are allocated from left to right instead and the least signiﬁcant digit e0 is padded with j zero-bits the following picture is obtained en−1. The number of multiplications is unchanged. If Equation (2.10) is used for the computation of exponentials a further reduction of the computing times in Table 2.1 is obtained. The reduction is 3 squarings for 512 bit exponents and 2 squarings for 1024 bit exponents. There is no change in the computing times for 256 bit exponents. This is shown in Table 2.2.

The direction in which the windows are allocated should not be confused with direction in which the exponent digits are scanned. In the above de-scription of the m-ary method the exponent digits are scanned from left to right. Thus the description is still a generalisation of the left-to-right binary method.

2.1. FEWER MULTIPLICATIONS 29 log₂ e+ 1 Window size Squarings Multiplications Total Registers

256 4 259 70 329 16

512 5 522 117 639 32

1024 6 1049 201 1250 64

Table 2.2: Minimal, worst case, computing time for m-ary method when windows are allocated left-to-right.

Knuth writes [Knu81, p. 444] that there is also a less obvious right-to-left m-ary method that takes more registers but only a few more operations compared to the left-to-right method. In fact, it is possible to derive a right-to-left method that requires exactly the same number of registers and operations as the left-to-right method. The strategy follows an idea by Yao [Yao76]. Using an m-ary encoding of e, the eth power of b is written as,

b^e = b^e⁰^m⁰^+e¹^m¹⁺^···^+eⁱ^m⁺^···^+eⁿ⁻¹^mⁿ⁻¹ (2.11)

= (b^m⁰)^e⁰ ∗(b^m¹)^e¹ ∗ · · · ∗(b^mⁱ)^eⁱ ∗ · · · ∗(b^mⁿ⁻¹)^eⁿ⁻¹.

Ifmis a power of two, saym= 2^k, the sequenceb^m⁰, b^m¹, . . . , b^mⁱ, . . . , b^mⁿ⁻¹, can be computed in (n−1)ksquarings. Instead of using registers for a precom-puted table, m−1 registers are used for intermediate results. The registers are initialised to 1 and are denoted c1, c2, . . . , cm−1. After the computation of the ith sequence element, b^mⁱ, exponent digit ei is inspected. Digit ei can take one of the values in {0,1, . . . , m−1}. If ei = j and j > 0 the ith se-quence element is multiplied onto cj. In this way all sequence elements with common exponent digit value j are multiplied onto cj. It can be expressed as

cj = 1∗

i:e_i=j

b^mⁱ , for all j = 1,2, . . . , m−1 (2.12) Since the ﬁrst multiplication onto eachcj is a multiplication onto 1 and there-fore negligible, the number of multiplications required to compute c1, c2, . . . , cm−1 is νm(e)−δ. Here δ denotes the number of diﬀerent digit values from {1,2, . . . , m− 1} represented by the exponent digits, i.e. the number of non-empty index sets {i : ei =j} in (2.12).

The ﬁnal phase of the computation uses an algorithm by Brickell et al.

[BGMW92]. In terms of the intermediate results, the eth of b can be

ex-pressed as

b^e = c^m_m⁻₋¹₁∗c^m_m⁻₋²₂∗ · · · ∗c¹₁ (2.13)

= (cm−1)∗(cm−1 ∗cm−2)∗ · · · ∗(cm−1 ∗cm−2∗cm−3· · · ∗c1).

= c_m₋₁∗c_m₋₂∗ · · · ∗c₁.

This phase of the computation is accomplished by two sweeps through the intermediate results. Start by settingc_m₋₁ =cm−1 and then calculatec_m₋₂ = c_m₋₁∗cm−2, c_m₋₃ =c_m₋₂∗cm−3, . . . , c₁ =c₂∗c1 inδ−1 multiplications. If the valuec_j is stored in registercj no further registers are required. Through the second sweep c_m₋₂ = c_m₋₁ ∗c_m₋₂, c_m₋₃ =c_m₋₂∗cm−3, . . . , c₁ = c₂ ∗c₁ are calculated in m −2 multiplications. Now the value c₁ equals b^e. In total, including a register for computing the sequence valuesb^mⁱ,m registers are required. The total number of operations is (n −1)k squarings plus νm(e)+m−3 multiplications. The only diﬀerence from the left-to-rightm-ary method is how the operations are divided into squarings and multiplications.

The right-to-left m-ary method described above uses a right-to-left win-dow allocation. By using a left-to-right winwin-dow allocation the number of squarings also reduces tolog₂ e−(k−1) for the right-to-leftm-ary method.

In document View of Exponentiation, Modular Multiplication and VLSI Implementation of High-Speed RSA Cryptography (Sider 38-44)