• Ingen resultater fundet

1.5 Description of Papers

2.1.1 The m-ary Method

Knuth generalises the left-to-right binary method to a left-to-right m-ary method. For m > 2 the exponent is m-ary encoded as a string of n digits, such thate is expressed as

2.1. FEWER MULTIPLICATIONS 25 which for m = 2 is identical to Equation (2.3). The unary method is also a specialisation of the m-ary method: Let m = 1 and let all digits ei in the unary encoding be 1. This means that the unary encoding is a string of e 1’s. Hence, the following m-ary computation method applies to all m >1:

be = b((···((en−1)m+en−2)m+···)m+e1)m+e0 (2.7)

= ((· · ·((ben−1)m∗ben−2)m∗ · · ·)m∗be1)m∗be0.

When m > 2 the exponent digits take values from the set {0,1,2, . . . , m 1}. Consequently, the powers bei are no longer the simple values 1 and b.

Instead of computing bei for each i, implying that the same power of b may be recomputed several times, they can be precomputed and stored into a table of size m 1. Stinson [Sti90] describes exponentiation methods for an algebraic system where the time for squaring is negligible compared to multiplication. One of Stinson’s results is an algorithm that computes all powers b, b2, b3, . . . , bm1 in not more than 12m−1 multiplications, where m is assumed to be a power of two. The algorithm starts from b and proceeds to the next even power by using the rule be = (be2)2. This costs a squaring.

Then the next odd power is computed by the rule be = be1∗b at a cost of a multiplication. By alternating application of these two rules the complete table is builded up. If Stinson’s algorithm is applied to general values of m, i.e. values that are not restricted to powers of two, and if the squarings are brought into the analysis a total of m−2 operations is used: 12(m 2) multiplications and 12(m1) squarings. The unary method also requires a total of m−2 operations to compute the table. However, only one of these operations is a squaring.

After the table is precomputed the remaining computation usesνm(e)1 multiplications andn−1 = logm eexponentiations with the (small) expo-nent m. The function νm(e)∈ {1,2, . . . ,logm e+ 1} denotes the number of non-zero digits in the m-ary encoding of e. It is convenient to choose m to be a power of two, i.e. m = 2k for some k. Then the exponentiations with exponent m can be performed by k = log2 m squarings. This gives a total of (log2 m)logm e+12(m1)squarings andνm(e)1 +12(m2) multiplications. Besides storage for the table a register for the intermediate results is needed, a total of m registers. When m = 2k the digits ei of an exponent e can be interpreted as groups of k bits in a binary encoding of the exponent. This is the reason that the m-ary method is also named the k-window method. It should be mentioned that an exponentiation method

identical to thek-window method was described by Brauer [Bra39] in 1939.

Brauer developed the method to improve an upper bound on the number of multiplications needed for computing exponentials.

Figure 2.1: The total number of operations for the m-ary method in the worst, average and best case. Also shown is the number of squarings.

Figure 2.1 shows the computing time for different sizes of the window k= log2mwheneis a 512 bit number. The curves are derived from the above expressions, which are based on the assumption that m is a power of two.

So for non-integer window sizes the curves are approximations. The worst case time is for all digitsei non-zero. The best case time is when all but the most significant digit are zero. The best case time does not represent a lower bound for the computing time for exponentiation but illustrates the interval of computing time for various exponent values when the m-ary method is applied. Indeed, when only the most significant digit is non-zero there is no reason to precompute a table that is never used. Also shown in the figure is the average time: If the digits are random in the set {0,1, . . . , m1} the

2.1. FEWER MULTIPLICATIONS 27 log2 e+ 1 Window size Squarings Multiplications Total Registers

256 4 259 70 329 16

512 5 525 117 642 32

1024 6 1051 201 1252 64

Table 2.1: Minimal, worst case, computing time for m-ary method.

probability for a non-zero digit is mm1. Recalling that en1 >0, this implies that νm(e) = mm1logme+ 1 in the average case. The number of squaring operations is also depicted in the figure. If similar curves for 256 bit and 1024 bit exponents are plotted these will be shaped very much like the curves for 512 bit exponents.

In Table 2.1 the minimal total number of operations, in the worst case, is listed for 256, 512 and 1024 bit exponents. With respect to the worst case computing time a window size of 5 is optimal for 512 bit exponents. Com-pared to the binary method (window size 1) the worst case total number of operations is reduced by 37 percents for 512 bit exponents when a window size of 5 is chosen. Although the optimal window size decreases with decreas-ing exponent bit-lengths and increases for increasdecreas-ing exponent bit-lengths a window size of 5 gives a worst case total number of operations that is fairly close to minimum for exponents with bit-lengths from 256 bits to 1024 bits.

In the m-ary method the reduction in computing time time is achieved at the expense of additional registers for holding a table of precomputed values. The main contribution to the reduction is the reduced number of multiplications performed in the last phase of the computation, i.e. in the phase after the precomputation. In this phase the number of multiplications is reduced by approximately a factor ofk = log2 mfromν(e)−1 toνm(e)−1.

But also the number of squarings performed in this phase is (sometimes) slightly reduced from log2 e to (log2 m)logm e . Since log2 e equals (log2 m)logm e +log2 en1 the reduction in the number of squarings is log2 en1 ∈ {0,1, . . . , k 1}. This means that the number of bits, log2 en1+ 1, used in the binary encoding of the most significant digit en1 determines the reduction of squarings. Since the value ben−1 has been precomputed and stored into a table the savings in squarings corresponds to a reuse of squarings already performed during the precomputation. It is possible to get full advantage of these squarings already performed and always achieve a total reduction of k−1 squarings, independent of the number of

bits used for en1. The trick is to allocate windows to the binary encoding of exponent e from left to right instead of from right to left. The following illustration shows how windows are allocated from right to left in accordance with Equation 2.6.

The bits denoted byxsymbolise an arbitrary bit value. Since all windows have the same window size k the most significant digit en1 is padded with j zero-bits, so that the most significant window has k =j+log2 en1+ 1 bits in total. Now, if the windows are allocated from left to right instead and the least significant digit e0 is padded with j zero-bits the following picture is obtained en1. The number of multiplications is unchanged. If Equation (2.10) is used for the computation of exponentials a further reduction of the computing times in Table 2.1 is obtained. The reduction is 3 squarings for 512 bit exponents and 2 squarings for 1024 bit exponents. There is no change in the computing times for 256 bit exponents. This is shown in Table 2.2.

The direction in which the windows are allocated should not be confused with direction in which the exponent digits are scanned. In the above de-scription of the m-ary method the exponent digits are scanned from left to right. Thus the description is still a generalisation of the left-to-right binary method.

2.1. FEWER MULTIPLICATIONS 29 log2 e+ 1 Window size Squarings Multiplications Total Registers

256 4 259 70 329 16

512 5 522 117 639 32

1024 6 1049 201 1250 64

Table 2.2: Minimal, worst case, computing time for m-ary method when windows are allocated left-to-right.

Knuth writes [Knu81, p. 444] that there is also a less obvious right-to-left m-ary method that takes more registers but only a few more operations compared to the left-to-right method. In fact, it is possible to derive a right-to-left method that requires exactly the same number of registers and operations as the left-to-right method. The strategy follows an idea by Yao [Yao76]. Using an m-ary encoding of e, the eth power of b is written as,

be = be0m0+e1m1+···+eim+···+en−1mn−1 (2.11)

= (bm0)e0 (bm1)e1 ∗ · · · ∗(bmi)ei ∗ · · · ∗(bmn−1)en−1.

Ifmis a power of two, saym= 2k, the sequencebm0, bm1, . . . , bmi, . . . , bmn−1, can be computed in (n1)ksquarings. Instead of using registers for a precom-puted table, m−1 registers are used for intermediate results. The registers are initialised to 1 and are denoted c1, c2, . . . , cm1. After the computation of the ith sequence element, bmi, exponent digit ei is inspected. Digit ei can take one of the values in {0,1, . . . , m1}. If ei = j and j > 0 the ith se-quence element is multiplied onto cj. In this way all sequence elements with common exponent digit value j are multiplied onto cj. It can be expressed as

cj = 1

i:ei=j

bmi , for all j = 1,2, . . . , m1 (2.12) Since the first multiplication onto eachcj is a multiplication onto 1 and there-fore negligible, the number of multiplications required to compute c1, c2, . . . , cm1 is νm(e)−δ. Here δ denotes the number of different digit values from {1,2, . . . , m 1} represented by the exponent digits, i.e. the number of non-empty index sets {i : ei =j} in (2.12).

The final phase of the computation uses an algorithm by Brickell et al.

[BGMW92]. In terms of the intermediate results, the eth of b can be

ex-pressed as

be = cmm11∗cmm22∗ · · · ∗c11 (2.13)

= (cm1)(cm1 ∗cm2)∗ · · · ∗(cm1 ∗cm2∗cm3· · · ∗c1).

= cm1∗cm2∗ · · · ∗c1.

This phase of the computation is accomplished by two sweeps through the intermediate results. Start by settingcm1 =cm1 and then calculatecm2 = cm1∗cm2, cm3 =cm2∗cm3, . . . , c1 =c2∗c1 inδ−1 multiplications. If the valuecj is stored in registercj no further registers are required. Through the second sweep cm2 = cm1 ∗cm2, cm3 =cm2∗cm3, . . . , c1 = c2 ∗c1 are calculated in m 2 multiplications. Now the value c1 equals be. In total, including a register for computing the sequence valuesbmi,m registers are required. The total number of operations is (n 1)k squarings plus νm(e)+m3 multiplications. The only difference from the left-to-rightm-ary method is how the operations are divided into squarings and multiplications.

The right-to-left m-ary method described above uses a right-to-left win-dow allocation. By using a left-to-right winwin-dow allocation the number of squarings also reduces tolog2 e−(k1) for the right-to-leftm-ary method.