Quotient Determination Unit - Processor Description

4.2 Processor Description

4.2.4 Quotient Determination Unit

The method implemented by the quotient determination unit is identical to the method described in the article in Appendix B. The residue range is non-negative, and the operation of the unit can be speciﬁed by

{Determine integerq such thatR−qm ∈[0;αm[ and q∈ {0,1, . . . ,42}}. Since the unit merely computes an estimate of the exact quotient digit, the parameter α has a value greater than 1. The scaled modulus m is equal to 2⁵m, where m ∈ [2⁵⁶⁰; 2⁵⁶¹[. The restriction on the maximal allowable quotient digit value, q^max = 42, is imposed by the method for computation of multiples. However, as will be shown below, the quotient digit values determined by the method of this quotient determination unit will never exceed 39.

4.2. PROCESSOR DESCRIPTION 147 The basic principle of the method is to assign the valuej to the quotient digit q if R ∈[jm; (j+ 1)m[ and, hereby, obtaining that R−qm ∈[0;m[.

By inspecting the results of the comparison operations R −jm ≥ 0 and R−(j+ 1)m ≥0 it is checked if R indeed belongs to [jm; (j+ 1)m[. The comparison constants cj = −jm, where j ∈ {0,1, . . . ,39}, are computed simultaneously with the initialisation of the processor. Since the modulus m is put serially into the processor, a serial computation of the constants can be implemented by means of a few full adders and ﬂip-ﬂops.

However, in order to reduce the computing time for the comparison op-erations, the comparisons are limited to the most signiﬁant parts of the operands: The quotient determination unit performs the comparisons

Rˆ+ ˆcj ≥0, for all j ∈ {0,1,2, . . . ,39}, (4.2) where ˆR and ˆcj refer to the most signiﬁcant parts of R and cj. Then, if Rˆ+ˆcj ≥0 and ˆR+ˆcj+1 <0 the valuej is assigned to the quotient digit. Using the notation in Section 3.10, where ∆ refers to the truncation error introduced by neglecting the u least signiﬁcant digits, the operands are written as R = 2^uRˆ+ ∆R and cj = 2^uˆcj + ∆c_j. Hence, ˆR+ ˆcj ≥ 0 implies that R−jm≥

∆R+ ∆c_j, and ˆR+ ˆcj+1 <0 implies that R−(j + 1)m <∆R+ ∆c_j+1. So, R−jm ∈[∆R+ ∆c_j;m+ ∆R+ ∆c_j+1[. According to Table 3.1, page 107, the truncation errors of the binary represented comparison constants belong to [0; 2^u−1], and the truncation error of the carry save represented R belongs to [0; 2(2^u−1)]. Therefore, the resulting quotient digit value will ensure that R−qm ∈[0, m+ 3(2^u−1)[= [0;αm[. (4.3) According to the analysis in the article in Appendix B, it is suﬃcient to use the precision u= 561 to restrict the values of q to the set{0,1, . . . ,42}.

Indeed, it turns out thatq will never be greater than 39: Equation (4.3) gives that αm = m + 3(2^u −1). By inserting u = 561 and using m = 2⁵m ≥ 2⁵2⁵⁶⁰, it follows that α < ¹⁹₁₆. Furthermore, R is the result of a recursive computation of the formRi := 2⁵(Ri+1−qi+1m) +aib, whereai ∈[0; 31] and b ∈[0;αm[= [0;₃₂^αm[. Hence, R < 32(αm) + ³¹₃₂αm. Finally, using α < ¹⁹₁₆, it is seen that R < 40m and, consequently, the quotient digit value will be less than 40.¹

1In fact it is suﬃcient to use the quotient digit set{0,1, . . . ,38}: Theorem 3.8–1 states that a digit set bounded byq^max= [(δ−1)α] is suﬃcient to achieveR−qm ≤αm, when R ≤ δαm. According to the above discussion, δ is bounded by δ < 32 + ³¹₃₂, andα is bounded byα < ¹⁹₁₆. Therefore,(δ−1)αis bounded by(δ−1)α ≤38.

In the quotient determination unit the comparison operations include the 12 most signiﬁcant digits of R and cj = −jm from position 561 up to position 572. Since an upper bound ofRandjm, given by 40m <40 2⁵2⁵⁶¹, is less than 2⁵⁷², the digits at position 572 determine the sign of the operands.

(The two’s complement representation is used for representation of negative operands).

The hardware architecture of the quotient determination unit is shown in Figure 4.8. It consists of 39 instances of a comparison unit and 40 exclu-sive or (XOR) gates. The input to the quotient determination unit, and to the instances of the comparison unit, comprises the 12 most signiﬁcant dig-its, denoted ˆR^X_i+1, of the carry save represented intermediate operand R^X_i+1. The output of the quotient determination unit is a bit-vector representation (v0, v1, . . . , v39) of the value of quotient digit q^X_i+1: The bit-vector represen-tation obeys thatvj = 1 if, and only if, q_i+1^X =j.

Figure 4.8: Hardware architecture of the quotient determination unit.

Each of the 39 comparison units performs one of the comparisons in (4.2).

The unit denotedcomparison unit,j performs the comparison ˆR^X_i+1+ ˆcj ≥0.

4.2. PROCESSOR DESCRIPTION 149 The result of the comparison is denoted carryj. Signal carryj is 1 if the answer is “true”, and 0 if the answer is “false”. So, according to the above discussion, the signal vj (equal to the exclusive or of carryj and carryj+1) takes the value 1 if, and only if, the valuej should be assigned to the quotient digit. Since the intermediate operandR^X_i+1 is restricted to the range [0; 40m[, signal carry0 is constantly 1, and carry40 is constantly 0.

The hardware architecture of thecomparison unit is shown in Figure 4.9.

The unit contains a 12 bit register denoted ˆcj register. The register holds the most signiﬁcant part of the comparison constant cj = −jm. A carry save represented sum of ˆR^X_i+1 and ˆcj is computed by means of a carry save adder. Finally, the sign of the sum is computed by a binary addition. The value of the carry signal, denoted carryj, encodes the sign: Since ˆR_i+1^X is non-negative and ˆcj is negative, a carry will be generated if ˆR^X_i+1 + ˆcj ≥ 0.

No carry will be generated if ˆR^X_i+1+ ˆcj <0. Thebinary adder is an instance of the so-called high speed adder generated by the chip development tools.

It should be mentioned, that only the resulting carry signal from the adder is needed. The binary represented sum is discarded. Therefore, a more eﬃcient implementation of the comparison unit could be achieved by replacing the binary adder with a simpler circuit, where the functionality is limited to a computation of the resulting carry.

Figure 4.9: Hardware architecture of the comparison unit.

In document View of Exponentiation, Modular Multiplication and VLSI Implementation of High-Speed RSA Cryptography (Sider 160-164)