Figure 11 shows the full architecture of the combined unit.
6. Analyze the timing information and generate the timing report.
The procedure is illustrated in Figure 14.
The synthesis results are summarized in Table 5, Table 6 and Table 7.
The layout was created using SoC Encounter. The gate-level netlist was imported into Encounter for place and route. After the layout was created, the delay of each wire (route) was calculated. The timing information is saved in an .sdf file, which can be imported into Synopsys again to evaluate the impact of the interconnections on the cycle time.
... Synopsys. Save the design in Verilog format.
Table 5 Critical path of reciprocal unit.
QLATCH   4-1 multiplexer   CSA32   Selection   Critical path (ns)
0.46     0.19              0.17    0.74        1.56
Table 6 Critical path of combined unit.
Control   2-1 multiplexer   4-1 multiplexer   CSA42   Short adder   Selection   QLATCH   Critical path (ns)
0.40      0.14              0.09              0.36    0.47          0.40        0.11     1.97
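As a consistency check on Table 5 and Table 6, the component delays along each path should add up to the reported critical path. The following is a minimal sketch in Python. The dictionary names and the helper `critical_path_ns` are ours, and the assignment of values to columns assumes the column order given in the table headers.

```python
from decimal import Decimal

# Component delays in ns, copied from Table 5 and Table 6.
# The column-to-value mapping assumes the header order of each table.
RECIPROCAL_PATH = {
    "QLATCH": "0.46",
    "4-1 multiplexer": "0.19",
    "CSA32": "0.17",
    "Selection": "0.74",
}
COMBINED_PATH = {
    "Control": "0.40",
    "2-1 multiplexer": "0.14",
    "4-1 multiplexer": "0.09",
    "CSA42": "0.36",
    "Short adder": "0.47",
    "Selection": "0.40",
    "QLATCH": "0.11",
}


def critical_path_ns(components):
    """Sum the per-component delays using exact decimal arithmetic."""
    return sum((Decimal(v) for v in components.values()), Decimal("0"))


print(critical_path_ns(RECIPROCAL_PATH))  # 1.56
print(critical_path_ns(COMBINED_PATH))    # 1.97
```

Both sums match the critical path column, which supports the column order used above.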
Table 7 Area and power dissipation for reciprocal and combined unit
Unit              Area (µm²)       Power (mW)
Reciprocal unit   183197.703125    29.3092
Combined unit     328405.000000    41.8629
Table 8 Comparison of delay before and after layout
Unit              Before layout (ns)   After layout (ns)   Difference (%)
Reciprocal unit   1.56                 1.68                7.7
Combined unit     1.97                 2.42                22.8
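The last column of Table 8 is the relative increase of the delay caused by the interconnections. A short sketch of that calculation follows; the function name `layout_penalty_percent` is ours, and the delays are taken from the table.

```python
def layout_penalty_percent(before_ns, after_ns):
    """Relative delay increase after layout, in percent of the pre-layout delay."""
    return (after_ns - before_ns) / before_ns * 100.0


# Delays in ns from Table 8: (unit, before layout, after layout).
ROWS = [
    ("Reciprocal unit", 1.56, 1.68),
    ("Combined unit", 1.97, 2.42),
]

for unit, before, after in ROWS:
    print(f"{unit}: {layout_penalty_percent(before, after):.1f}%")
# Reciprocal unit: 7.7%
# Combined unit: 22.8%
```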
[Figure 14: flow diagram. RTL-level design -> synthesis (Synopsys) -> gate-level netlist -> place and route (SoC Encounter) -> layout and SDF-format timing -> timing analysis (Synopsys)]
Figure 14 Procedure of Implementation
Table 5 and Table 6 show that for both the reciprocal and the square root reciprocal operation, the critical path corresponds to the digit-by-digit part. This means that the proposed schemes have the same cycle time as the corresponding digit-by-digit schemes, while the number of iterations is only half of that of the digit-by-digit schemes alone. The total delay is therefore reduced roughly by half.
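For reference, a conventional radix-4 digit-by-digit reciprocal, which retires two result bits per iteration, can be sketched as below. This is a simplified illustration and not the implemented architecture: the residual is kept as an exact fraction instead of in carry-save form, and the digit is selected by rounding the exact ratio instead of using a selection function. The function name `radix4_reciprocal` and the scaling of the initial residual are our assumptions.

```python
from fractions import Fraction


def radix4_reciprocal(d, iterations):
    """Approximate 1/d for 1/2 <= d < 1 with a radix-4 digit recurrence.

    Simplifications (assumptions, not the thesis design): exact residual
    instead of carry-save form, and digit selection by rounding 4*w/d.
    """
    d = Fraction(d)
    if not Fraction(1, 2) <= d < 1:
        raise ValueError("d must lie in [1/2, 1)")
    w = Fraction(1, 4)   # scaled initial residual, so that |w| <= (2/3) * d
    q = Fraction(0)      # accumulated digits, converges to 1 / (4 * d)
    digits = []
    for j in range(1, iterations + 1):
        # Select a digit from {-2, -1, 0, 1, 2}; clamping keeps |w| <= (2/3) * d.
        digit = max(-2, min(2, round(4 * w / d)))
        w = 4 * w - digit * d            # residual recurrence
        q += Fraction(digit, 4 ** j)     # append the new radix-4 digit
        digits.append(digit)
    return 4 * q, digits                 # undo the initial scaling by 1/4


approx, digits = radix4_reciprocal(Fraction(3, 4), 8)
print(digits)                          # [1, 1, 1, 1, 1, 1, 1, 1]
print(float(Fraction(4, 3) - approx))  # remaining error, below (8/3) * 4**-8
```

Each iteration shrinks the error bound by a factor of 4, so n result bits need about n/2 iterations in this scheme.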
Since the approximation part has roughly the same complexity as the conventional digit-by-digit part, the proposed schemes increase the area by a factor of nearly 2 with respect to the corresponding digit-by-digit schemes.
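The trade-off between delay and area can be illustrated with a rough estimate. In the sketch below, the 54-bit result width is an assumed value chosen only for illustration, initialization and rounding cycles are ignored, and the area factor of 2 is the approximate figure stated above. The cycle time is the critical path from Table 5, and the rate of four bits per cycle for the proposed design is the one stated in the Conclusions.

```python
import math


def iterations_needed(result_bits, bits_per_cycle):
    """Number of cycles needed to produce result_bits at a fixed rate."""
    return math.ceil(result_bits / bits_per_cycle)


RESULT_BITS = 54   # assumption for illustration, not a figure from this report
CYCLE_NS = 1.56    # critical path of the reciprocal unit (Table 5)
AREA_RATIO = 2.0   # "nearly 2", approximate figure from the text

conv_iters = iterations_needed(RESULT_BITS, 2)  # radix-4 digit-by-digit alone
prop_iters = iterations_needed(RESULT_BITS, 4)  # proposed scheme

delay_ratio = (prop_iters * CYCLE_NS) / (conv_iters * CYCLE_NS)
area_time_ratio = delay_ratio * AREA_RATIO

print(conv_iters, prop_iters)      # 27 14
print(round(delay_ratio, 2))       # 0.52
print(round(area_time_ratio, 2))   # 1.04
```

Under these assumptions the area-time product stays close to that of the digit-by-digit scheme, while the delay is roughly halved.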
For the reciprocal, the area-time ratios of higher radix schemes with respect to the radix-4 scheme are described in [2]. Therefore, we can compare the proposed design with more instances in the area-time space. Figure 15 shows the area-time ratios for the digit-by-digit radix-4 scheme, the radix-16 scheme using overlapped radix-4 stages and the very high radix-512 scheme in comparison with the proposed design.
For the square root reciprocal, there are several implementations using radix-4 and very high radix described in [3] and [5]. A radix-16 implementation with overlapped radix-4 iterations would be significantly more complex, because of the many conditional forms required, so we do not consider it. The scheme and implementation proposed in [3] has good area-delay figures, so we take [3] as the radix-4 design reference. For the very high radix implementation, we assume a cost in area of 1.5 times (a conservative figure) the cost of a very high radix reciprocal unit with the same cycle time. Figure 15 also shows the ratios corresponding to the square root reciprocal, using the area and delay of the conventional radix-4 reciprocal unit as a reference. From the estimations, it can be concluded that the proposed designs introduce attractive points in the area-time space.
Figure 15 Area-time space.
Chapter 5
Conclusions
In this project, the entire process of designing a combined unit for reciprocal and square root reciprocal is presented. A new algorithm is used, which combines a digit-by-digit algorithm and a Newton-Raphson approximation. The approximation part is performed by a digit recurrence using the digit produced by the digit-by-digit part. In this way, the two parts can execute in an overlapped manner. Consequently, only half of the iterations are needed as compared to the conventional digit-recurrence algorithm.
For the design of the reciprocal unit and the combined unit, a radix-4 implementation is used and the residuals are represented in carry-save form. This choice achieves low latency with moderate hardware overhead. To make the architecture suitable for both reciprocal and square root reciprocal computation, a single quotient digit selection function is used, which was developed in [3].
The design was synthesized using the 0.18 µm standard cell library. From synthesis, the critical path, area and power dissipation are estimated. The synthesis results show that the cycle time is the same as that of the conventional digit-by-digit unit. Since the number of iterations is nearly half of that of the digit-by-digit method alone, the total execution time is reduced by almost half. On the other hand, the addition of the approximation part almost doubles the required area.
Since the proposed design produces four bits per cycle, for the reciprocal it is about 50% faster than a radix-16 implementation with overlapped radix-4 stages, with an increase of 20% in area. The evaluation shows that the proposed design achieves good figures in the area-time space.
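As background for the approximation part, the textbook Newton-Raphson iteration for the reciprocal, x_(i+1) = x_i (2 - d x_i), squares the relative error in every step, so the number of correct bits doubles. The sketch below shows only this general property. It is not the digit-recurrence form of the approximation used in the proposed unit, and the function name and the starting value are our assumptions.

```python
from fractions import Fraction


def newton_reciprocal_errors(d, x0, steps):
    """Relative errors 1 - d*x of the textbook Newton-Raphson reciprocal.

    Iterates x <- x * (2 - d * x); the relative error is squared each step.
    Illustration only, not the scheme implemented in the proposed unit.
    """
    d, x = Fraction(d), Fraction(x0)
    errors = [1 - d * x]
    for _ in range(steps):
        x = x * (2 - d * x)
        errors.append(1 - d * x)
    return errors


for err in newton_reciprocal_errors(Fraction(3, 4), 1, 3):
    print(err)
# 1/4
# 1/16
# 1/256
# 1/65536
```

With the assumed starting value, the error goes from 2 to 4 to 8 to 16 correct bits.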
References
[1] E. Antelo, P. Montuschi, T. Lang, A. Nannarelli, "Low Latency Digit-Recurrence Reciprocal and Square-Root Reciprocal Algorithm and Architecture", in Proceedings - 17th IEEE Symposium on Computer Arithmetic, ARITH-17 2005, pp. 147-154.
[2] M. Ercegovac and T. Lang, "Digital Arithmetic", Morgan Kaufmann Publishers, 2003.
[3] T. Lang and E. Antelo, "Radix-4 Reciprocal Square-root and Its Combination with Division and Square Root", IEEE Trans. on Comput., vol. 52, no. 9, Sept. 2003, pp. 1100-1114.
[4] D. A. Patterson and J. L. Hennessy, "Computer Organization and Design: the hardware/software interface", Morgan Kaufmann Publishers Inc., 2nd edition, 1998.
[5] E. Antelo, T. Lang and J.D. Bruguera, "Computation of sqrt(x/d) in a Very-high Radix Combined Division/Square-Root Unit with Scaling and Selection by Rounding", IEEE Trans. on Computers, vol. 47, no. 2, Feb. 1998, pp. 152-161.
[6] N. Takagi, "A Hardware Algorithm for Computing Reciprocal Square Root", in Proc. 15th IEEE Symposium on Computer Arithmetic, pp. 94-100, 2001.
[7] A. Weinberger, "4-2 carry-save adder module", IBM Technical Disclosure Bulletin, vol. 23, no. 8, Jan. 1981, pp. 3811-3814.
Appendices
A
A.1 VHDL files included in the reciprocal unit design
reciprocal.vhd, this is the top level design.
control.vhd convert.vhd convert_app.vhd cpa.vhd
cr_qcalc.vhd csa32LSBs.vhd csa42.vhd
digit_compute.vhd digit_update2.vhd
extend.vhd
gl_csa32.vhd gl_dualreg_ld.vhd gl_mux21.vhd gl_reg.vhd mult2.vhd
q_regs.vhd
qds_adder.vhd qds_table.vhd qdsel.vhd qlatch.vhd quotient.vhd rounding.vhd sdet.vhd shifter.vhd
tb2_reciprocal.vhd
Files used by both the reciprocal and the combined unit are listed below; these files are not shown in A.2, but are shown in B.2.
cpa.vhd
csa42.vhd
gl_csa32.vhd
gl_dualreg_ld.vhd
gl_mux21.vhd
gl_reg.vhd
mult2.vhd
qlatch.vhd