

International Journal of VLSI System Design and Communication Systems ISSN 2322-0929 Vol.04, Issue.13, December-2016, Pages:1529-1533

### Design of Pipelined Butterflies from Radix-2 FFT with Decimation in Time Algorithm using Efficient Adder Compressors

LAMESSA DINGETA<sup>1</sup>, GELAYE GERESU<sup>2</sup>

<sup>1</sup>Dept of ECE, Assosa University, Ethiopia, Email: lms.dngt@gmail.com. <sup>2</sup>HOD, Dept of ECE, Assosa University, Ethiopia, Email: gelaye23@gmail.com.

**Abstract:** This paper addresses the design of power efficient dedicated structures of Radix-2 Decimation in Time (DIT) pipelined butterflies, aiming the implementation of low power Fast Fourier Transform (FFT), using adder compressors, with a new XOR gate topology. In the FFT computation, the butterflies play a central role, since they allow calculation of complex terms. In this calculation, involving multiplications of input data with appropriate coefficients, the optimization of the butterfly can contribute for the reduction of power consumption of FFT architectures. In this paper, different and dedicated structures for the 16 bit width pipelined Radix-2 DIT butterfly, running at 100 MHz, are implemented, where the main goal is to minimize both the number of real multipliers and the critical path of the structures. This is done by changing the structure of the complex multipliers and applying them into the butterflies. For logic synthesis of the implemented butterflies it was used Cadence Encounter RTL Compiler tool with XFAB MOSLP 0.18 Im library. Area and power consumption results are presented for the synthesized butterflies. The main results show that when combining the use of pipeline approach and the use of efficient adder compressors, with a new XOR state topology, the power consumption of the butterflies is significantly reduced.

#### Keywords: DIT, FFT, RTL, XOR, MOSLP, XFAB.

#### I. INTRODUCTION

Fast Fourier Transform (FFT) is the largely implementation of the Discrete Fourier Transform (DFT) used in some communication systems PHY layer and DSP. This algorithm performs the calculation of complex terms, which involves the multiplication of input data by appropriate coefficients. The FFT algorithm, started a new era in digital signal processing by reducing the orders of complexity of DFT multiplications compared to a normal DFT. Since multipliers and adders are very power hungry elements in VLSI designs they result in consequent power consumption. The basic principle of FFT is that to find mechanism of time saving. The Fast Fourier Transform analysis is to convert the original signal to frequency domain signal and vice versa. We need to perform N multiplications and N-1 additions for an N-point DFT. Therefore, there will be complex multiplications of N2 and complex additions of N (N-1). Different platforms such as computer chips and general purpose processors are implemented by the Fast Fourier Transform (FFT).FFT uses a divide and conquer methodology for its computation process. This divides the N co-efficient into smaller blocks in different stages. Adders are mostly used in electronic applications. Digital adder is an important specification in advanced digital processors for faster computation. The speed of addition is limited by the time needed for a carry to propagate through the adder, in digital adder circuits. Adders in circuits acquire extremely large area and consume large power as large additions are finished in advanced systems. Adder is one of the important blocks in ALU and DSP systems. An adder plays a significant role include convolution, digital filtering like DFT, FFT, digital communications and spectral analysis. In the traditional FFT which uses the adder structure is not suitable due to its low processing speed and large power consumption. Here, the traditional adder is replaced with proposed carry select adder. In this paper, pipelined radix-2 SDF-SDC FFT using modified carry select adder is used to high processing speed and high performances of FFT processor.

#### **II. LITERATURE SURVEY**

A. Explained the Design of Pipelined Structure from Radix-2 FFT with DIT algorithm using efficient adder compressors. The different and dedicated structures for the 16 bit-width pipelined radix-2 DIT butterfly running at 100MHZ are implemented. The main goal of this paper is to minimize the number of real multipliers of the architectures. This is done by varying the structure of the complex multipliers and applying them into the butterflies. The adder compressors structures are widely used in fast and low power multiplier architectures. Parallel FFT processor has more disadvantages in hardware utilization and speed of processing. In order to overcome the problem of parallel FFT, Pipelined FFT processor such as R2SDF and R2MDC FFT structures have been introduced. In, Pipelined Radix-2k FFT structures have been developed with the help of Feed

forward structures. Feed forward structure provides 26ns for performing 8-point FFT.

**B. Explained the FFT Processor Using Radix**-24 feed forward pipeline architecture, which has low latency, high throughput and lesser area. In FFT computation, there are number of complex multiplication and addition operations. In VLSI implementation of FFT, multipliers takes large time in calculation, therefore it increases the delay of FFT processor. Described the efficient combined single-path delay commutator-feedback (SDC-SDF) radix-2 pipelined FFT architecture, which includes N number of SDC stages and 1 SDF stage the combined SDC-SDF pipelined FFT architecture which provides the output data in the normal order the SDC processing engine is to achieve 100% utilization. The proposed SDC PE reduces 50% complex multipliers, compared with other radix-2 FFT.

C. Explained the Cs Operation is Scheduled Before the Computation of Final-sum, which is different from the traditional approach. Carry words identical to input-carry '0' and '1' generated by the CSLA based on specific bit pattern, which is used for logic optimization of the CS unit. For logic optimization, fixed input bits of the CG unit are also used. An optimized design for CS and CG units are obtained. An efficient design is obtained for the CSLA, by using these optimized logic units. CSLA design involves less area and delay than the BEC based CSLA. described the new design of LPPL FFT processor and its two basic building blocks, butterfly in pipeline and address generator. Explained an efficient adder design essentially improves the performance of a complex DSP system presented the low-cost VLSI implementation of a pipeline fast Fourier transform capable of supporting from 1k to 32k FFT sizes.

#### III. EXISTING AND PROPOSED METHODS

#### A. Existing Method

## 1. Single Path Delay Feedback– Single Path Delay Commutator (SDF-SDC) FFT

The SDF FFT is a serial processor which provides high speed operation. In R2SDF, the inputs are given into serial manner. In R2SDF FFT, N/2 point input data is sequentially controlled with the help of Flip-Flop circuit. This FFT structure consumes more number of hardware utilization and power consumption due to utilizing or storing bulk of unwanted intermediate processing signals. Hence, large power consumption is one of the main disadvantages of R2SDF FFT. Single Path delay Commutator FFT has more number of single delay commentators within one stage. But in case of SDF FFT, single number of large delay feedbacks is used to implement the functions of FFT. Both SDF and SDC architectures are used in the proposed design. In the place of multiplier unit, Bit Parallel Multiplier is used for multiply the subtracted data into corresponding twiddle factor values. Complex input data is considered to perform the FFT function. In every step, there is single delay commutating function has been used to process the appropriate data points. The Multiplexer units have been used to provide control signals for performing Commutator functions. Further signed

addition and signed subtraction units are used to perform accumulation and subtraction functions. When compared to SDF structure, SDC architecture has more computational paths to perform FFT function. Hence, to improve the architectural performances of FFT combined Single-path Delay Feedback (SDF) – Single-path Delay Commutator (SDC) FFT architecture has been designed in this paper. This achievement can be obtained due to sharing or utilizing 50% of same hardware resources for computing multiple functions. The architecture of 16 point SDF-SDC FFT is illustrated in [Fig. 1].



Fig.1. Architecture of 16 Point SDF-SDC FFT

#### 2. Carry Select Adder

A carry select adder is a combinational logic of arithmetic circuit, which adds the binary value of 2 N-bit numbers and outputs their N-bit binary sum and a 1-bit carry. Carry select adder comes in the group of conditional sum adder. Sum and carry are computed by assuming input carry as 1 and 0 prior the input carry comes.



Cout, High Bits

Fig.2. Architecture of Carry Select Adder

When the input value of carry arrives, the actual calculated values of sum and carry are selected using a multiplexer. In the FFT processor, the computational procedure includes vast number of multiplications and additions. To implement the addition operation in the FFT processor, carry select adder has been used. In the existing method of SDF-SDC FFT using carry select adder, which is used to reduce the power consumption of the FFT processor. The architecture of Carry select adder is illustrated in [Fig.2].

# Design of Pipelined Butterflies from Radix-2 FFT with Decimation in Time Algorithm Using Efficient Adder CompressorsB. Proposed Method3. Pipelined Radix-2 SDF-SDC FFT using Modified

#### 1. Modified Carry Select Adder

In the carry select adder circuit, full adder circuit is reduced to improve the performances of the structure. Full adder is the important block of CSLA circuit. The carry select adder circuit consists of 4-bit adder (4 Full adders) and multiplexer. The reduced full adder design has been illustrated in this section.

#### 2. Reduced Full Adder

Full Adder circuit has been realized and redundant functions are eliminated to improve the architectural performances. The generalized Full Adder circuit block is illustrated in [Fig. 3]. FA circuit consists of two XOR gate, two AND gate and a single OR gate to perform the 3-bit addition operation. Reduced Full Adder (RFA) circuit has been designed by using minimal number of logic gates. Also Multiplexer (MUX) based RFA circuit has been designed in this paper to further alleviates the performances of digital adder circuits.

Gate Count of Full Adder is determined as follows,

Gate Count of FA = Gate Count [(2\*XOR) + (2\*AND) + (1\*OR)]

Gate Count of FA = [(2\*5) + (2\*1) + (1\*1)] = 10+2+1 = 13

Gate Count of Reduced Full Adder = Gate Count [(2\*AND) + (1\*OR) + (2\* NOT) + (1\*MUX)]

Gate Count of Reduced Full Adder = [(2\*1) + (1\*1) + (2\*1) + (1\*4)] = 2+1+2+4 = 9

The structure of Reduced Full Adder (RFA) is illustrated in [Fig.3].



Fig.3. Full Adder Circuit



Fig.4. Reduced Full Adder Circuit

#### Carry Select Adder

In this paper, the pipelined Radix-2 SDF-SDC FFT using modified Carry Select Adder has been proposed. In the existing method of FFT, the power consumption is large and also the processing speed is low. The proposed architecture of pipelined 16 point SDF-SDC FFT using Modified CSLA has been illustrated in [Fig. 5]. The pipelined SDF-SDC FFT using carry select adder has been designed, that is used to reduce the power consumption but increasing the hardware slices .So, we modified the carry select adder circuit, which is to reduce the hardware slices, LUTs, delay and also power consumption. In the modified carry select adder, the full adder circuit is reduced. In the full adder circuit, the number of logic gates is reduced. Finally, the modified carry select adder circuit is integrated into Radix-2 SDF-SDC FFT processor. The main goal of this paper is to improve the processing speed and performances of the FFT processor.



Fig.5. Proposed Architecture of Pipelined 16 Point SDF-SDC FFT Using Modified CSLA

#### **IV. SYNTHESIS RESULTS**

All the developed architectures were described in Verilog and synthesized to XFAB MOSLP  $0.18\mu$ m 1.8V library in typical operation conditions, using the Cadence Encounter RTL Compiler tool. The Leakage power for each structure was obtained in the order of nano Watts (nW) scale, and thus it was suppressed in the results. Power consumption, after switching activity analysis using 10,000 inputs vectors, and area ( $\mu m^2$ ) results, are presented in Table I and Table II respectively. In these tables, the results are presented by using one and two stages of pipeline, where the columns 'with CSA' represent the results of the butterflies using the Carry Save Adder (CSA) applied in Fig. 6 by the Cadence tool, while the columns represented by' with Compressors' is related to the butterflies results using the adder compressors according to Fig. 6 structures.

International Journal of VLSI System Design and Communication Systems Volume.04, IssueNo.13, December-2016, Pages: 1529-1533

 TABLE I. Power (mW) Comparison Between the Radix-2

 Butterflies in 180nm @ 100MHz

|           | with CSA |           | with Compressors |           |
|-----------|----------|-----------|------------------|-----------|
| Structure | One Pipe | Two Pipes | One Pipe         | Two Pipes |
| А         | 6.22     | 8.12      | 5.80             | 6.51      |
| В         | 11.50    | 8.83      | 10.53            | 6.96      |
| С         | 7.81     | 8.85      | 8.53             | 7.58      |
| D         | 10.31    | 8.68      | 11.62            | 8.21      |

The presented results in Table I show that the structure A presents the less power consumption among the structures. However, we have observed that the use of two levels of pipeline has lead to the significant power consumption reduction from the structures from B to D. This occurs because with the use of two levels of pipelining there is a combination of a large reduction in the critical path and the



Fig.6. The (a) Common Butterfly Structure and (b) (c) (d) Using The Complex Multiplication Schemes Of [3]



Fig.7. Basic Structure of Adder's Compressors



Fig.8. The Butterflies Structures with Adder Compressors

use of one less multiplier in these structures. In this way, with two levels of pipeline, the structure B could be an interesting choice, since it presents less area (Table II) with almost the same power consumption of the structure A.

TABLE II. Area (μm²) Comparison Between the Radix-2Butterflies in 180nm @ 100MHz

|           | with CSA |           | with Compressors |           |
|-----------|----------|-----------|------------------|-----------|
| Structure | One Pipe | Two Pipes | One Pipe         | Two Pipes |
| А         | 44314    | 49529     | 48197            | 53144     |
| В         | 42036    | 46519     | 46623            | 49320     |
| С         | 43103    | 46823     | 46322            | 50872     |
| D         | 45157    | 48050     | 48956            | 51622     |

Although the use of adder compressors has increased the area of the butterflies, when compared with the structures using adders from the tool, we can observe that the combination of the use of pipeline and adder compressors has reduced the power consumption of some of the butterfly structures, when one level of pipeline has been considered. However, when two levels of pipeline are taken into account, the power consumption of all of the butterflies with adder compressors is significantly reduced, as can be seen in TABLE I. Another important aspect to be observed in TABLE I is that the use of two levels of pipeline has increased the power consumption of the structure A. In fact, although this structure has used one more multiplier than the other ones, as it presents a reduced critical path, the use of more than one level of pipeline does not contribute for the reduction of the glitching activity. Thus, as area is increased by the use of more registers, power is also increased with two levels of pipeline. On the other hand, the other structures use one less multiplier circuit at the cost of the increase of the critical path. Thus, the use of more than one stage of pipeline contributes for a significant reduction in the glitching activity and consequently the power consumption is reduced in the structures B to D. In Eq. 1 we have used a power relation Ebetween different technologies, according to [9], where  $P_1$ denotes the power consumption in technology 1 and  $P_2$ denotes the power consumption in technology 2. This equation enables the comparison of our results to the results of literature.

$$E = \frac{Tecnology \ 1}{Tecnolog \ y \ 2}, |E| > 1$$
(1)

$$P_1 = \frac{1}{E^2} \cdot P_2 \tag{2}$$

We have compared the power consumption results from our best results using 4 multipliers (structure A) and 3 multipliers (structure B) with the work of [6], as can be seen in TABLE III. Although our structures have presented more power consumption, it should be observed that the results of [6] are presented in a more reduced technology than our work, which enables the use of a more reduced power supply. As the dynamic power consumption is proportional to the square of the power supply, thus the reduction of the power supply contributes for the reduction of power consumption of [6]. However, our results are very promising, since the difference between the powers results in different technologies are small. Thus, we are convinced that when we have our results at the same technology used in [6], the power consumption of our structures can be more reduced, as

International Journal of VLSI System Design and Communication Systems Volume.04, IssueNo.13, December-2016, Pages: 1529-1533

#### Design of Pipelined Butterflies from Radix-2 FFT with Decimation in Time Algorithm Using Efficient Adder Compressors

can be seen in the Projection results in TABLE III (given by using (1)).

|              | Technology   | 4 Multipliers<br>structure A | 3 Multipliers<br>structure B |  |  |  |
|--------------|--------------|------------------------------|------------------------------|--|--|--|
| This Work    | 0.18µm       | 5,80                         | 6,96                         |  |  |  |
| Projection † | 0,11µm       | 2,17                         | 2,60                         |  |  |  |
| [5]          | $0.11 \mu m$ | 4.34                         | 4.22                         |  |  |  |

# TABLE III. Power (mW) Comparison Between Radix-2 Butterflies @ 100MHz

Power projection results of this work to the  $0.11\mu$ m technology

#### **V. CONCLUSION**

Pipelined Radix-2 Single path Delay Feedback (SDF) -Single path Delay Commutator (SDC) FFT using Modified Carry Select Adder has been proposed through Very Large Scale Integration (VLSI) System design environment. Reduced full adder is designed using less number of gates compared to conventional Full adder. These reduced adders are applied in the carry select adder to analyze and improve the performance. The modified carry select adder is incorporated into Radix-2 SDF-SDC FFT processor. The main goal of this paper is to reduce the processing time and improve the speed of the FFT processor. The proposed method is used to reduce the slices, LUTs, delay and power consumption. The proposed pipelined Radix-2 SDF-SDC FFT offers 3.96% reduction in hardware slices, 2.04% reduction in number of LUTs, and 5.26% reduction in delay and 63.79% reduction in power consumption than the existing Radix-2 SDF-SDC FFT. In future, the proposed architecture will be absolutely useful in OFDM based digital communication to perform the function of frequency transformation and to analyze the spectrum characteristics of digital inputs.

#### **VI. REFERENCES**

[1]Mateus Beck Fonseca, Jo<sup>°</sup>ao Baptista S. Martins, Eduardo A. C<sup>°</sup>esar da Costa, "Design of Pipelined Butterflies from Radix-2 FFT with Decimation in Time Algorithm Using Efficient Adder Compressors", IEEE 2011.

[2]J. Cooley and J. Tukey, "An algorithm for the machine calculation of the complex Fourier series," Math. Comput, vol. 19, pp. 297 – 301, 1965.

[3]A. Wenzler and E. Luder, "New structures for complex multipliers and their noise analysis," in Circuits and Systems 1995, IEEE International Symposium on. ISCAS '95., vol. 2, apr-3 may 1995, pp. 1432 –1435 vol.2.

[4]S. Bouguezel, M. Ahmad, and M. Swamy, "An approach for computing the radix-2/4 DIT FHT and FFT algorithms using a unified structure," in Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, vol. 1, 23-26 2005, pp. 836–839.

[5]C. Gonzalez-Concejero, V. Rodellar, A. Alvarez-Marquina, E. Icaya, and P. Gomez-Vilda, "An FFT/IFFT design versus Altera and xilinx cores," in Reconfigurable Computing and FPGAs, 2008. ReConFig '08. International Conference on, 3-5 2008, pp. 337–342.

[6]J. Takala and K. Punkka, "Scalable FFT processors and pipelined butterfly units," J. VLSI Signal Process. Syst., vol. 43, no. 2-3, pp. 113–123, 2006.

[7]A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed., ser. Signal Processing Series. Prentice Hall, January 1999. [Online]. Available: http://www.worldcat.org/isbn/0137549202

[8]O. M. A. J. S. Rouholamini, M. Kavehie and K. Navi, "A new design for 7:2 compressors," Intl. Conf. on Computer Systems and Applications, 2007.

[9]J. Rabaey, Low Power Design Essentials. Springer Publishing Company, Incorporated, 2009.

[10]T. D. Chhatbar, "High speed high throughput FFT/IFFT processor ASIC for mobile Wi-Max," ICETET, pp. 402–405, 2009.

#### Author's Profile:



Lamessa Dingeta, from Assosa University, Department of ECE, Ethiopia. E-mail: lms.dngt@gmail.com



**Gelaye Geresu,** Head of The Department, from Assosa University, Department of ECE, Ethiopia. E-mail: gelaye23@gmail.com.