

Contents lists available at www.ijicse.in

International Journal of Innovative Computer Science & Engineering

Volume 3 Issue 4; July-August-2016; Page No. 18-20

## Design and Implementation of Area Efficient Distributed Arithmetic using Divided LUT Architecture

## <sup>1</sup>Perika Kalanwesh, <sup>2</sup>E.Parusha Ramu

<sup>1</sup>Student at Sri Indu College of Engineering and Technology, Hyderabad, India.

<sup>2</sup>Assistant Professor at Sri Indu College of Engineering and Technology, Hyderabad, India.

## **ARTICLE INFO**

Received: 17 June 2016; Accepted: 30 July 2016

#### **Corresponding Author:**

Perika Kalanwesh, Student at Sri Indu College of Engineering and Technology, Hyderabad, India.

ABSTRACT

Digital filters are essential building blocks of digital signal processing systems. Traditionally, digital filters are implemented on a Digital Signal Processor (DSP), but a DSP-based solution cannot meet the high-speed requirements of some applications because of its sequential structure. Field Programmable Gate Array (FPGA) technology is now widely used in digital signal processing because an FPGA-based solution can achieve high speed through its parallel structure and configurable logic, which also provide great flexibility and high reliability during design and later maintenance. Digital filters are generally divided into two categories: Finite Impulse Response (FIR) and Infinite Impulse Response (IIR). FIR filters are widely applied across digital signal processing because they provide linear phase and system stability.

Keywords – Distributed Arithmetic; FIR; pipeline; LUT; FPGA

© IJICSE, All Rights Reserved.

## Introduction

Digital filters are essential building blocks of digital signal processing systems. Traditionally, digital filters are implemented on a Digital Signal Processor (DSP), but a DSP-based solution cannot meet the high-speed requirements of some applications because of its sequential structure. Field Programmable Gate Array (FPGA) technology is now widely used in digital signal processing because an FPGA-based solution can achieve high speed through its parallel structure and configurable logic, which also provide great flexibility and high reliability during design and later maintenance. Digital filters are generally divided into two categories: Finite Impulse Response (FIR) and Infinite Impulse Response (IIR). FIR filters are widely applied across digital signal processing because they provide linear phase and system stability.

FPGA-based FIR filters using traditional direct arithmetic consume a considerable number of multiply-and-accumulate (MAC) blocks as the filter order increases. With Distributed Arithmetic, however, we can build a Look-Up Table (LUT) that stores the precomputed MAC values and read them out according to the input data. The LUT thus takes the place of the MAC units and saves hardware resources. This paper presents the principles of Distributed Arithmetic, introduces it into FIR filter design, and then presents a 31-order FIR low-pass filter using Distributed Arithmetic, which saves considerable MAC blocks and so reduces the circuit scale. In addition, a divided LUT method is used to reduce the required memory units, and a pipeline structure is used to increase the system speed.

## **II. DISTRIBUTED ARITHMETIC**

Distributed Arithmetic was first proposed by Croisier<sup>[1]</sup>, was extended to signed data by Liu, and was later introduced into FPGA design to save MAC blocks as FPGA technology developed.

The N-length FIR filter can be described as:

$$y = \sum_{n=0}^{N-1} h[n]x[n]$$
(1)

where h[n] is the filter coefficient and x[n] is the input sequence to be processed. The FIR structure consists of a series of multiplication and addition units and consumes N MAC blocks of the FPGA, which are expensive in a high-speed system. Compared with traditional direct arithmetic, Distributed Arithmetic can save considerable hardware resources by using a LUT in place of the MAC units <sup>[2]</sup>. Another virtue of this method is that it avoids the drop in system speed that accompanies an increase in the input data bit width or the filter coefficient bit width, which can occur in the traditional direct method and consume considerable hardware resources <sup>[3]</sup>.
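
As a baseline for comparison, the direct form of equ. (1) can be sketched in software. The snippet below is an illustrative Python model, not part of the original design; the function name `fir_direct` and the sample values are assumptions. It performs one multiply-accumulate per coefficient, which is the cost that Distributed Arithmetic aims to avoid.

```python
def fir_direct(h, x):
    """Direct-form inner product y = sum(h[n] * x[n]): one MAC per tap."""
    if len(h) != len(x):
        raise ValueError("h and x must have the same length")
    acc = 0
    for coeff, sample in zip(h, x):
        acc += coeff * sample  # one multiply-accumulate operation
    return acc


h = [3, -1, 4, 2]   # hypothetical integer coefficients
x = [5, -7, 0, 6]   # hypothetical input window
print(fir_direct(h, x))  # 15 + 7 + 0 + 12 = 34
```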

Distributed Arithmetic is introduced into the design of FIR filters as follows.

In the two's complement system, a (B+1)-bit sample x[n] can be described as:

$$x[n] = -2^{B} x_{B}[n] + \sum_{b=0}^{B-1} 2^{b} x_{b}[n]$$
(2)

where $x_b[n]$ is bit $b$ of x[n] and $x_B[n]$ is the sign bit.
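
The decomposition in equ. (2) can be checked numerically. The following Python sketch is illustrative only; the helper names `to_bits` and `from_bits` are assumptions. It splits a signed integer into its bits and rebuilds the value with the sign bit weighted by $-2^B$.

```python
def to_bits(value, B):
    """Return [x_0, ..., x_B] for a (B+1)-bit two's complement integer."""
    if not -(1 << B) <= value < (1 << B):
        raise ValueError("value does not fit in B+1 bits")
    word = value & ((1 << (B + 1)) - 1)  # two's complement bit pattern
    return [(word >> b) & 1 for b in range(B + 1)]


def from_bits(bits):
    """Evaluate equ. (2): -2^B * x_B + sum over b < B of 2^b * x_b."""
    B = len(bits) - 1
    return -(1 << B) * bits[B] + sum((1 << b) * bits[b] for b in range(B))


bits = to_bits(-5, 3)           # 4-bit pattern 1011, least significant bit first
print(bits, from_bits(bits))    # [1, 1, 0, 1] -5
```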

Substituting equ. (2) into equ. (1) yields:

$$y = -2^{B} \sum_{n=0}^{N-1} h[n] x_{B}[n] + \sum_{n=0}^{N-1} h[n] \sum_{b=0}^{B-1} 2^{b} x_{b}[n]$$
(3)

The second part of equ. (3) can be rewritten by exchanging the order of summation:

$$\sum_{n=0}^{N-1} h[n] \sum_{b=0}^{B-1} 2^b x_b[n] = \sum_{b=0}^{B-1} 2^b \sum_{n=0}^{N-1} h[n] x_b[n]$$
(4)

Substituting equ. (4) into equ. (3) yields the final form of Distributed Arithmetic:

$$y = -2^{B} \sum_{n=0}^{N-1} h[n] x_{B}[n] + \sum_{b=0}^{B-1} 2^{b} \sum_{n=0}^{N-1} h[n] x_{b}[n]$$
(5)
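
The final form can be verified against the direct inner product. The Python sketch below is a numerical check under assumed values, not the hardware implementation; `bit` and `da_bit_planes` are hypothetical names. It evaluates one coefficient sum per bit plane, weights it by $2^b$, and subtracts the sign-bit plane.

```python
def bit(value, b, B):
    """Bit b of a (B+1)-bit two's complement integer."""
    return ((value & ((1 << (B + 1)) - 1)) >> b) & 1


def da_bit_planes(h, x, B):
    """Evaluate the Distributed Arithmetic form plane by plane."""
    # Sign-bit plane carries weight -2^B.
    sign_plane = sum(hn * bit(xn, B, B) for hn, xn in zip(h, x))
    y = -(1 << B) * sign_plane
    # Remaining planes carry weight +2^b.
    for b in range(B):
        plane = sum(hn * bit(xn, b, B) for hn, xn in zip(h, x))
        y += (1 << b) * plane
    return y


h = [3, -1, 4, 2]   # hypothetical coefficients
x = [5, -7, 0, 6]   # hypothetical 4-bit signed samples (B = 3)
direct = sum(hn * xn for hn, xn in zip(h, x))
print(da_bit_planes(h, x, 3), direct)  # 34 34
```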

In equ. (5), each inner sum $\sum_{n} h[n] x_b[n]$ depends only on the N bits $x_b[0], \dots, x_b[N-1]$, so it can take at most $2^N$ distinct values. We can therefore build a Look-Up Table (LUT) that stores these precomputed values and read them out using the input bits as the address. The LUT takes the place of the MAC units and saves hardware resources.
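
A minimal software sketch of this idea follows. It is written in Python for illustration and is not the authors' hardware description; `build_lut` and `da_lut` are assumed names, and the coefficients are made up. The table is addressed by one bit from each input sample, and the looked-up values are shifted and accumulated with the sign-bit term subtracted.

```python
def build_lut(h):
    """Table of 2^N words: word at address a is the sum of h[n] over set bits n of a."""
    N = len(h)
    return [sum(h[n] for n in range(N) if (addr >> n) & 1)
            for addr in range(1 << N)]


def da_lut(h, x, B):
    """Inner product of h and x using only table lookups, shifts and adds."""
    lut = build_lut(h)
    mask = (1 << (B + 1)) - 1
    y = 0
    for b in range(B + 1):
        # Form the address from bit b of every input sample.
        addr = 0
        for n, xn in enumerate(x):
            addr |= (((xn & mask) >> b) & 1) << n
        term = lut[addr] << b
        y = y - term if b == B else y + term  # sign-bit plane is subtracted
    return y


h = [3, -1, 4, 2]   # hypothetical coefficients
x = [5, -7, 0, 6]   # hypothetical 4-bit signed samples (B = 3)
print(da_lut(h, x, 3))  # 34, the same as the direct inner product
```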



Fig.1 The basic Distributed Arithmetic structure

#### Look Up Table:

It stores the precomputed sums of the filter coefficients.

Since we are designing a 31-order filter (32 taps), the scale of the LUT would increase dramatically with the filter order [7], costing more time to look up the table and more memory to store the values. Therefore, we divide the LUT unit into four small LUT units to solve this problem.
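
The memory saving from dividing the table can be estimated with simple arithmetic. The Python sketch below assumes 32 coefficients split evenly across the small LUTs; `lut_entries` is a hypothetical helper and the group counts are illustrative, not figures reported by the paper.

```python
def lut_entries(num_taps, num_groups=1):
    """Total LUT words when num_taps address bits are split evenly over num_groups LUTs."""
    if num_taps % num_groups != 0:
        raise ValueError("num_taps must be divisible by num_groups")
    return num_groups * (1 << (num_taps // num_groups))


NUM_TAPS = 32  # assumed coefficient count
for groups in (1, 4, 8):
    print(groups, lut_entries(NUM_TAPS, groups))
# 1 4294967296
# 4 1024
# 8 128
```

The cost of the division is one extra adder stage to sum the outputs of the small LUTs.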

The coefficient values stored in one small LUT are given below.

Tab.2 Coefficient values of LUT

| $b_3 b_2 b_1 b_0$ | Data                      |
|-------------------|---------------------------|
| 0000              | 0                         |
| 0001              | h[0]                      |
| 0010              | h[1]                      |
| 0011              | h[0] + h[1]               |
| 0100              | h[2]                      |
| 0101              | h[0] + h[2]               |
| 0110              | h[1] + h[2]               |
| 0111              | h[0] + h[1] + h[2]        |
| 1000              | h[3]                      |
| 1001              | h[0] + h[3]               |
| 1010              | h[1] + h[3]               |
| 1011              | h[0] + h[1] + h[3]        |
| 1100              | h[2] + h[3]               |
| 1101              | h[0] + h[2] + h[3]        |
| 1110              | h[1] + h[2] + h[3]        |
| 1111              | h[0] + h[1] + h[2] + h[3] |
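
The rows of Tab.2 follow a regular pattern: address bit $b_n$ selects coefficient h[n]. The Python sketch below regenerates the table symbolically; `lut_rows` is an assumed helper name used only for illustration.

```python
def lut_rows(num_inputs=4):
    """Return (address, symbolic sum) pairs for a LUT with num_inputs address bits."""
    rows = []
    for addr in range(1 << num_inputs):
        # Bit n of the address selects coefficient h[n].
        terms = [f"h[{n}]" for n in range(num_inputs) if (addr >> n) & 1]
        rows.append((format(addr, f"0{num_inputs}b"), " + ".join(terms) or "0"))
    return rows


for address, data in lut_rows():
    print(address, data)
```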



Fig.2 Structure of FIR filter based on Distributed Arithmetic

The values of the four divided LUT units are added to form the final value. The input data is defined as 12-bit two's complement, so the system can also process signed signals. According to the structure above, we implemented the whole design in Verilog using Quartus 6.2. The core code of the realization is as follows:

<pre>
// P_DATA_W   : processing bit width
// shift_bit  : to shift the data
// table_out_t: the output of the third-level register
// div_count  : the counter of system clock cycles

if (div_count == P_DATA_W - 1)
    // last (sign) bit: its shifted table output is subtracted
    sum <= sum - shift_bit(table_out_t, div_count - 1);
else
    sum <= sum + shift_bit(table_out_t, div_count - 1);
</pre>
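
The accumulation step can also be modeled in software to check the arithmetic. The sketch below is a Python model under stated assumptions, not the Verilog design: `build_lut`, `da_divided` and `group_size` are hypothetical names, the coefficients and samples are made up, and the pipeline register latency (the `div_count - 1` offset in the fragment above) is not modeled. It processes one bit plane per step, adds the outputs of the small LUTs, and subtracts the shifted value on the sign bit.

```python
P_DATA_W = 12  # processing bit width, as stated in the text


def build_lut(h):
    """Word at address a is the sum of h[n] over the set bits n of a."""
    return [sum(h[n] for n in range(len(h)) if (a >> n) & 1)
            for a in range(1 << len(h))]


def da_divided(h, x, group_size=4, width=P_DATA_W):
    """Bit-serial Distributed Arithmetic with one small LUT per group of taps."""
    groups = [h[i:i + group_size] for i in range(0, len(h), group_size)]
    luts = [build_lut(g) for g in groups]
    mask = (1 << width) - 1
    total = 0
    for div_count in range(width):          # one input bit plane per step
        table_out = 0
        for g, lut in enumerate(luts):
            addr = 0
            for k in range(len(groups[g])):
                xn = x[g * group_size + k] & mask
                addr |= ((xn >> div_count) & 1) << k
            table_out += lut[addr]          # sum the divided LUT outputs
        shifted = table_out << div_count
        if div_count == width - 1:
            total -= shifted                # sign-bit plane
        else:
            total += shifted
    return total


h = list(range(-8, 8))                              # 16 hypothetical coefficients
x = [(-1) ** n * (100 * n + 7) for n in range(16)]  # hypothetical 12-bit samples
print(da_divided(h, x) == sum(a * b for a, b in zip(h, x)))  # True
```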

We also implemented the same filter using direct arithmetic to provide a comparison with the performance of the designed filter.

### CONCLUSION

This paper presents a design and implementation based on Distributed Arithmetic, which is used to realize a 31-order FIR low-pass filter. The Distributed Arithmetic structure is used to improve resource utilization, while the pipeline structure is used to increase the system speed. The test results indicate that the designed filter using Distributed Arithmetic works stably at high speed and saves almost 50 percent of the hardware resources. Moreover, the filter is easy to port to other applications by modifying the order, bit width, and other parameters, and it therefore has great practical value in digital signal processing.

# REFERENCES

1. Uwe Meyer-Baese. Digital signal processing with FPGA [M]. Beijing: Tsinghua University Press, 2006: 50~51.
2. Tsao Y C and Choi K. Area-Efficient Parallel FIR Digital Filter Structures for Symmetric Convolutions Based on Fast FIR Algorithm [J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, PP(99): 1~5.
3. Chao Cheng and Keshab K Parhi. Low-Cost Parallel FIR Filter Structures With 2-Stage Parallelism [J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2007, 54(2): 280~290.
4. Tearney G J and Bouma B E. Real-Time FPGA Processing for High-Speed Optical Frequency Domain Imaging [J]. IEEE Transactions on Medical Imaging, 2009, 28(9): 1468~1472.
5. Hu Guang-shu. Digital signal processing: theory, algorithm and realization [M]. 2nd ed. Beijing: Tsinghua University Press, 2003: 296~307.
6. Chun Hok Ho, Chi Wai Yu and Leong P. Floating-Point FPGA: Architecture and Modeling [J]. IEEE Transactions on Very Large Scale Integration Systems, 2008, 17(12): 1709~1718.
7. Evans J B. Efficient FIR filter architectures suitable for FPGA implementation [J]. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 2002, 41(7): 490~493.
8. Meher P K, Chandrasekaran S and Amira A. FPGA Realization of FIR Filters by Efficient and Flexible Systolization Using Distributed Arithmetic [J]. IEEE Transactions on Signal Processing, 2008, 56(7): 3009~3017.
9. Xia Yu-wen. Digital system design with Verilog [M]. 2nd ed. Beijing: Higher Education Press, 2008: 102~103.
10. Sungwook Yu and Swartzlander E E. DCT implementation with distributed arithmetic [J]. IEEE Transactions on Computers, 2001, 50(9): 985~991.

# AUTHOR DETAILS:

# Perika Kalanwesh:

has completed a B.Tech in Electronics and Communication Engineering from Princeton College of Engineering and Technology, a J.N.T.U.H affiliated college, in 2011, and is pursuing an M.Tech in VLSI System Design at Sri Indu College of Engineering and Technology, a J.N.T.U.H affiliated college.

# **E.Parush Ramu:**

Assistant Professor in the Electronics and Communication Engineering Department, Sri Indu College of Engineering & Technology, Hyderabad. Received a Bachelor's degree in Electronics and Communication Engineering from Osmania University College of Engineering, Hyderabad, and an M.Tech in VLSI System Design from a J.N.T.U.H affiliated college.