A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC); circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare. FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after shipping, partial reconfiguration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages for many applications.

FPGAs contain programmable logic components called "logic blocks" and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like many (changeable) logic gates that can be inter-wired in (many) different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

In addition to digital functions, some FPGAs have analog features. The most common analog feature is programmable slew rate and drive strength on each output pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-speed channels that would otherwise run too slowly. [3][4] Another relatively common analog feature is differential comparators on input pins designed to be connected to differential signaling channels. A few "mixed-signal FPGAs" have integrated peripheral analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) with analog signal conditioning blocks, allowing them to operate as a system-on-a-chip.
Such devices blur the line between an FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric, and a field-programmable analog array (FPAA), which carries analog values on its internal programmable interconnect fabric.
Virtex family

The Virtex series of FPGAs has integrated features that include FIFO and ECC logic, DSP blocks, PCI Express controllers, Ethernet MAC blocks, and high-speed transceivers. In addition to FPGA logic, the Virtex series includes embedded fixed-function hardware for commonly used functions such as multipliers, memories, serial transceivers, and microprocessor cores. These capabilities are used in applications such as wired and wireless infrastructure equipment, advanced medical equipment, test and measurement, and defense systems.

Some Virtex family members are available in radiation-hardened packages, specifically to operate in space, where harmful streams of high-energy particles can play havoc with semiconductors. The Virtex-5QV FPGA was designed to be 100 times more resistant to radiation than previous radiation-resistant models and offers a ten-fold increase in performance. However, characterization and test data were not yet available for the Virtex-5QV on the Xilinx Radiation Test Consortium website as of November 2011.

Xilinx's most recently announced Virtex, the Virtex-7 family, is based on a 28 nm design and is reported to deliver a two-fold system performance improvement at 50 percent lower power compared to previous-generation Virtex-6 devices. In addition, Virtex-7 doubles the memory bandwidth compared to previous-generation Virtex FPGAs, with 1866 Mb/s memory interfacing performance and over two million logic cells. In 2011, Xilinx began shipping sample quantities of the Virtex-7 2000T FPGA, which packages four smaller FPGAs into a single chip by placing them on a special silicon communications pad called an interposer, delivering 6.8 billion transistors in a single large chip. The interposer provides 10,000 data pathways between the individual FPGAs, roughly 10 to 100 times more than would usually be available on a board, to create a single FPGA.
The Virtex-6 family is built on a 40 nm process for compute-intensive electronic systems, and the company claims it consumes 15 percent less power and has 15 percent improved performance over competing 40 nm FPGAs. The Virtex-5 LX and LXT are intended for logic-intensive applications, and the Virtex-5 SXT is for DSP applications. With the Virtex-5, Xilinx changed the logic fabric from four-input LUTs to six-input LUTs. With the increasing complexity of the combinational logic functions performed by SoCs, the percentage of combinational paths requiring multiple four-input LUTs had become a performance and routing bottleneck. The new six-input LUT represented a tradeoff: better handling of increasingly complex combinational functions, at the expense of a reduction in the absolute number of LUTs per device. The Virtex-5 series is a 65 nm design fabricated in a 1.0 V, triple-oxide process technology. Legacy Virtex devices (Virtex, Virtex-II, Virtex-II Pro, Virtex-4) are still available but are not recommended for use in new designs.
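The four-input versus six-input LUT trade-off can be illustrated in software. The following is a minimal Python sketch (not vendor code, and the function names are my own): an N-input LUT is nothing more than a 2^N-entry truth table addressed by the input bits, which is why a wide function that fits in one 6-LUT may need two cascaded 4-LUTs plus routing between them.

```python
# Sketch: an N-input LUT is a 2^N-entry truth table addressed by its inputs.
def make_lut(n_inputs, func):
    """Build a lookup table for an n_inputs-wide boolean function."""
    table = [func(tuple((i >> b) & 1 for b in range(n_inputs))) & 1
             for i in range(2 ** n_inputs)]
    def lut(*bits):
        index = sum(bit << pos for pos, bit in enumerate(bits))
        return table[index]
    return lut

# A 6-input AND fits in a single 6-LUT; with 4-input LUTs the same function
# needs two LUTs in series (a 4-input stage feeding a second stage).
and6 = make_lut(6, lambda bits: all(bits))
print(and6(1, 1, 1, 1, 1, 1))  # 1
print(and6(1, 0, 1, 1, 1, 1))  # 0
```

Any boolean function of up to six variables maps onto one such table, which is exactly what lets synthesis tools pack arbitrary combinational logic into the fabric.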
SHARPENING
Sharpening is one of the most impressive transformations you can apply to an image, since it seems to bring out image detail that was not there before. What it actually does, however, is emphasize edges in the image and make them easier for the eye to pick out -- while the visual effect is to make the image seem sharper, no new details are actually created. Paradoxically, the first step in sharpening an image is to blur it slightly. Next, the original image and the blurred version are compared one pixel at a time. If a pixel is brighter than the blurred version, it is lightened further; if a pixel is darker than the blurred version, it is darkened. The result is to increase the contrast between each pixel and its neighbors. The nature of the sharpening is influenced by the blurring radius used and the extent to which the differences between each pixel and its neighbors are exaggerated.
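The blur-compare-exaggerate procedure above can be sketched in a few lines of Python. This is an illustrative sketch on a one-dimensional row of pixels, not Picture Window's actual implementation; the parameter names `radius`, `amount`, and `threshold` are my own.

```python
def box_blur(pixels, radius):
    """Blur by averaging each pixel with its neighbors within `radius`."""
    n = len(pixels)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(pixels[lo:hi]) / (hi - lo))
    return out

def unsharp(pixels, radius=1, amount=1.0, threshold=0):
    """Sharpen: push each pixel away from its blurred value."""
    blurred = box_blur(pixels, radius)
    result = []
    for p, b in zip(pixels, blurred):
        diff = p - b
        # Threshold: leave nearly-flat regions (e.g. sky) untouched.
        if abs(diff) <= threshold:
            result.append(p)
        else:
            result.append(min(255, max(0, round(p + amount * diff))))
    return result

edge = [50, 50, 50, 200, 200, 200]   # a soft edge in a row of pixels
print(unsharp(edge))                  # contrast across the edge increases
```

Running this on the step edge darkens the pixel just before the step and lightens the pixel just after it, which is exactly the local-contrast boost described above.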
UNSHARP MASKING

Unsharp masking is the most powerful sharpening method Picture Window supports; however, it is a little more complicated to use. When you select Unsharp Masking, the Sharpen dialog box expands to add two additional sliders for Radius and Threshold. The Radius slider lets you control the amount of blurring. Generally you should set the radius to correspond to the degree to which the original image is blurred: the blurrier the image, the higher the radius you need to select. Choosing too large a radius creates a sort of ghosting effect around the edges of objects; if the radius is too small, the sharpening effect is minimized. The Threshold setting lets you restrict the sharpening action to only those pixels whose difference from their neighbors exceeds a specified threshold value. The idea behind setting the threshold is to select a value that still brings out edge detail without creating unwanted texture in smooth areas like clouds or clear blue skies. In the image detail below, you can see how Unsharp Mask with a threshold of zero sharpens the tree silhouette, but also brings out the film grain and scanning noise in the sky area. Increasing the threshold to 20 leaves the sky mostly untouched but still makes the tree stand out against its background.
Unsharp masking (USM) is an image manipulation technique, often available in digital image processing software. The "unsharp" of the name derives from the fact that the technique uses a blurred, or "unsharp," positive to create a "mask" of the original image. [1] The unsharp mask is then combined with the negative, creating the illusion that the resulting image is sharper than the original. From a signal-processing standpoint, an unsharp mask is generally a linear or nonlinear filter that amplifies high-frequency components.
Smoothing an Image

Smoothing is often used to reduce noise within an image or to produce a less pixelated image. Most smoothing methods are based on low-pass filters; see Low Pass Filtering for more information. Smoothing is also usually based on a single value representing a neighborhood of the image, such as the average value or the middle (median) value. The following examples show how to smooth using average and median values:

Smoothing with Average Values
Smoothing with Median Values
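As a minimal illustrative sketch (function names are mine), here are the two representative values most smoothing methods are built on, computed for a single 3x3 neighborhood containing one noisy outlier. The contrast between the two results previews why median-based smoothing handles impulse noise better than averaging.

```python
def average_value(pixels):
    """The mean of a neighborhood -- the basis of low-pass averaging."""
    return sum(pixels) / len(pixels)

def median_value(pixels):
    """The middle value of a sorted neighborhood."""
    ordered = sorted(pixels)
    mid = len(ordered) // 2
    if len(ordered) % 2:                          # odd count: middle element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the middle two

# A 3x3 neighborhood with one noisy outlier (255):
window = [10, 12, 11, 13, 255, 12, 10, 11, 12]
print(average_value(window))   # pulled well above the true local level
print(median_value(window))    # 12 -- the outlier is ignored
```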
Vedic multiplication

NIKHILAM SUTRA PRELUDE TO MULTIPLICATION

To fulfill my second objective, in this column I will illustrate multiplication of two numbers using a sutra from Vedic Math called All from Nine and the Last from Ten (Sanskrit: Nikhilam Navatashcaramam Dashatah). I will choose a special case to illustrate this, but the method can be expanded to any multiplication. The sutra basically means: start from the leftmost digit and subtract 9 from each of the digits, but subtract 10 from the last digit.

Example 1: Let us choose the number 6. This has only one digit, so it is also the last digit. Applying the Nikhilam sutra, we subtract 10 from 6 to get -4.

Example 2: Given the number 87, the first digit is 8 and the last digit is 7. Using the sutra: subtract 9 from 8 to get -1; subtract 10 from the last digit 7 to get -3. So on application of the Nikhilam sutra we get -13.

NIKHILAM APPLICATION: MULTIPLICATION - SPECIAL CASE

In the following examples I will take two numbers and illustrate how to multiply them very quickly using the Nikhilam sutra. Even though this technique works for any pair of numbers, we will look at the special case when the numbers are near a base such as 10, 100, 1000, etc. We start with a simple example.

Example 3: To multiply 8 and 7. Apply the Nikhilam sutra, All from Nine and the Last from Ten, to the number 8 to get -2 (since there is only one digit, subtract from 10), and to the number 7 to get -3. Now write the following:

    8   -2
    7   -3
    ________

Multiply (-2) and (-3) to get 6 and write it down on the right:

    8   -2
    7   -3
    ______6_

Next we cross-add: add 8 and -3 to get 5, or add 7 and -2 to get 5. Note that either operation gives the same answer, 5. We find the solution by combining the numbers found by the above operations:

    8   -2
    7   -3   X
    __5___6_

So the answer is 56. One interesting observation: the origin of the multiplication sign can be traced to the above cross-adding. Now you may be wondering: I knew the answer all along, big deal. Well, I used a baby problem as an illustration. Such multiplication can also be done for two- and higher-digit numbers.

Example 4: To multiply 92 and 89. Apply the Nikhilam sutra to both numbers and write the results side by side:

    92   -08
    89   -11
    __________

Multiply (-08) and (-11) to get 88:

    92   -08
    89   -11
    ______88__

Now we cross-add: either add 92 and -11 to get 81, or add 89 and -08 to get 81. Both operations give the same answer, 81, which is written below to get the solution:

    92   -08
    89   -11   X
    __81___88_

So the answer to multiplying 92 by 89 is 8188. Again, this technique works very well if the numbers to be multiplied are near a base. Upon slight modification, it also works very well for any pair of numbers.

Homework For Fun: Try the Nikhilam sutra to multiply: (i) 85 x 98, (ii) 995 x 988, (iii) bonus problem 105 x 93. Send answers to vedicmath@hotmail.com. All correct answers will be acknowledged.
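The cross-add-and-multiply procedure in the examples above reduces to a two-line formula, sketched here in Python (the function name is my own, not part of the sutra): for numbers near a base B, the left part of the answer is one number plus the other's deficiency, and the right part is the product of the two deficiencies.

```python
# Sketch of the Nikhilam shortcut for numbers near a base (10, 100, 1000, ...).
def nikhilam(a, b, base):
    """Multiply a and b using their deficiencies from `base`."""
    da = a - base          # "all from nine and the last from ten"
    db = b - base          # yields the (signed) deficiency directly
    left = a + db          # cross-add: the same as b + da
    assert left == b + da
    right = da * db        # product of the deficiencies
    # The right part occupies as many digit positions as the base has zeros;
    # any overflow or negative value there carries into the left part.
    return left * base + right

print(nikhilam(8, 7, 10))       # 56
print(nikhilam(92, 89, 100))    # 8188
print(nikhilam(105, 93, 100))   # 9765 -- works above the base too
```

Note that for 105 x 93 the deficiencies have opposite signs (+5 and -7), so the right part is negative and borrows from the left part; the formula handles this automatically.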
1. Introduction
High-speed arithmetic operations are very important in many signal processing applications. The speed of a digital signal processor (DSP) is largely determined by the speed of its multipliers. In fact, the multipliers are among the most important parts of all digital signal processors; they are essential in realizing many important functions such as fast Fourier transforms and convolutions. Since a processor spends a considerable amount of time performing multiplication, an improvement in multiplication speed can greatly improve system performance. Multiplication can be implemented using many algorithms, such as the array, Booth, carry-save, and Wallace tree algorithms.
The computational time required by the array multiplier is comparatively low because the partial products are computed independently in parallel. The delay associated with the array multiplier is the time taken by the signals to propagate through the gates that form the multiplication array.
The arrangement of adders is another way of improving multiplication speed. There are two methods for this: the carry save array (CSA) method and the Wallace tree method. In the CSA method, bits are processed one by one to supply a carry signal to an adder located at a one-bit-higher position. The CSA method has its own limitations, since the execution time depends on the number of bits of the multiplier. In the Wallace tree method, three bit signals are passed to a one-bit full adder; the sum is supplied to the next-stage full adder of the same bit position, and the carry output is supplied to the next-stage full adder located at a one-bit-higher position. In this method, the circuit layout is not easy.
The Booth algorithm reduces the number of partial products. However, large Booth arrays are required for high-speed multiplication and exponential operations, which in turn require large partial sum and partial carry registers. Multiplication of two n-bit operands using a radix-4 Booth recoding multiplier requires approximately n/(2m) clock cycles to generate the least significant half of the final product, where m is the number of Booth recoded adder stages. Thus, a large propagation delay is associated with this case. The modified Booth encoded Wallace tree multiplier uses the modified Booth algorithm to reduce the partial products, and faster additions are performed using the Wallace tree.
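As an illustrative software sketch (not the paper's implementation), radix-4 Booth recoding can be modeled in Python: the multiplier is scanned in overlapping three-bit groups, giving digits in {-2, -1, 0, 1, 2}, so each recoded digit replaces two multiplier bits and only about n/2 partial products are needed.

```python
def booth_radix4_digits(y, n_bits):
    """Recode an n_bits-wide multiplier y into radix-4 Booth digits (LSB first)."""
    digits = []
    y_prev = 0                          # implicit bit y[-1] = 0
    for i in range(0, n_bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        # Digit = -2*y[2i+1] + y[2i] + y[2i-1], the standard recoding rule.
        digits.append(-2 * b1 + b0 + y_prev)
        y_prev = b1
    return digits

def booth_multiply(x, y, n_bits=8):
    """Accumulate x * digit * 4^i -- one add (or subtract) per recoded digit."""
    return sum(d * x * (4 ** i)
               for i, d in enumerate(booth_radix4_digits(y, n_bits)))

print(booth_multiply(13, 11))   # 143
```

The sketch assumes the multiplier fits in n_bits with a zero top bit (unsigned values below 2^(n_bits-1)); in hardware, the negative digits become two's-complement subtractions of the shifted multiplicand.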
This paper proposes a novel fast multiplier adopting the sutra of ancient Indian Vedic mathematics called Urdhva Tiryakbhyam. The proposed multiplier is faster than existing multipliers reported previously.
2. FPGA Architecture
This section describes Xilinx field-programmable gate arrays based on the architecture of the Virtex-II. All Xilinx FPGAs contain the same basic resources: slices (grouped into configurable logic blocks, or CLBs), IOBs, and programmable interconnect. The other resources include memory, multipliers, global clock buffers, and boundary scan logic. The architecture of the Virtex-II is shown in [Figure 1]. The slices contain combinational logic and register resources. Each Virtex-II CLB contains four slices. The structure of a single slice is shown in [Figure 2]. Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs. A switch matrix provides access to general routing resources. The major parts of a slice include two look-up tables (LUTs), two sequential elements, and carry logic. The LUTs are known as the F LUT and the G LUT. The sequential elements can be programmed to be either registers or latches. The combinational logic is stored in the LUTs. The input path of the IOB element contains two DDR registers. The output path contains two DDR registers and two 3-state enable DDR registers. There are separate clocks and clock enables for input and output, whereas the set and reset pins are shared.
Implementation with an FPGA has to follow certain steps, as shown in [Figure 3].
3. Urdhva Tiryakbhyam
Urdhva Tiryakbhyam is a multiplication sutra (formula) from Vedic mathematics, an ancient Indian system of mathematics. Vedic mathematics was rediscovered by Jagadguru Swami Sri Bharati Krishna Tirthaji Maharaja, who found the basis of the system written in the form of sutras in an appendix of the Atharvaveda. The method is illustrated in [Figure 4].

Figure 4: Multiplication by Urdhva Tiryakbhyam.
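The vertically-and-crosswise rule illustrated in [Figure 4] can be sketched in software: column i of the result is the sum of all cross products a[j]*b[k] with j + k = i, and the columns are independent of one another, which is what makes the hardware parallelism possible. A minimal Python sketch (the digit ordering and function name are my own choices):

```python
def urdhva_multiply(a_digits, b_digits, base=10):
    """Multiply two digit lists (least significant digit first)."""
    n, m = len(a_digits), len(b_digits)
    columns = [0] * (n + m)
    for j in range(n):
        for k in range(m):
            columns[j + k] += a_digits[j] * b_digits[k]   # crosswise products
    # Propagate carries between columns to get the final digits.
    carry = 0
    result = []
    for c in columns:
        c += carry
        result.append(c % base)
        carry = c // base
    return result

# 46 x 73 = 3358; digits are stored least significant first.
print(urdhva_multiply([6, 4], [3, 7]))   # [8, 5, 3, 3]
```

Because every column sum can be formed at the same time, a hardware implementation computes all the partial products in parallel and only the final carry propagation is sequential.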
16-bit (4-digit) BCD Vedic Multiplier
The Vedic multiplier to be implemented on the FPGA Virtex-4 board for our consideration is the 16-bit (4-digit) multiplier. Here the inputs are grouped into 4 digits, each of 4 bits, represented in BCD format. The main reason for choosing BCD format for the representation of numbers is that the design can then also be used as a fixed-point fractional multiplier. The Vedic multiplier designed here is used solely for the multiplication of the pixel values, which are in BCD, with the coefficient values. The primary module of this Vedic multiplier is the 4-bit parallel array multiplier. It takes two 4-bit inputs, which are multiplied to give an 8-bit product. The product obtained is in binary format, so we require a binary-to-BCD converter to convert the obtained product to BCD form. The 4-bit multiplier is used to carry out simultaneous multiplication of each of the cross digits, as shown in the Urdhva Tiryakbhyam sutra. This parallel computation of each of the products results in faster execution of the multiplication. Once all the individual products have been computed, they are added as prescribed by the sutra to obtain the final product. To add the partial products, we use a BCD addition module. The product computed will thus consist of 32 bits, i.e., an 8-digit BCD number.
[Figure: array of partial-product bits for the multiplier]
BRAUN MULTIPLIER A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers. It is built using binary adders. A variety of computer arithmetic techniques can be used to implement a digital multiplier. Most techniques involve computing a set of partial products, and then summing the partial products together. This process is similar to the method taught to primary schoolchildren for conducting long multiplication on base-10 integers, but has been modified here for application to a base-2 (binary) numeral system.
An array multiplier is a digital combinational circuit that is used for the multiplication of two binary numbers by employing an array of full adders and half adders.
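A software sketch of the array-multiplier idea (illustrative only, not the project's HDL): in hardware, AND gates form one shifted partial-product row per multiplier bit, and the adder array sums the rows. Modeling each row addition as one `+=` mirrors one row of adders.

```python
def array_multiply(a, b, n_bits=4):
    """Multiply two unsigned n_bits numbers the way an adder array does."""
    rows = []
    for i in range(n_bits):
        b_bit = (b >> i) & 1
        # Row i is (a AND b_bit) shifted left by i -- one row per multiplier bit.
        rows.append((a if b_bit else 0) << i)
    product = 0
    for row in rows:
        product += row   # each += stands for one row of full/half adders
    return product

print(array_multiply(13, 11, n_bits=4))   # 143
```

Because all rows exist simultaneously, the hardware delay is set by how fast the adder array can reduce them, as noted in the array-multiplier discussion above.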
Mean Filter

One of the simplest linear filters is implemented by a local averaging operation where the value of each pixel is replaced by the average of all the values in the local neighborhood:

    h[i,j] = (1/M) * sum of f[k,l] over all [k,l] in the neighborhood N,

where M is the number of pixels in the neighborhood N.
Compare this with Equation 4.6. Now if g[i,j] = 1/9 for every [i,j] in the convolution mask, the convolution operation in Equation 4.6 reduces to the local averaging operation shown above. This result shows that a mean filter can be implemented as a convolution operation with equal weights in the convolution mask (see Figure 4.6). In fact, we will see later that many image processing operations can be implemented using convolution.
The size of the neighborhood N controls the amount of filtering. A larger neighborhood, corresponding to a larger convolution mask, will result in a greater degree of filtering. As a trade-off for greater amounts of noise reduction, larger filters also result in a loss of image detail. The results of mean filters of various sizes are shown in Figure 4.7.
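The following is a minimal Python sketch of the 3 x 3 mean filter as a convolution with equal weights of 1/9, matching the g[i,j] = 1/9 mask discussed above. Border handling is simplified here to leaving edge pixels unchanged, which is one assumption of this sketch rather than something prescribed by the text.

```python
def mean_filter_3x3(image):
    """Replace each interior pixel with the mean of its 3x3 neighborhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            total = sum(image[i + di][j + dj]
                        for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = total / 9.0        # equal weights: each g[i,j] = 1/9
    return out

flat = [[90] * 5 for _ in range(5)]
flat[2][2] = 180                    # one noisy pixel in a flat region
smoothed = mean_filter_3x3(flat)
print(smoothed[2][2])               # 100.0 -- the spike is averaged down
```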
When designing linear smoothing filters, the filter weights should be chosen so that the filter has a single peak, called the main lobe, and symmetry in the vertical and horizontal directions. A typical pattern of weights for a 3 x 3 smoothing filter is

    1/16   1/8   1/16
    1/8    1/4   1/8
    1/16   1/8   1/16
Linear smoothing filters remove high-frequency components, and the sharp detail in the image is lost. For example, step changes will be blurred into gradual changes, and the ability to accurately localize a change will be sacrificed. A spatially varying filter can adjust the weights so that more smoothing is done in a relatively uniform area of the image, and little smoothing is done across sharp changes in the image. The results of a linear smoothing filter using the mask shown above are shown in Figure 4.8.
Median Filter

The main problem with local averaging operations is that they tend to blur sharp discontinuities in intensity values in an image. An alternative approach is to replace each pixel value with the median of the gray values in the local neighborhood. Filters using this technique are called median filters. Median filters are very effective in removing salt-and-pepper and impulse noise while retaining image details, because they do not depend on values which are significantly different from typical values in the neighborhood. Median filters work on successive image windows in a fashion similar to linear filters; however, the process is no longer a weighted sum. For example, take a 3 x 3 window and compute the median of the pixels in each window centered around [i,j]:

1. Sort the pixels into ascending order by gray level.
2. Select the value of the middle pixel as the new value for pixel [i,j].

This process is illustrated in Figure 4.9. In general, an odd-size neighborhood is used for calculating the median. However, if the number of pixels is even, the median is taken as the average of the middle two pixels after sorting. The results of various sizes of median filters are shown in Figure 4.10.
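The sort-and-pick-the-middle procedure for the median filter can be sketched directly in Python (borders are left unchanged here for simplicity; that choice is mine, not from the text):

```python
def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = sorted(image[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = window[4]          # middle of 9 sorted values
    return out

noisy = [[20] * 5 for _ in range(5)]
noisy[2][2] = 255                   # salt noise: one bright impulse
print(median_filter_3x3(noisy)[2][2])   # 20 -- the impulse is removed entirely
```

Unlike the mean filter, the impulse vanishes without smearing into its neighbors, which is the detail-preserving behavior described above.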
Block diagram for Image Sharpening and Smoothing
The main objective of our project is to carry out the image sharpening and smoothing operations on a grayscale image. The following are the prerequisites to meet this purpose:

1. Image acquisition using MATLAB.
2. Storing the image pixel values into a text file.
3. Reading these pixels into block RAM on the FPGA.
4. Obtaining a sub-image (3*3 window) starting from the first pixel.
5. Performing the filtering operations on each sub-window.
6. Replacing the centre pixel of the sub-image window with the filtered value.
7. Restoring these values into block RAM.
8. Saving the block RAM values to a text file and displaying the image using these values.
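The flow above can be prototyped in software before moving to HDL. The following is a hedged Python sketch of that round trip: the file names and helper functions are illustrative placeholders, and reading/writing a text file here stands in for the MATLAB export and block-RAM storage described above.

```python
def load_pixels(path, rows, cols):
    """Read whitespace-separated pixel values (as written out from MATLAB)."""
    with open(path) as f:
        values = [int(v) for v in f.read().split()]
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]

def filter_image(image, window_filter):
    """Slide a 3x3 window and replace each interior centre pixel."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = [image[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = window_filter(window)   # e.g. median or mean
    return out

def save_pixels(path, image):
    """Write the filtered values back out as a text file for display."""
    with open(path, "w") as f:
        for row in image:
            f.write(" ".join(str(round(v)) for v in row) + "\n")

# Example round trip with a median filter on a tiny 3x3 image:
median = lambda w: sorted(w)[len(w) // 2]
save_pixels("pixels_in.txt", [[10, 10, 10], [10, 99, 10], [10, 10, 10]])
img = load_pixels("pixels_in.txt", 3, 3)
save_pixels("pixels_out.txt", filter_image(img, median))
```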