ABSTRACT
The TigerSHARC processor is the newest and most powerful member of the SHARC
family. It incorporates mechanisms such as SIMD, VLIW and short-vector memory
access in a single processor. This is the first time that all of these have been
combined in a real-time processor.
The unique architecture combines elements of RISC, VLIW and standard DSP
processors to provide native support for 8-, 16- and 32-bit fixed-point as well as
floating-point data types on a single chip. Large on-chip memory, extremely high
internal and external bandwidths, and dual compute blocks provide the capabilities
needed to handle a wide range of computationally demanding, large signal-processing tasks.
Contents
1. INTRODUCTION
4. APPLICATIONS
5. ADVANTAGES
6. CONCLUSION
7. REFERENCES
1. INTRODUCTION
1.1 Analog and digital signals
In many cases, the signal of interest is initially in the form of an analog
electrical voltage or current, produced for example by a microphone or some other
type of transducer. An analog signal must be converted into digital form before DSP
techniques can be applied. An analog electrical voltage signal, for example, can be
digitized using an electronic circuit called an analog-to-digital converter (ADC).
This generates a digital output as a stream of binary numbers whose values represent
the electrical voltage input to the device at each sampling instant.
The development of digital signal processing dates from the 1960s with the
use of mainframe digital computers for number-crunching applications such as the
Fast Fourier Transform (FFT), which allows the frequency spectrum of a signal to be
computed rapidly. These techniques were not widely used at that time, because
suitable computing equipment was generally available only in universities and other
scientific research institutions.
The Harvard architecture, shown in Fig. (b), is named for the work done at
Harvard University in the 1940s under the leadership of Howard Aiken (1900-1973).
As shown in this illustration, Aiken insisted on separate memories for data and
program instructions, with separate buses for each. Since the buses operate
independently, program instructions and data can be fetched at the same time,
improving speed over the single-bus (von Neumann) design. Most present-day DSPs
use this dual-bus architecture.
Figure (c) illustrates the next level of sophistication, the Super Harvard
Architecture. This term was coined by Analog Devices to describe the internal
operation of their ADSP-2106x and new ADSP-211xx families of Digital Signal
Processors. These are called SHARC® DSPs, a contraction of the longer term, Super
Harvard ARChitecture. The idea is to build upon the Harvard architecture by adding
features to improve the throughput. While the SHARC DSPs are optimized in
dozens of ways, two areas are important enough to be included in Fig. (c): an
instruction cache, and an I/O controller.
A handicap of the basic Harvard design is that the data memory bus is
busier than the program memory bus. When two numbers are multiplied, two binary
values (the numbers) must be passed over the data memory bus, while only one
binary value (the program instruction) is passed over the program memory bus. To
improve upon this situation, we start by relocating part of the "data" to program
memory. For instance, we might place the filter coefficients in program memory,
while keeping the input signal in data memory. (This relocated data is called
"secondary data" in the illustration). At first glance, this doesn't seem to help the
situation; now we must transfer one value over the data memory bus (the input signal
sample), but two values over the program memory bus (the program instruction and
the coefficient). In fact, if we were executing random instructions, this situation
would be no better at all.
Some DSP algorithms are best carried out in stages. For instance, IIR filters
are more stable if implemented as a cascade of biquads (a stage containing two poles
and up to two zeros). Multiple stages require multiple circular buffers for the fastest
operation. The data address generators (DAGs) in the SHARC DSPs are also designed
to carry out the Fast Fourier Transform efficiently. In this mode, the DAGs are
configured to generate bit-reversed addresses into the circular buffers, a necessary
part of the FFT algorithm. In addition, an abundance of circular buffers greatly
simplifies DSP code generation, both for the human programmer and for compilers
of high-level languages such as C.
The data register section of the CPU is used in the same way as in
traditional microprocessors. In the ADSP-2106x SHARC DSPs, there are 16 general-purpose
registers of 40 bits each. These can hold intermediate calculations, prepare
data for the math processor, serve as a buffer for data transfer, hold flags for program
control, and so on. If needed, these registers can also be used to control loops and
counters; however, the SHARC DSPs have extra hardware registers to carry out
many of these functions.
The math processing is divided into three sections: a multiplier, an
arithmetic logic unit (ALU), and a barrel shifter. The multiplier takes the values
from two registers, multiplies them, and places the result into another register. The
ALU performs addition, subtraction, absolute value, logical operations (AND, OR,
XOR, NOT), conversion between fixed- and floating-point formats, and similar
functions. The barrel shifter carries out elementary binary operations such as
shifting, rotating, and extracting and depositing bit segments. A powerful feature
of the SHARC family is that the multiplier and the ALU can be accessed in parallel.
In a single clock cycle, data from registers 0-7 can be passed to the multiplier, data
from registers 8-15 can be passed to the ALU, and the two results returned to any of
the 16 registers.
There are also many important features of the SHARC family architecture
that aren't shown in this simplified illustration. For instance, an 80-bit accumulator is
built into the multiplier to reduce the round-off error associated with multiple fixed-
point math operations. Another interesting feature is the use of shadow registers for
all the CPU's key registers. These are duplicate registers that can be switched with
their counterparts in a single clock cycle. They are used for fast context switching,
the ability to handle interrupts quickly. When an interrupt occurs in traditional
microprocessors, all the internal data must be saved before the interrupt can be
handled. This usually involves pushing all of the occupied registers onto the stack,
one at a time. In comparison, an interrupt in the SHARC family is handled by
moving the internal data into the shadow registers in a single clock cycle. When the
interrupt routine is completed, the registers are just as quickly restored. This feature
allows time-critical interrupts, such as the one signalling that a new sample is ready,
to be handled very quickly and efficiently.
SHARC has a 32/40-bit floating-point and fixed-point core, together with a DMA
controller and dual-ported SRAM that move data into and out of memory without
consuming core cycles. Its high-performance computation unit is supported by
four-bus performance: in a single cycle it can fetch the next instruction, access two
data values, and perform a DMA transfer for an I/O device.
3. The TigerSHARC Processor
3.3 DESCRIPTION
and dual computation blocks. Each block contains a multiplier, an ALU, and
a 64-bit shifter and can perform one 32×32-bit or four 16×16-bit multiply-
accumulates (MACs) per cycle.
On-chip memory is divided into three banks: one for software and
two for data. ADI will not disclose the amount of on-chip memory in the first
TigerSHARC devices, but we expect that the vendor will continue to be generous
with on-chip memory; the predecessor SHARC and Hammerhead devices include
68K to 512K of on-chip memory. When moving 64-bit or 128-bit data,
TigerSHARC transfers data from consecutive memory locations to consecutive data
registers, or vice versa. The smallest amount of data that can be transferred is 32 bits.
If TigerSHARC programs use word sizes of 8 or 16 bits in a DSP algorithm, they
cannot access individual words; any load or store will transfer at least four 8-bit or
two 16-bit words. The chip includes a data alignment buffer and a short data
alignment buffer that allow 64 or 128 bits of data to be transferred from (but not to)
any memory location aligned on a 16-bit word boundary. TigerSHARC provides
more flexibility than most processors with SIMD features, which often require that
data be aligned at memory locations divisible by the size of the data transfer.
The TigerSHARC's unique ability to process 1-, 8-, 16- and 32-bit fixed-point as
well as floating-point data types on a single chip allows original equipment
manufacturers to adapt to evolving telecommunications standards without
encountering the limitations of traditional hardware approaches. As a
high-performance DSP for communications infrastructure and multiprocessing
applications, TigerSHARC allows wireless infrastructure manufacturers to continue
evolving their designs to meet the needs of their target systems, while deploying a
highly optimized and effective Node B solution that can realize significant overall
cost savings.
The 32-bit word, multi-ported register files are used for transferring data
between the computational units and data buses, and for storing intermediate results.
Instructions can access the registers in the register file individually (word-aligned) or
in sets of two (dual-aligned) or four (quad-aligned). The ALU performs a standard
set of arithmetic operations in both fixed-point and floating-point formats, while also
performing logic operations. The multiplier performs both fixed-point and floating-point
multiplication as well as fixed-point multiply-accumulate operations. The 64-bit
shifter performs logical and arithmetic shifts, bit and bit-stream manipulation, and
field deposit and extraction.
The TigerSHARC Processor has two integer ALUs (IALUs) that provide
powerful address generation capabilities and perform many general-purpose integer
operations. Each IALU has a multi-ported 31-word register file. As address
generators, the IALUs perform immediate or indirect (pre- and post-modify)
addressing. They perform modulus and bit-reverse operations with no constraints
placed on memory addresses for data buffer placement. Each IALU can specify a
single-, dual- or quad-word access from memory.
The large on-chip memory is divided into three separate blocks of equal size. Each
block is 128 bits wide, giving a quad-word structure with four addresses in every
row. For data accesses, the processor can address one 32-bit word, two 32-bit
words (long) or four 32-bit words (quad), and transfer them to or from a single
computational unit, or to both, in a single processor cycle. The programmer only has
to ensure that start addresses are multiples of two when fetching long words and
multiples of four when fetching quad words. For applications whose data does not
meet these alignment requirements, such as a delay line whose start address is not
suitably aligned, a data alignment buffer (DAB) is provided. Once the DAB is filled,
quad-word fetches can be made from it. Besides the internal memory, the TigerSHARC
can access up to four gigawords of memory. The memory map is given in Figure
3.5.9 Program Sequencer
The DMA controller performs routine functions such as external port block
transfers, link port transfers and AutoDMA transfers as well as additional features
such as flyby transfers, DMA chaining and two-dimensional transfers.
The ADSP-TS201S and ADSP-TS202S have four full-duplex link ports each
providing four-bit receive and four-bit transmit I/O capability, using Low-Voltage,
Differential-Signal (LVDS) technology. With the ability to operate at a double data
rate running at 500 MHz, each link can support up to 500 Mbytes per second per
direction, for a combined maximum throughput of 4 Gbytes per second.
The ADSP-TS203S has two full-duplex link ports each providing four-bit
receive and four-bit transmit I/O capability, using Low-Voltage, Differential-Signal
(LVDS) technology. With the ability to operate at a double data rate running at 250
MHz, each link can support up to 250 Mbytes per second per direction, for a
combined maximum throughput of 1 Gbyte per second.
Each Link Port has its own triple-buffered quad-word input and double-
buffered quad-word output registers. The DSP's core can write directly to a Link
Port's transmit register and read from a receive register, or the DMA controller can
perform DMA transfers through eight dedicated Link Port DMA channels.
"TigerSHARC's exceptional speed and functionality are suited for applications in:
Defense - sonar, radar, digital maps, munitions guidance
Medical - ultrasound, CT scanners, MRI, digital X-ray
Industrial systems - data acquisition, control, test, and inspection systems
Video processing - editing, printers, copiers
Wireless Infrastructure - GSM, EDGE, and 3G cellular base stations."
6. Conclusion
7. References
• www.analog.com/processors/tigersharc
• www.analog.com/processors/teaching Resources
• www.ener.ucalgory.co/People/Smith/ECE-ADI-PROJECT
• www.answers.com