Você está na página 1de 51

Analog Devices SHARC

CS 433 Processor Presentation Series Prof. Luddy Harrison

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

Note on this presentation series


These slide presentations were prepared by students of CS433 at the University of Illinois at Urbana-Champaign All the drawings and figures in these slides were drawn by the students. Some drawings are based on figures in the manufacturers documentation for the processor, but none are electronic copies of such drawings You are free to use these slides provided that you leave the credits and copyright notices intact
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 2

Overview
Processor History Physical packaging Data paths, register files, computational units Pipelining, timing information Memory Instruction Set Architecture (ISA) Applications targeted Systems employing the SHARC

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

SHARC Features
Super Harvard ARChitecture
Unique CISC architecture allows simultaneous fetch of two operands and an instruction in one cycle

Combines high performance DSP core with integrated, on-chip system features
Dual-ported (processor and I/O) SRAM DMA Controller

Selective Instruction Cache


Cache only those instructions whose fetches conflict with program memory data accesses
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 4

SHARC Processor History


ADSP-2106x (2000)
Single computational units based on predecessor ADSP-2100 Family 40 MHz core

ADSP-2116x (2001)
SIMD (Single-Issue Multiple-Data) dual computational unit architecture added 150-200 MHz core, 1-2 MB RAM

ADSP-2126x, ADSP-2136x (2003 Future)


Integrated audio-centric peripherals (128-140db Sample Rate Conversion) added 333-400 MHz core, 2-3 MB RAM
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 5

ADSP-2106x Overview
CORE PROCESSOR
TIMER INSTRUCTION CACHE

DUAL-PORTED SRAM
BLCOK 0 BLCOK 1 TWO INDEPENDENT DUAL-PORTED BLOCKS PROCESSOR PORT I/O PORT

DAG1

DAG2

PROGRAM SEQUENCER PM ADDRESS BUS DM ADDRESS BUS

EXTERNAL PORT
ADDR BUS MUX MULTIPROC INTERFACE DATA BUS MUX HOST PORT

PM DATA BUS BUS CONNECT (PX) DM DATA BUS

DATA REGISTER FILE MULTIPLIER 16x40-BIT BARREL SHIFTER ALU

IOP REGISTERS CONTROL, STATUS & DATA BUFFERS

DMA CONTROLLER SERIAL PORTS (2) LINK PORTS (6)

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

I/O PROCESSOR

ADSP-2106x Core
Computational Units
ALU, Multiplier, and Shifter can all perform independent operations in a single cycle

Register File
Two sets (primary and alternate) of 16 registers, each 40-bits wide

Program Sequencer and Data Address Generators


Allows computational units to operate independent of instruction fetch and program counter increment
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 7

ADSP-2106x Packaging
ADSP-2106x
1x CLOCK CONTROL ADDRESS CLKIN EBOOT LBOOT IRQ FLAG TIMEXP LxCLK LxACK LxDAT TCLK0 RCLK0 TFS0 RFS0 DT0 DR0 TCLK1 RCLK1 TFS1 RFS1 DT1 DR1 RPBA ID2-0 BMS CS ADDR DATA BOOT EPROM

ADDR31-0 DATA47-0 RD WR ACK MS3-0 PAGE SBTS SW ADRCLK DMAR1-2 DMAG1-2 CS HBR HBG REDY BR1-6 CPA

DATA

LINK DEVICES

ADDR DATA OE WE ACK CS

MEMORY & PERIPHERALS

SERIAL DEVICE

DMA DEVICE DATA

SERIAL DEVICE

HOST PROCESSOR INTERFACE ADDR DATA

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

ADSP-2106x Key Pins


PIN
ADDR31-0 DATA47-0 MS3-0 PAGE DMAR(1-2) IRQ2-0

FUNCTION
External Bus Address External Bus Data Memory Select Lines DRAM Page Boundary DMA Request 1 and 2 Interrupt Request Lines

NOTE

Asserted (low) as chip selects memory bank

Asserted if a page boundary is crossed

Edge-triggered or level-sensitive

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

ADSP-2106x Registers
Data Registers
R15 R0 (fixed-point), F15 F0 (floating-point)

Program Sequencer
PC (program counter), PCSTKP (PC stack pointer), FADDR (fetch address), etc.

Data Address Generator


I7 I0 (DAG1 index), M7 M0 (DAG1 modify) L7 L0 (DAG1 length), B7 B0 (DAG1 base)

Bus Exchange, Timer, and System Registers

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

10

ADSP-2106x Buses
Address
Program Memory Address 24 bits wide Data Memory Address 32 bits wide

Data
Program Memory Data 48 bits wide Stores instructions and data for dual-fetches Data Memory Data 40 bits wide Stores data operands

One PM Data bus and/or one DM Data bus register file access per cycle

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

11

ADSP-2106x I/O
Serial Ports
Operate at clock rate of processor

DMA
Port data can be automatically transferred to and from on-chip memory

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

12

ADSP-2106x DMA
I/O port block transfers (link/serial) External memory block transfers DMA Channel setup by writing memory buffer parameters to DMA parameter registers
Starting Address for Buffer Address Modifier Word Count

Interrupt generated when transfer completes (i.e. Word Count = 0)

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

13

ADSP-2106x DMA Registers


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 FS EXT. PORT FIFO FLSH FLUSH EXT. PORT FIFO EXTERN EXT. DEVICES TO EXT. MEM. INTIO SINGLE-WORD INTERRUPTS HSHAKE DMA HANDSHAKE MASTER DMA MASTER MODE MSWF MOST SIGNIFICANT WORD FIRST 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DEN DMA ENABLE CHEN DMA CHAINING ENABLE TRAN DMA CHANNEL DIRECTION PS PACKING STATUS

DTYPE DATA TYPE PMODE PACKING MODE

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

14

ADSP-2106x Pipelining
Three phases
Fetch
Read from cache or program memory

Decode
Generate conditions for instruction

Execute
Operations specified by instruction completed

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

15

ADSP-2106x Branching and Pipelining


Branches
Delayed Two instructions after branch are executed Non-delayed Program sequencer suppresses instruction execution for next two instructions

CLOCK CYCLES Fetch Decode Execute n+2 n+1 n no-op j n+2 n+1 no-op j+1 j n+2 j+2 j+1 j

Non-delayed

Delayed

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

16

ADSP-2106x Memory
On-Chip SRAM Total Size ADSP-21060 500KB ADSP-21062 250KB ADSP-21061 125KB

On-chip support for:


48-bit instructions (and 40-bit extended precision floatingpoint data) 32-bit floating-point data 16-bit short word data

Off-chip memory up to 4 GB

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

17

ADSP-2106x Memory (2)


IOP REGISTERS RESERVED ADDRESS SPACE 0x0001 FFFF 0x0002 0000 0x0004 0000 0x0000 0000 0x0000 0100

These represent the same physical memory

BLOCK 0 0x0003 0000 BLOCK 0 BLOCK 1 0x0003 FFFF

NORMAL WORD ADDRESSING 128K x 32-bit words 80K x 40-bit words


CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois

0x0006 0000

BLOCK 1

0x0007 FFFF

18

ADSP-2106x Memory (3)


Memory divided into blocks Dual-ported (PM and DM bus share one port, I/O bus uses the other)
Allows independent access by processor core and I/O processor Each block can be accessed by both in every cycle

Typical DSP applications (digital filters, FFTs, etc.) access two operands at once, such as a filter coefficient and a data sample, so allowing singlecycle execution is a must

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

19

ADSP-2106x Shadow Write


Due to need for high-speed operations, memory writes to a two-deep FIFO On write, data in FIFO from previous write is loaded to memory and new data enters FIFO Reads of last two written locations are intercepted and re-routed to the FIFO

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

20

ADSP-2106x Instruction Cache


Sequencer checks instruction cache on every program memory data access Allows PM bus to be used for data fetches instead of being tied up with an instruction fetch When fetch conflict first occurs, instruction is cached to prevent the same delay from happening again

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

21

ADSP-2106x Instruction Cache (2)


LRU BIT VALID SET 0 ENTRY 0 ENTRY 1 INSTRUCTIONS ADDRESSES (BITS 23-4) ADDRESSES (BITS 3-0) 0000

SET 1

ENTRY 0 ENTRY 1

0001

SET 14

ENTRY 0 ENTRY 1

1110

SET 15

ENTRY 0 ENTRY 1

1111

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

22

ADSP-2106x ISA Overview


24 operations, although some have more than one syntactical form Instruction Types
Compute and Move Compute operation in parallel with data moves or index register modify Program Flow Control Branch, Call, Return, Loop Immediate Data Move Operand or addressing immediate fields Miscellaneous Bit Modify and Test, No-op, etc.
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 23

ADSP-2106x ISA Compute and Move


Instructions follow the format IF condition op1, op2; IF and condition are optional op1 is an optional compute instruction op2 is an optional data move instruction

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

24

ADSP-2106x ISA Compute Examples


Single function
F6 = (F2 + F3);

Multi-function
Distinct parallel operations supported Parallel computations and data transfers
R1 = R2 * R6, M4 = R0;

Simultaneous multiplier and ALU operations


R1 = R2 * R6, F6 = F2 + F3;
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 25

ADSP-2106x ISA Single function Compute


22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

CU

OPCODE

RN

RX

RY

CU specifies
00 ALU 01 Multiplier 02 Shifter

OPCODE indicates operation type (add, subtract, etc.) RN specifies result register RX and RY specify operand registers
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 26

ADSP-2106x ISA Multi-function Compute


Parallel ALU and Multiplier operations
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

OPCODE

RM

RA

RXM

RYM

RXA

RYA

Registers restricted to particular sets Multiplier X: R3 R0, Y: R7 R4 ALU X: R11 R8, Y: R15 R12 OPCODE specifies ALU op, for example: 000100: Rm = R3-0 * R7-4, Ra = R11-8 + R15-12; 011111: Rm = R3-0 * R7-4, Ra = MIN(R11-8, R1512);
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 27

ADSP-2106x ISA Program Flow Control


Instructions follow the format
IF condition JUMP/CALL, ELSE op2;

IF, condition, ELSE are optional JUMP/CALL is a JUMP or CALL instruction op2 is an optional compute instruction

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

28

ADSP-2106x ISA Program Flow Control (2)


Instructions follow the format
DO <addr24> UNTIL termination;

No optional fields <addr24> is the loop start address termination is the loop ending condition to check after each iteration

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

29

ADSP-2106x ISA Program Flow Examples


Conditional Execution
IF GT R1 = R2 * R6; IF NE JUMP label2;

Also used for Call/Return


main: CALL routine; routine: ... RTS; /*return to main*/

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

30

ADSP-2106x ISA Immediate Data Move


Instructions follow the format ureg = <data32>; DM(<data32>, Ia) = ureg; PM(<data24>, Ia) = ureg; Ia is an optional indirect addressor DM is a 32-bit data memory address PM is a 24-bit program memory address
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 31

ADSP-2106x ISA Addressing Examples


Direct
JUMP <data24>;

Relative to Program Counter


JUMP (PC, <data24>);

Register Indirect (using DAG registers)


Pre-Modify (modification pre-address calculation) JUMP (M0, I0); Post-Modify (modification post-address calculation) JUMP (I0, M0);
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 32

ADSP-2116x Overview
Extension of 2106x, adding 150Mhz core and SIMD (SingleIssue Multiple-Data) support via dual hardware
DIFFERENT DATA GOES TO EACH ELEMENT
BUS CONNECT PM DATA BUS DM DATA BUS

MULT

DATA REGISTER FILE

BARREL SHIFTER

BARREL SHIFTER

DATA REGISTER FILE

MULT

ALU

SAME INSTRUCTION GOES TO BOTH ELEMENTS


PROGRAM SEQUENCER CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 33

ADSP-2116x SIMD Engine


Dual hardware allows same instruction to be executed across different data
2 ALUs, multipliers, shifters, register files Two data values transferred with each memory or register file access Very effective for stereo channel processing

Can effectively double performance over similar algorithms running on ADSP-2106x processors

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

34

ADSP-2116x SIMD Engine (2)


Enabled/disabled via MODE1 bit
When disabled, processor simply acts in SISD mode

Program sequencer must be aware of status flags set by each set of hardware elements Conditional compute operations can be specified on both, either, or neither hardware set Conditional branches and loops executed by ANDing the condition tests on both hardware sets

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

35

ADSP-2116x SIMD Engine (3)


Instruction
Rx = Ry;

Mode
SISD SIMD

Transfer 1
Rx loaded from Ry Rx loaded from Ry Sx loaded from Sy Sx loaded from Sy Rx loaded from Sy Rx loaded from Sy Sx loaded from Ry Sx loaded from Ry

Transfer 2
n/a Sx loaded from Sy n/a Rx loaded from Ry n/a Sx loaded from Ry n/a Rx loaded from Sy

Sx = Sy;

SISD SIMD

Rx = Sy;

SISD SIMD

Sx = Ry;

SISD SIMD

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

36

ADSP-2126x Overview
Direct extension of 2116x, instructions are fully backward compatible Core increased to 150-200 MHz w/ 1MB SRAM Data buses increased from 32 to 64 bits Synchronous, independent serial ports increased from 2 to 6 ROM-based security
Prevents piracy of code and algorithms Prevents peripheral devices from reading on-chip memory

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

37

ADSP-2136x Overview
CORE PROCESSOR TIMER INSTRUCTION CACHE BLOCK 0 SRAM 1M BIT ROM 2M BIT
ADDR DATA

4 BLOCKS ON-CHIP MEMORY BLOCK 1 BLOCK 2 SRAM 1M BIT ROM 2M BIT


ADDR DATA ADDR

BLOCK 3 SRAM 0.5M BIT

SRAM 0.5M BIT

DAG1

DAG2

PROGRAM SEQUENCER

DATA

ADDR

DATA

PM ADDRESS BUS DM ADDRESS BUS PM DATA BUS DM DATA BUS

PROCESSING PROCESSING PX REGISTER ELEMENT ELEMENT (PEX) (PEY)

IOP REGISTERS

SPI SPORTS IDP POG TIMERS SRC SPDIF

SIGNAL ROUTING UNIT

I/O PROCESSOR AND PERIPHEALS

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

38

ADSP-2136x Overview (2)


Direct extension of 2126x, instructions are fully backward compatible On-chip memory expanded from 2 to 4 blocks Digital Audio Interface (DAI) set of audio peripherals
Interrupt controller, interface data port, signal routing unit, clock generators, and timers Different units contain S/PDIF receiver/transmitter, sample rate converters, or DTCP encrypting engine
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 39

SHARC Benchmarks
Algorithm benchmarks supplied by manufacturer:
2106x
Clock Cycle Instruction Cycle Time MFLOPS Sustained MFLOPS Peak FIR Filter (per tap) IIR Filter (per biquad) Divide (y/x)

2116x
100 MHz 10 ns 400 MFLOPS 600 MFLOPS 5 ns 20 ns 30 ns

2126x
200 MHz 6.67 ns 600 MFLOPS 900 MFLOPS 2.5 ns 10 ns 20 ns

2136x
333 MHz 3 ns 1332 MFLOPS 1998 MFLOPS 1.5 ns 6 ns 9 ns

66 MHz 15 ns 132 MFLOPS 198 MFLOPS 15 ns 61 ns 91 ns

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

40

Applications Targeted
SHARC designed to
Simplify Development Speed time to Market Reduce Product Costs

Product targeted
A/V Receivers 7.1 Surround Sound Decoding Mixing Consoles Digital Synthesizers Automobiles

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

41

Systems Employing the SHARC


SRS Circle Surround II Melody (w/ Auto Room Tuner) Metric Halo's Portable Pro Audio Hub Alacron FT-P5

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

42

SHARC in SRS Circle Surround II


Full multi-channel surround sound from simple right/left stereo sound Encoding can be transmitted over standard stereo medium (broadcast television, radio, etc.) and maintains full backward compatibility

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

43

SHARC in SRS Circle Surround II (2)


Output from each source is combined in constant phase filter banks and encoded in quadrature to prevent signal cancellation Positional bias generator analyzes ratios between left and right surround signals which multipliers then apply to the opposing left or right output Decoder uses this level imbalance to direct the surround information to the correct output

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

44

SHARC Melody
Optimized Surround Sound for the Mass Market Core of high-fidelity audio decoders in Denon, Bose, and Kenwood products Auto Room Tuner (ART) integrated software simplifies setup of complex audio systems
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 45

SHARC Melody ART


Automatically measures and corrects multichannel sound system for rooms acoustics Corrects system deficiencies For each speaker, ART calculates:
Sound pressure level (SPL) Distance of each speaker from listener Frequency response

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

46

SHARC in Metric Halo's Portable Pro Audio Hub


Portable FireWire-based recording device, used in live recordings applications by motion pictures and major recording artists like No Doubt and Dave Mathews Band Serial ports used to interface to digital and mixedsignal peripheral devices Initially implemented on SHARC ADSP-2106x, later upgraded to ADSP-2126x Future hybrid implementation will use a ADSP2106x for FireWire processing coupled with a ADSP-2126x for audio processing

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

47

SHARC in Alacron FT-P5


COTS (Commercial Off-The-Shelf) system for use in distributed, compute intensive, high data rate applications in commercial and military industries Supports 1 to 96 ADSP-2106x processors Makes extensive use of SHARCs DMA through external PMC interface, supporting full-duplex communication in excess of 1 GB/sec
In-cabinet SAN clusters Compute nodes in distributed systems

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

48

SHARC vs. RISC Processors


RISC is...
Less costly to design, test, and manufacture, since processors are less specialized

But...
Parallel (stereo) computation requires two or more interconnected processors accessing shared memory Less performance

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

49

Conclusion
SHARC offers great deal of computational power, with on-chip SRAM and SIMD architecture Variety of applications (especially audio processing) exploit it

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

50

Citations
Processor details taken from product manuals and descriptions at http://www.analog.com

CS 433 Prof. Luddy Harrison

Copyright 2005 University of Illinois

51

Você também pode gostar