
TCI6487 Multi-core Programming

China HPMP May 2009

NDA required

Agenda
Multi-core chips high level overview
Multi-core programming
  Memory consideration
  Inter-core communication
  Multi-core arbitration
  Peripherals consideration
Image creation

Faraday: High Level View


THREE C64X+ DSP CORES @ 1+ GHZ
  16/32-bit ISA, doubled MPY vs C64x core
  RSA instruction set extension for CR processing (downlink & uplink)
  65 nm process
MEMORY
  32 kB L1 program memory
  32 kB L1 data memory
  3 MB of total L2 memory (2 configurations): 1 MB / 1 MB / 1 MB or 1.5 MB / 1 MB / 0.5 MB
  Boot ROM
  DDR2-667 MHz, 32-bit
COMMUNICATIONS SUBSYSTEM
  2x sRIO (1x links)
  SGMII Gigabit Ethernet
  Antenna interface supporting OBSAI / CPRI, 6 links
ACCELERATION
  VCP2, TCP2
  Receive accelerator (RAC)
561 BALLS, 23x23 MM FC-BGA
  5 rows + 11x11 center array
OTHERS
  IP security, lead-free and green

[Block diagram: three C64x+ cores, each with RSA, L1 Data, L1 Prog, and L2 memory, connected through EDMA3.0 with switch fabric to GPIO, timers, PLL, I2C, VCP2, TCP2, Boot ROM, RAC, McBSP, antenna interface, DDR2 interface, sRIO, and 10/100/1G Ethernet.]



Programming Considerations
Programming model
  Shared image: the programmer needs to determine whether aliased addressing is appropriate. If so, the code still needs to assign pointers to memory using the global address for any data transfers (aside from internal DMA performed within a single core's memory).
  Non-shared images: only global addresses should be used. There is no advantage to aliased addressing.

Resource allocation
  Shared resources must be partitioned or arbitrated for.
  Multi-channel peripherals can be split amongst the cores for concurrent, orthogonal control: EDMA channels, EDMA events, Ethernet MAC TX/RX data flow, RapidIO TX/RX/LSU data flow.
  Single-channel peripherals ought to be controlled by a single master, servicing the other cores if needed: Timer64.

System-level prioritization
  A user-specified priority may be assigned to:
    Any cache miss or non-cacheable access by any of the CPUs
    Any EDMA transfer
    Any Serial RapidIO transfer
    Any Ethernet transfer

Inter-core communication
  Discrete events: INTGEN peripheral.
  Message passing: direct writes to memory, or DMA transfer. Can implement a polling or interrupt-driven protocol (DSP/BIOS MSGQ available).

Core Local Memory Map


For each core, L1/L2 memories have two entries in the memory map.
  Global addresses: accessible to all masters in the chip.
  Local (aliased) addresses: accessible only to the local core and IDMA.
    The eight most significant bits are masked to zero.
    E.g. 0x10800000 and 0x00800000 are the same memory for core 0.
    Allows common code to run unmodified on multiple cores.
    Not beneficial for un-shared code.

Each core has a private configuration space.
  Local core control registers (cache, TSC, IDMA, INTC) are not visible to other masters in the chip.

Core number
  Software can verify the core on which it is running through a register (DNUM) that holds the DSP core number (0, 1, or 2).
  The core number can be used at run time to conditionally execute code, update pointers, create a global address, etc.
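
The local-to-global conversion described on this slide can be sketched in C as follows. This is a minimal illustration, not device code. It assumes that each core's global window is one 16 MB step above the previous one, starting at the 0x10 prefix shown in the core 0 example (that is, 0x10, 0x11, 0x12). The step size and the helper names `is_local_alias` and `to_global` are hypothetical. On the device the core number would come from DNUM; here it is passed as a plain parameter so the sketch also runs on a host.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES        3u           /* three C64x+ cores on the device */
#define GLOBAL_BASE      0x10000000u  /* core 0 prefix, taken from the 0x10800000 example */
#define GLOBAL_CORE_STEP 0x01000000u  /* ASSUMPTION: one 16 MB window per core */

/* Returns nonzero when addr is a local (aliased) address, i.e. its
 * eight most significant bits are zero. */
static int is_local_alias(uint32_t addr)
{
    return (addr & 0xFF000000u) == 0u;
}

/* Converts a local address into the global address for the given core.
 * Addresses that are already global are returned unchanged. */
static uint32_t to_global(uint32_t addr, uint32_t core)
{
    assert(core < NUM_CORES);
    if (!is_local_alias(addr)) {
        return addr;
    }
    return GLOBAL_BASE + core * GLOBAL_CORE_STEP + addr;
}
```

A shared image would typically keep using the aliased address for its own CPU accesses and call something like `to_global(buffer_address, core_number)` only when handing a pointer to another master such as EDMA.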

Basic Techniques for Multi-core DSP


Inter-core interrupts
  Cooperation between cores

EDMA
  Main inter-core data transaction engine

Shared memory


* Blue parts are necessary for multi-core DSP

Inter-core data transaction


Discrete events: INTGEN peripheral.
Message passing: direct writes to memory, or DMA transfer. Can implement a polling or interrupt-driven protocol (DSP/BIOS MSGQ available).
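
A polling message-passing protocol built on direct memory writes might look like the sketch below. It is a host-runnable illustration under stated assumptions. The `mailbox_t` layout and the functions `mailbox_send` and `mailbox_poll` are hypothetical and are not the DSP/BIOS MSGQ API. A real implementation would place the mailbox at a global address visible to both cores and would also need cache write-back/invalidate handling, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define MSG_WORDS 4u  /* hypothetical fixed payload size */

/* One-slot mailbox, assumed to live in memory both cores can reach. */
typedef struct {
    volatile uint32_t full;               /* 0 = empty, 1 = message waiting */
    volatile uint32_t sender;             /* core number of the writer */
    volatile uint32_t payload[MSG_WORDS];
} mailbox_t;

/* Sender side: returns 0 on success, -1 if the previous message
 * has not been consumed yet. */
static int mailbox_send(mailbox_t *mb, uint32_t sender, const uint32_t *words)
{
    uint32_t i;
    assert(mb != 0 && words != 0);
    if (mb->full) {
        return -1;
    }
    for (i = 0u; i < MSG_WORDS; i++) {
        mb->payload[i] = words[i];
    }
    mb->sender = sender;
    mb->full = 1u;  /* publish the flag last, after the payload is written */
    return 0;
}

/* Receiver side: returns 1 and copies the message out if one is waiting,
 * otherwise returns 0 without touching the outputs. */
static int mailbox_poll(mailbox_t *mb, uint32_t *sender, uint32_t *words)
{
    uint32_t i;
    assert(mb != 0 && sender != 0 && words != 0);
    if (!mb->full) {
        return 0;
    }
    for (i = 0u; i < MSG_WORDS; i++) {
        words[i] = mb->payload[i];
    }
    *sender = mb->sender;
    mb->full = 0u;  /* release the slot for the next message */
    return 1;
}
```

An interrupt-driven variant would keep the same mailbox but have the sender raise an inter-core interrupt after setting `full`, so the receiver calls `mailbox_poll` from its interrupt handler instead of a loop.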


Inter-DSP Interrupts

Two registers per core control inter-DSP interrupts: IPCGRx and IPCARx.

IPCG (in IPCGRx)
  Writing 1 to IPCG triggers an interrupt to the corresponding GEM.
  A write of 1 within 8 CPU cycles of a previous one does not trigger a new interrupt.
  Writes of 0 and reads have no effect.

SRCSx (in IPCGRx)
  SW method to tell what caused the interrupt.
  Usage is completely SW defined.
  A write of 1 is sticky and is read back as 1 until cleared.
  A write of 0 has no effect.
  Reads return the current value of the bit.

SRCCx (in IPCARx)
  A write of 1 to SRCCx clears the corresponding SRCSx.
  A write of 0 or a read has no effect.
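
The register behaviour listed on this slide can be modelled in plain C to show the intended usage pattern: the sender sets a source bit and the IPCG bit in one write, and the receiver reads the sticky source bits and acknowledges them. This is a software model for illustration only and does not access hardware. The bit positions (IPCG at bit 0, source bits starting at bit 4), the type `ipc_model_t`, and all function names are assumptions. The 8-cycle suppression window is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define IPCG_BIT   0x00000001u  /* ASSUMPTION: IPCG is bit 0 */
#define SRC_SHIFT  4u           /* ASSUMPTION: SRCSx/SRCCx start at bit 4 */
#define SRC_MASK   0xFFFFFFF0u
#define NUM_SRC    28u

/* Software model of one core's IPCGRx/IPCARx pair. */
typedef struct {
    uint32_t srcs;       /* sticky source bits */
    uint32_t irq_count;  /* number of interrupts raised on this core */
} ipc_model_t;

/* Write to IPCGRx: 1s in the source field are sticky, 1 in IPCG raises
 * an interrupt, 0s have no effect. */
static void ipcgr_write(ipc_model_t *m, uint32_t value)
{
    m->srcs |= value & SRC_MASK;
    if (value & IPCG_BIT) {
        m->irq_count++;
    }
}

/* Read of IPCGRx: this model returns only the current source bits. */
static uint32_t ipcgr_read(const ipc_model_t *m)
{
    return m->srcs;
}

/* Write to IPCARx: 1s clear the matching source bits, 0s have no effect. */
static void ipcar_write(ipc_model_t *m, uint32_t value)
{
    m->srcs &= ~(value & SRC_MASK);
}

/* Sender helper: flag a software-defined source and interrupt the target. */
static void ipc_notify(ipc_model_t *target, uint32_t src_id)
{
    assert(src_id < NUM_SRC);
    ipcgr_write(target, IPCG_BIT | ((uint32_t)1 << (SRC_SHIFT + src_id)));
}

/* Receiver helper: return the pending sources (bit n = source n) and
 * acknowledge exactly those bits. */
static uint32_t ipc_service(ipc_model_t *self)
{
    uint32_t pending = ipcgr_read(self) & SRC_MASK;
    ipcar_write(self, pending);
    return pending >> SRC_SHIFT;
}
```

Acknowledging only the bits that were actually read avoids losing a source that another core sets between the read and the acknowledge.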



Multi-Channel Peripherals
These peripherals allow resources to be allocated to the cores and controlled orthogonally, without software handshaking prior to accesses. Examples of these multi-channel peripherals are:

EDMA
  64 channels and 256 parameter RAM entries can be separated by software into regions, with each region assigned to a core.

EMAC
  Eight receive and eight transmit DMA channels assigned by software.
  Received packets are transferred to a core based on MAC address routing assigned to a channel.
  Transmit packets are transferred from a core based on a core-defined list.

SRIO
  Eight receive and eight transmit DMA channels assigned by software.
  Received packets are transferred to a core based on address routing assigned to a channel.
  Transmit packets are transferred from a core based on a core-defined list.

AIF
  Six inbound and outbound links, with multiple EDMA channels assigned by software.

INTGEN
  The interrupt generation logic, used for discrete signaling between cores, is designed to allow orthogonal event assertion and clearing by each core. Control registers are established per receiver, and multiple senders can assert events concurrently.

GPIO
  Multiple GPIOs can be separated by software.
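
One way to keep such a software partition honest is a small ownership table that is checked at start-up. The sketch below tracks which of the 64 EDMA channels each core owns as a 64-bit mask and verifies that no channel is claimed twice. It is bookkeeping only and does not program any EDMA region register. The function names and the example split used afterwards are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES     3u
#define EDMA_CHANNELS 64u

/* Builds a mask with bits first .. first+count-1 set. */
static uint64_t channel_range(uint32_t first, uint32_t count)
{
    assert(first + count <= EDMA_CHANNELS);
    if (count == 0u) {
        return 0u;
    }
    if (count == EDMA_CHANNELS) {
        return ~(uint64_t)0;  /* avoid shifting a 64-bit value by 64 */
    }
    return ((((uint64_t)1) << count) - 1u) << first;
}

/* Returns 1 when no channel is owned by more than one core, else 0. */
static int partition_is_disjoint(const uint64_t owner[NUM_CORES])
{
    uint64_t seen = 0u;
    uint32_t core;
    for (core = 0u; core < NUM_CORES; core++) {
        if (seen & owner[core]) {
            return 0;
        }
        seen |= owner[core];
    }
    return 1;
}

/* Returns 1 when the given core owns the given channel, else 0. */
static int core_owns_channel(const uint64_t owner[NUM_CORES],
                             uint32_t core, uint32_t channel)
{
    assert(core < NUM_CORES && channel < EDMA_CHANNELS);
    return (int)((owner[core] >> channel) & 1u);
}
```

For example, a split of channels 0 to 23, 24 to 43, and 44 to 63 across cores 0, 1, and 2 passes the disjointness check, while moving core 2's range down to start at channel 40 makes it fail.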

Single-Channel Peripherals
I2C
  Typically used during boot, system setup, or board monitoring, the I2C should be serviced by a single core. If shared tables/resources are accessed through I2C, it is much faster to first copy the data to DSP memory and share it from there. The I2C can be serviced by direct CPU accesses or EDMA.

Timer64
  There are multiple timers on the chip. Typically these are individually allocated to single cores, allowing the owning core to control its timer without arbitration.

All other peripherals
  These are intended for use during system initialization only, and as such do not need to be allocated or arbitrated for. The boot master should take care of this initialization. This includes DDR2, which has built-in arbitration for multiple masters based on transaction priority.


Single Code Image

Default configuration of the chip will be for a single image.
BIOS code and read-only data should be placed into shared memory.
  .hwi_vec will default to LL2 memory (it can be modified at run time).
  The sections .gblinit, .switch, .cinit, .pinit, and .const will default to shared memory.
  All other data sections will default to L2 memory.

If using CCS
  The user can load and run the app on all cores synchronously with Parallel Debug Manager (Simulator).
  The user can also load and run the app on each individual core (Simulator).

If using Bootloader
  Sections located in aliased memory will automatically be replicated across the cores' memory.
  When done loading the app, it can release all cores from reset.

[Diagram: C64x+ Cores 0, 1, and 2, each with L1 Data, L1 Prog, and L2 memory holding App.out; the code and read-only data of App.out reside in DDR2 memory.]


Multiple images, not shared


Each core will be loaded with its own app.
Each app needs to manage its usage of memory and make sure it does not collide with any other app.

If using CCS
  Open and load each core with its app (Simulator).
  Use Parallel Debug Manager to run all cores synchronously, or open up each core to run them asynchronously (Simulator).

If using Bootloader
  Load each core with its app.
  Take each core out of reset.

[Diagram: C64x+ Cores 0, 1, and 2, each with L1 Data, L1 Prog, and L2 memory; App0.out, App1.out, and App2.out are loaded to cores 0, 1, and 2 respectively, and each occupies its own area of DDR2 memory.]
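
Since every app is linked separately, a collision between their memory regions is easy to introduce and hard to spot. A simple overlap check such as the following sketch can be run over the regions each app claims. The `mem_region_t` type and function names are hypothetical, and the addresses used with it would come from each app's own link map.

```c
#include <assert.h>
#include <stdint.h>

/* One contiguous memory region claimed by an app. */
typedef struct {
    uint32_t base;  /* first byte, as a global address */
    uint32_t size;  /* length in bytes, expected to be nonzero */
} mem_region_t;

/* Returns 1 when the two regions share at least one byte, else 0. */
static int regions_overlap(mem_region_t a, mem_region_t b)
{
    /* End addresses are computed in 64 bits so base + size cannot wrap. */
    uint64_t a_end = (uint64_t)a.base + a.size;
    uint64_t b_end = (uint64_t)b.base + b.size;
    return ((uint64_t)a.base < b_end) && ((uint64_t)b.base < a_end);
}

/* Returns the index of the first region that overlaps an earlier one,
 * or -1 when all regions are disjoint. */
static int first_collision(const mem_region_t *regions, uint32_t count)
{
    uint32_t i, j;
    assert(regions != 0);
    for (i = 1u; i < count; i++) {
        for (j = 0u; j < i; j++) {
            if (regions_overlap(regions[i], regions[j])) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

The check is quadratic in the number of regions, which is fine for the handful of regions a three-core system typically declares.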


Multiple images, shared

All apps share some common code/data (partial link image).
  The partial link image needs to be built as a separate step.
  The partial link image is at a fixed location on all cores.
  Code and read-only data should be placed into shared memory.
  Some BIOS read/write data will need to be placed in each core's L2 memory.

The non-shared code and data.
  Should be placed in each core's LL2 memory.
  Each app can use SL2 memory, but needs to manage its usage of the SL2 memory and make sure it does not collide with any other app.

If using CCS
  Load the partial link image first through Parallel Debug Manager (Simulator). [Only needed if not loaded with app].
  Load each core with its app (Simulator).
  Use Parallel Debug Manager to run all cores synchronously, or open up each core to run them asynchronously (Simulator).

If using Bootloader
  Load the partial link image (if not loaded with app).
  Load each core with its app.
  Release each core from reset.
  Note: The partial link image could be loaded once if it is not included in the load of the apps; otherwise it would be loaded multiple times (once for each app loaded on each core).

[Diagram: C64x+ Cores 0, 1, and 2, each with L1 Data, L1 Prog, and L2 memory; each L2 memory holds the partial image data plus that core's BIOS and/or app (App0.out, App1.out, App2.out); the partial image code and read-only data reside in DDR memory.]


Device Boot
Regardless of the number of .out files created, a single boot table should be generated for the final image to be loaded in the end system.

The boot sequence is controlled by Core 0.
  After device reset, Core 0 is responsible for releasing all cores from reset after the boot image is loaded into the device.

Details on the boot loader are available in TI user guide SPRUEA7, TMS320TCI648x DSP Bootloader.

[Diagram: for each core N (0, 1, 2), CoreN.out and CoreN.rmd are passed through Hex6x to produce CoreN.btbl; MERGEBTBL then combines Core0.btbl, Core1.btbl, and Core2.btbl into DspCode.btbl.]


Q&A