Architecture and Implementations
Benson Inkley
Desktop Processor PAE Manager
Scott Tetrick
Principal Engineer
Legal Disclaimer
• INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications.
• Intel may make changes to specifications and product descriptions at any time, without notice.
• Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
• The Intel® processors mentioned may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
• Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
• This document contains information on products in the design phase of development. Do not finalize a design with this information. Revised information will be published when the product is available. Verify with your local sales office that you have the latest datasheet before finalizing a design.
• Conroe, Paxville, Merom, Tulsa, Sossaman, Kentsfield and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
• All dates specified are target dates, are provided for planning purposes only and are subject to change.
• All products, dates, and figures specified are preliminary based on current expectations, provided for planning purposes only, and are subject to change without notice.
• Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
• *Other names and brands are the property of their respective owners.
• Copyright © 2006, Intel Corporation
Session Objectives
• Describe the physical implementations of Intel's multi-core processors on the 2006 and 2007 Intel platforms
• Explain the technical and performance differences between Multi-Core processors and Hyper-Threading Technology
• Compare bus traffic behaviour on processors with Hyper-Threading Technology and on Multi-Core processors
• Provide insight into the differences between shared and independent cache architectures
Agenda
Multi-Core Physical Characteristics
• Two independent execution cores in one processor
• Monolithic and Multi-Chip configurations
– Implementations will vary over time
– Driven by design optimizations and market requirements
– May share the L2 cache in monolithic designs
[Figure: Multi-Chip example: 65 nm Pentium® D processor (900 Sequence), two cores each with a 2 MB L2 cache. Monolithic examples: 65 nm Conroe, Merom, and Sossaman, two cores sharing one L2 cache (2 MB or 4 MB) behind a common Bus Interface.]
Intel® Itanium® 2 Processor
All products and dates are preliminary and subject to change without notice. Not representative of relative die sizes.
Intel Desktop Chipsets
Intel® E7520 Chipset
Intel® 5000 Chipset
MP Platform Bus Topology
Intel® E8500 and E8501 Chipsets
Agenda
Hyper-Threading and Multi-Core Definitions
Pentium® 4 Processor Block Diagram
[Figure: System Bus, Decoder, Trace Cache, Rename/Alloc, uop Queues, Schedulers, FP and Integer register files, and execution units (FP move, FP store, FAdd, FMul, MMX, SSE, ALUs, Load and Store with AGUs)]
Pentium® D Processor Block Diagram (90 nm)
[Figure: two complete cores on one System Bus; labeled blocks include Decoder, BTB, uCode ROM, Trace Cache, Rename/Alloc, Schedulers, FP and Integer register files, and execution units (FP move, FP store, FAdd, FMul, MMX, SSE, ALUs, Load and Store with AGUs)]
65 nm Pentium® D and DP Xeon® Processor Block Diagrams (Presler and Dempsey)
[Figure: two complete cores on one System Bus; labeled blocks include Decoder, BTB, uCode ROM, Trace Cache, Rename/Alloc, Schedulers, FP and Integer register files, and execution units (FP move, FP store, FAdd, FMul, MMX, SSE, ALUs, Load and Store with AGUs)]
Xeon® MP Processors
[Figure: two cores behind a Bus Interface to the System Bus; labeled blocks include BTB & I-TLB, Decoder, BTB, uCode ROM, Trace Cache, Rename/Alloc, Schedulers, FP and Integer register files, and execution units (FP move, FP store, FAdd, FMul, MMX, SSE, ALUs, Load and Store with AGUs)]
Tulsa Block Diagram
[Figure: two cores sharing a 16 MB L3 cache behind a Bus Interface to the System Bus; labeled blocks include Decoder, BTB, uCode ROM, Trace Cache, Rename/Alloc, and execution units (FP move, FP store, FAdd, FMul, MMX, SSE, ALUs, Load and Store with AGUs)]
Merom, Conroe and Woodcrest Block Diagram
[Figure: two cores on one System Bus, each with uCode ROM, Decode, Rename/Alloc, Schedulers, and execution units (FPU, two ALUs, Load, Store)]
Pentium® 4 Processor Without HT
[Figure: a single core with L2 Cache and Control, Integer, and Floating Point units, running one integer thread]
Pentium® 4 Processor With HT
[Figure: a single core with L2 Cache and Control, Integer, and Floating Point units]
Pentium® D Processor
[Figure: two cores, each with its own L2 Cache and Control, Integer, and Floating Point units]
Dual-Core Pentium® Processor Extreme Edition
Supports HT
Multiple Integer and Floating Point Threads
[Figure: two cores, each with its own L2 Cache and Control, Integer, and Floating Point units]
Four-Core Multi-Core Processors
Two Floating Point and Two Integer Threads
Agenda
• Intel Multi-Core processors for 2006
• The difference between Hyper-Threading Technology and Multi-Core processors
• Bus Traffic Analysis
• Independent and shared cache designs
Bus Traffic Data Format
[Chart layout: title gives the software loop and processor configuration; x-axis: time in seconds; y-axis: average data transfers per second]
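The bus traffic charts plot average data transfers per second against time. As a rough sketch of how such a curve could be derived from a trace of individual bus transfers, assuming a hypothetical trace that simply lists the time (in seconds) of each transfer, the helper `transfers_per_second` below bins the trace into fixed windows. It is illustrative only and is not the measurement tool used for these slides.

```python
from collections import Counter


def transfers_per_second(timestamps_s, window_s=1.0):
    """Average data transfers per second in consecutive fixed-width windows.

    timestamps_s: non-negative times (in seconds) at which a bus data
    transfer was observed. This trace format is a hypothetical assumption.
    """
    if not timestamps_s:
        return []
    # Map each transfer to the index of the window it falls in.
    counts = Counter(int(t // window_s) for t in timestamps_s)
    n_windows = max(counts) + 1
    # Divide by the window width so every value is expressed "per second",
    # including empty windows (which report 0.0).
    return [counts.get(i, 0) / window_s for i in range(n_windows)]


samples = [0.1, 0.2, 0.9, 1.5, 3.0]
print(transfers_per_second(samples))       # [3.0, 1.0, 0.0, 1.0]
print(transfers_per_second(samples, 0.5))  # [4.0, 2.0, 0.0, 2.0, 0.0, 0.0, 2.0]
```

A narrower window shows the short bursts of high use more sharply; a wider one smooths them into the generally low average.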
Bus Traffic gzip Single Core
• Gzip is a small program that fits within the 2 MB cache
• Bus utilization is generally low, with bursts of high use
[Figure: the chipset connects the System Bus to the Memory Bus; memory ranks (for example Rank 1 and Rank 3) hold open pages ("Open Pg")]
Memory Influence on Data Speed
• Memory configuration influences the data transfer rate
• Same amount of memory, but different rank configurations
• 3.2 GHz dual-core Pentium® Extreme Edition processor
Memory Influence on Data Speed
• Faster memory and more ranks improve performance
Memory Influence on Data Speed
• Memory cannot fully utilize the 6.4 GB/sec bus bandwidth
• 4-rank DDR2-667 memory at 800 MHz
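The 6.4 GB/sec figure is the bus transfer rate multiplied by the width of each transfer. A minimal sketch, assuming the bus performs 800 million transfers per second over a 64-bit (8-byte) data path, which reproduces the 6.4 GB/sec quoted above; the 400 million transfers per second "measured" rate is a made-up example, not a result from these slides.

```python
def peak_bus_bandwidth_gb_s(transfers_per_s, bytes_per_transfer=8):
    """Peak bandwidth in GB/s (10**9 bytes per second).

    bytes_per_transfer=8 assumes a 64-bit data bus.
    """
    return transfers_per_s * bytes_per_transfer / 1e9


def bus_utilization(measured_transfers_per_s, peak_transfers_per_s):
    """Fraction of the bus's peak transfer rate that is actually used."""
    return measured_transfers_per_s / peak_transfers_per_s


FSB_TRANSFERS_PER_S = 800e6  # assumption: 800 million transfers per second

print(peak_bus_bandwidth_gb_s(FSB_TRANSFERS_PER_S))  # 6.4
print(bus_utilization(400e6, FSB_TRANSFERS_PER_S))   # 0.5
```

Expressing measured traffic as a fraction of this peak is one way to see the headroom the memory subsystem leaves unused; the slides do not state how their own charts were normalized.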
Intel® Smart Cache
• Enables access to the full cache size when only one core is active
• Dynamically allocates cache space between cores
• Minimizes bus traffic by allowing both cores to access a single copy of data
[Figure: Core 1 and Core 2 sharing one 2 MB L2 cache]
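Why does letting one active core use the whole cache matter? A toy model helps, assuming a fully associative cache with LRU replacement, an idle second core, and one bus fetch per miss. The actual Smart Cache allocation policy is not described in these slides, and the sizes below (an 8-line cache versus a fixed 4-line half, a 6-line working set) are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache; every miss is counted as one bus fetch."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)    # hit: mark as most recently used
            return True
        self.misses += 1                    # miss: the line comes over the bus
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def bus_fetches(cache_lines, working_set_lines, passes):
    """Misses for one active core looping repeatedly over its working set."""
    cache = LRUCache(cache_lines)
    for _ in range(passes):
        for addr in range(working_set_lines):
            cache.access(addr)
    return cache.misses


# One core active, 6-line working set, 10 passes over it.
print(bus_fetches(8, 6, 10))  # whole shared cache available: 6
print(bus_fetches(4, 6, 10))  # fixed half of the cache: 60
```

With the whole cache available, only the first pass misses; with a fixed half, the cyclic loop defeats LRU and every access goes to the bus.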
Independent vs. Shared Cache Designs
• Independent caches transfer data via the bus
• A shared L2 cache enables a single copy of data to be used by each execution core
• Shared cache designs transfer data directly between the L2 and L1 caches
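[Figure: Independent Caches: each core has its own L1 and L2, and both connect through the MCH to memory. Shared Caches: two L1 caches above one L2 with L2 Cache Control, connected through the MCH to memory.]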
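A minimal sketch of the single-copy point, assuming both cores read the same 100 cache lines, caches start empty and are large enough never to evict, and every line missing from a core's L2 costs one bus transaction (whether memory or the other core's cache then supplies it). The `l2_bus_reads` helper is hypothetical.

```python
def l2_bus_reads(shared_l2, lines_read):
    """Bus reads when core 0 and then core 1 each read the same lines once.

    Caches start empty and are assumed large enough that nothing is evicted.
    """
    if shared_l2:
        l2_for_core = {0: set()}
        l2_for_core[1] = l2_for_core[0]     # both cores see the same L2
    else:
        l2_for_core = {0: set(), 1: set()}  # a private L2 per core
    reads = 0
    for core in (0, 1):
        l2 = l2_for_core[core]
        for line in lines_read:
            if line not in l2:
                reads += 1                  # this copy has to cross the bus
                l2.add(line)
    return reads


data = range(100)                 # 100 cache lines used by both cores
print(l2_bus_reads(False, data))  # independent caches: 200
print(l2_bus_reads(True, data))   # shared cache: 100
```

In this model the second core's reads are L2 hits in the shared design, so data used by both cores crosses the bus once instead of twice.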
4 MB Shared vs 2 x 2 MB Independent Cache
• Conroe's bus bandwidth is higher, but only for about half as long
• Work is completed in less time, with plenty of bus headroom
[Chart: SPEC "vpr" subroutine on the Intel® 975 Chipset; Conroe compared with Presler with both cores active]
4 MB Shared vs 2 x 2 MB Independent Cache
• The 4 MB shared cache on Conroe dramatically reduces bus utilization
• Memory technology is the limiter
Influence of Timing on Performance
• Thread scheduling on a multithreaded processor can influence performance
• The performance of four identical threads can be improved by offsetting their start times
[Figure: Threads 1 to 4 with staggered start times along a time axis]
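The effect of offsetting start times can be illustrated with a toy model, assuming each of the four identical threads repeats a cycle made of a short memory-bound burst followed by a compute phase. The period and burst lengths below are arbitrary, and `bus_demand` is a hypothetical helper, not a model of any specific processor.

```python
def bus_demand(start_offsets, period, burst, horizon):
    """Count how many threads are in a bus-heavy burst at each time step.

    Each thread starts at its offset, then repeats a cycle of `period`
    steps whose first `burst` steps are memory-bound.
    """
    demand = []
    for t in range(horizon):
        busy = sum(
            1
            for start in start_offsets
            if t >= start and (t - start) % period < burst
        )
        demand.append(busy)
    return demand


same = bus_demand([0, 0, 0, 0], period=4, burst=1, horizon=16)
staggered = bus_demand([0, 1, 2, 3], period=4, burst=1, horizon=16)
print(max(same), max(staggered))  # 4 1
print(sum(same), sum(staggered))  # 16 16
```

Total bus demand is unchanged in this model; only the peak drops, which is why staggering can help when simultaneous bursts would otherwise saturate the bus or memory.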
Influence of Timing on Performance
Summary
• Multi-core processors provide multiple execution cores in a single processor package
• Larger caches and shared caches improve performance by reducing latency to frequently used data
• Choose a memory implementation that maximizes data transfers
• Today's bus architecture is a high-speed interface with plenty of bandwidth for multi-core processors
Please fill out the Session Evaluation Form.
Thank You!