Escolar Documentos
Profissional Documentos
Cultura Documentos
Agenda
Official Agenda Review the most commonly used tools in CMP arch research
Simulators Benchmarks
Outline
Choosing a Simulators Simics And friends
GEMS, Garnet & Orion , FeS2, SimFlex
Outline
Choosing a Simulators Simics And friends
GEMS, Garnet & Orion , FeS2, SimFlex
Choosing A Simulator
Performance Ease Of Use Performance Design Space Detail Flexibility
CMP Research
System-wide architecture
Asymmetric CMP
Memory hierarchy
Caches Coherence protocols
Interconnect
Network-On-Chip
6
Outline
Choosing a Simulators Simics And friends
GEMS, Garnet & Orion , FeS2, SimFlex
PIN
Not a simulator
M5 Simics ?
Why Simics?
Because everyone is using it
THE most widely used simulator in our field 1/3 of ISCA07 papers used Simics
10
Simics in a Nutshell
Virtual Hardware
Event driven Cycle accurate*
Java VM
Complete production software
Operating system
Drivers Hardware abstraction layer Boot firmware
HW/SW interface
Simulated (virtual) hardware
Network
Virtual Hardware
http://www.virtutech.com/ 11
Target Software
Scalable:
Single processor (uniprocessor /CMP) Distributed systems MPs Racks Clusters
http://www.virtutech.com/
13
Flexible
Different degrees of simulation (details)
Functionality only Microarchitecture and timing
Configurable
Hook/unhook modules Control their timing Write your own (in C++)
http://www.virtutech.com/
14
Demo
RedHat 7.2/ Itanuim Solaris/ PowerPC
NT/x86 XP/x86-64
Middleware and libraries Target operating system (s) Virtual target hardware
http://www.virtutech.com/
16
Simics Provides:
Checkpoints
Save/restore state
Breakpoints
Temporal breakpoints Break on memory/Register/IO Graphics breakpoint
Magic instructions
Signal Simics from within your application
10X-100X slowdown
in-order mode
User defines a timing model function which will be called when memory request occurs Function returns the number of cycles to stall 1000X-10000X slowdown
19
20
MAI supports:
Out-of-order execution, multi-processor, multi-threading, branch prediction, value prediction
CPU1 CPU1
CPU0
CPU2
CPU7
CPU6
CPU4
22
23
24
25
Simics in Research
Virtual Hierarchies, M. R. Marty and M. D. Hill, Micro's Top Picks 2008 Improving Multiple-CMP Systems Using Token Coherence,, M. R. Marty, J. D. Bingham, M. D. Hill, A. J. Hu, M.K. Martin and D. A. Wood, HPCA 2005 "Nahalal: Cache Organization for Chip Multiprocessors", Z. Guz, I. Keidar, A. Kolodny, U. C. Weiser, IEEE Computer Architecture Letters, May 2007 Memory Mapped ECC: Low-Cost Error Protection for Last Level Caches, D. H. Yoon and M. Erez, ISCA 2009 TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory, J. Bobba, N. Goyal, M. D. Hill, M. M. Swift, and D. A. Wood, ISCA 2008 Predicting the Performance of Recongurable Optical Interconnects in Distributed Shared-Memory Systems, W. Heirman, J. Dambre, I. Artundo, C. Debaes, H. Thienpont, D. Stroobandt, J. Van Campenhout, Photonic Network Communications 08 Serializing Instructions in System-Intensive Workloads: Amdahl's Law Strikes Again P. M. Wells, G. S. Sohi, HPCA 2008 Predictor Virtualization, I. Burcea, S. Somogyi, A. Moshovos and B. Falsafi, ASPLOS 2008
26
Outline
Choosing a Simulators Simics And friends
GEMS, Garnet & Orion , FeS2, SimFlex
27
28
Multifacet GEMS
GEMS is a set of modules for Virtutech Simics that enables detailed simulation of multiprocessor systems, including CMP.
Flexible
Can be configured/altered/hacked Add your own models
http://www.cs.wisc.edu/gems/ 29
GEMS Ruby
Cache hierarchy
L1, L2 (private/shared), SNUCA/DNUCA, Simple DRAM
HW transaction memory
Log-TM Suns Rock
Interconnect
Simple Garnet - detailed NoC interconnect
http://www.cs.wisc.edu/gems/ 30
Outline
Choosing a Simulators Simics And friends
GEMS, Garnet & Orion , FeS2, SimFlex
31
CPU
L1$
CPU
L1$
L2$
L2$
L2$
L2$ CPU
L1$
L2$
L2$
L2$
L2$
CPU
L1$
L2$
L2$
L2$
L2$
CPU
L2$
L1$
L2$
L2$
L1$
L2$
L1$
CPU
CPU
33
34
35
BusBus-Enhanced Network on-Chip onA new interconnect architecture, utilizing the best of both worlds
Use NoC for data delivery Use bus for lightweight, latency critical meta-data
Coherency
R
Module
R
Module
R
Module
R
Module Module
R
Module
R
Module
R
Module Module
R
Module
R
Module
R
Module Module
R
Module
R
Module
37
R
Module Module
R
Module
R
Module
R
Module Module
R
Module
R
Module
R
Module Module
R
Module
R
Module
R
Module Module
R
Module
R
Module
38
Advantages
Fast Simple
Disadvantage
Dependencies are lost Does not account for latency hiding techniques (e.g. OOO)
But..
OPNET can be glued to Simics using Ruby
39
Outline
Choosing a Simulators Simics And friends
GEMS, Garnet & Orion , FeS2, SimFlex
40
?
41
Benchmark Comparison
CPU 2006
Programs Multi-Threaded Diverse Updated Emerging apps Installation ease Simulation friendly
29
OMP 2001
11
42
Over 1000 downloads since release This is what everyone will be using
http://parsec.cs.princeton.edu/ 43
Outline
Choosing a Simulators Simics And friends
GEMS, Garnet & Orion , FeS2, SimFlex
44
Technion Goodies
http://www.ee.technion.ac.il/matrics/software.html Simics workload kits
Ease up installation of simics workloads
Wisconsin GEMS provide few other too
Summary
A swift overview of simulation tools for CMP
Simics GEMS OPNET Benchmarks