[Figure: T5 software stack. Host side: Solaris 11.1 and Solaris 10 8/11 (S10U11) system domains, each with kernel, FMA components, and platform drivers, running on the sun4v interface over OBP, with POST, Hostconfig/Machine Description, and the Hypervisor beneath. SP side: ILOM provides the Guest Manager, FMA support (fdd diagnosis), power on/off, FERG, host data flash, memory, environmentals, fault management, LED control, SP diagnostics, DFRUIDS, platform HW service, IPMI, CLIs, logs, SNMP, and U-Boot/diags for the host and IO.]
[Figure: Service Processor hardware. The SP is built around the PILOT3 microprocessor running a Linux kernel, with host flash, an FPGA, and the platform HW; Hostconfig runs on the host CPU.]
[Figure: T5 management architecture. Logical domains run Solaris with the Power Manager (PM) and Logical Domain Manager (LDM) above the Hypervisor; Hostconfig (HC) configures the physical hardware. The Service Processor hosts the Guest Manager (GM), Deconfig DB (DDB), fault diagnosis daemon (fdd), Platform Obfuscation Daemon (POD), ILOM Common API (CAPI), user interfaces (UIs), fault DB (faultDB), and fmadm.]
Legend
CAPI = ILOM Common API
COD = Capacity on Demand
DDB = Deconfig DB
faultDB = Fault DB
fdd = Fault Diagnosis Daemon
GM = Guest Manager
HC = Hostconfig
LDM = Logical Domain Manager
PM = Power Manager
POD = Platform Obfuscation Daemon
UI = User Interface
T5 SP
ILOM Authentication
New features:
Virtual keyboard
The mouse and keyboard only appear in the OBP device tree when the remote console is active; previously these devices were always present.
Video cannot be used as a system boot console; the OBP rconsole alias has been removed.
Side-band Management
Three remote management communication channels:
Out-of-band management = communicate with the SP over dedicated media (Ethernet/serial)
In-band management = communicate with the SP through Oracle Solaris via agents
Side-band management = communicate with the SP over media shared with the host
[Figure: the three remote management channels. Out-of-band: a serial CLI plus a dedicated Ethernet port carrying HTTPS, RKVMS, SSH (CLI), SNMP traps, IPMI, miscellaneous IP services (syslog, SMTP), serial over Ethernet, and host console redirection. In-band: CLI tools and an SNMP agent from the Hardware Management Pack running in the host OS. Side-band: the same ILOM services over an Ethernet port shared with the host.]
ILOM sideband management provides:
Firmware Updates
Remote Host Management
Inventory and Component Management
System Monitoring and Alert/Fault Management
User Account Management
Power Consumption Management
Guest Manager (communicates with the host over Logical Domain Channels, LDCs):
Communication bridge between the host and ILOM
Provides FERG capabilities
Manages LDom configurations
Provides development facilities such as Configvars and eFuse
Can sequence the CPU in serial boot mode for debug purposes
Hostconfig
Initialization code that runs at power-on on SPARC
Platform-specific code that drives initialization and configuration of the host
Highly parallelized: uses multiple strands to speed configuration; memory is configured in parallel
Deconfigures components listed in the deconfig DB (DDB)
Logs
Console output (including Hostconfig output) is captured in the console log, hostconsole.log:
@(#)Hostconfig 1.3.x-nightly 2012/10/24 19:11 [t5-8:debug]
2012-10-25 18:44:31 2:0:0> WARNING: TPM hardware is disabled
2012-10-25 18:44:56 3:0:0> NOTICE:
Fault Management
Knowledge Articles in MOS
ILOM fdd Diagnosis
Faults and Alerts
No ALOM Compatibility
ILOM FMA Captive Shell
Sideband Service Processor Network Connection
New ILOM Fault Notification (SNMP Trap)
ASR Support
FMA on M5 ILOM also applies to T5 ILOM, except for M5-specific features.
FMA's Fault Proxy keeps ILOM's fault manager in sync with Solaris's fault manager.
Both sides display the full set of faults in the system.
Faults can be repaired from either side.
The Fault Proxy communicates over the Ethernet-over-USB connection.
IO faults are still diagnosed by Solaris.
For faults that diagnose resources as unusable, ILOM adds those resources to the DDB; the resources are excluded on the next host reset.
When faults are repaired, ILOM automatically updates the DDB. Bringing components back online requires a host reset.
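Because faults are proxied, the same fault can be inspected and cleared from the Solaris side with the standard FMA administration command. A minimal illustrative session (the fault UUID is a placeholder):

    # fmadm faulty              <- list current faults, including SP-diagnosed ones
    # fmadm acquit <uuid>       <- acquit a fault after addressing it

ILOM then updates the DDB automatically, as described above.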
SP diagnostics run at SP boot and test devices on the SP FRU and its Ethernet port. Status is stored and converted to ereports after ILOM boots.
[Figure: fault proxy and ereport flow. Ereports originate on the SP and flow through hostd and FETD to the control domain over an LDC via ETM endpoints; the control domain exchanges ereports and faults with IO domains over ip-transport (TCP/IP), again via ETM and LDCs.]
IO ereports are forwarded from the SP to the control domain, and then on to any relevant IO domain.
Faults are proxied between the SP, the control domain, and any IO domains to keep their fault managers in sync.
The SP and the control domain can view and manage all faults in the system.
An IO domain can only view and manage faults local to that domain.
Ereport Generation
Three producers of ereports:
Guest Manager (GM)
Error Telemetry Collection Daemon (ETCD)
Platform Obfuscation Daemon (POD)
HV-reported errors: the hypervisor communicates error information in a raw, binary format (FERG); the resulting ereports are generated and published to the Event Manager framework.
ASR Support
SPARC T5 servers will be supported by ASR (Automatic Service
Request) at release
Continues use of sunHwTrapFaultDiagnosed SNMP notification
Telemetry for ILOM fdd diagnosis
Supports platform and FRU identity
Supports multi-suspect list
Service Processors
- Minimal impact on user experience
Boot mode
T-series platforms (except T5-1B) have two boot mode options
Sequenced Boot: SP boots, then user initiates host power-on via ILOM
Parallel Boot: SP and host power on in parallel to reduce overall boot time
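With sequenced boot, the host power-on step is a standard ILOM CLI operation; an illustrative invocation from the SP (standard ILOM syntax):

    -> start /System

(On releases where the legacy namespace is enabled, start /SYS is the equivalent.)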
Boot Sequence
Service Processor boot:
GRUB starts at power-on
GRUB starts Linux
Linux starts various services, including ILOM
ILOM starts the Guest Manager
Guest Manager provides services to the guest OS domains:
Communication bridge (via FPGA) between the host and the ILOM service processor
Provides Fault Error Report Generation (FERG)
Manages LDom configurations in persistent storage
POD performs power sequencing for the host processors
SP Highlights
ILOM looks/behaves just like ILOM on other platforms
Simple (user-visible) set of extensions to support Physical Domains
Extensions to support Service Processor proxies and redundancy
Processor Modules can be dynamically added to a running system (post-RR, that is...)
New form of POST: iPost
Power throttling done in real-time by the FPGA based on power consumption, current draw
(IWARN), and temperature readings.
ILOM out of the loop, other than programming the power thresholds.
For PM power load balancing, ILOM sets thresholds each second.
Open Problems unifies fault management with SDM
Why:
Old distro no longer supported
Security and bug fixes
POSIX threads instead of Linux threads
Before: the component_state property represented both the current state (disabled by POST or hostconfig) and the user-requested state.
After: the states are split to reduce confusion:
current_config_state = actual state of the resource in the system
disable_reason = human-readable reason why it is disabled
requested_config_state = user-requested state
As before, the host must be started or reset for requested_config_state changes to take effect.
requested_config_state can only re-enable components that were disabled via requested_config_state.
All other disable reasons are faults, which must be addressed (via fmadm acquit) or the FRU replaced.
A fault in one component may cause other components to be disabled; these are noted with a disable_reason of "Configuration rules".
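A hypothetical ILOM CLI view of the split properties (the target path and values are illustrative, not taken from a real system):

    -> show /System/Memory/DIMMs/DIMM_0
        Properties:
            current_config_state = Disabled
            disable_reason = Configuration rules
            requested_config_state = Enabled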
EP (Electronic Prognostics)
SDM Architecture
[Figure: SDM architecture. Web and CLI user interfaces sit on the SDM backend (LUMAIN), which draws on the platform XML description, LIBHDL, the HW service, CAPI, and the SSM API.]
SDM CLI
The CLI is reorganized.
A /System target (tree) is introduced.
The components of the system are grouped and organized into subtrees, with a health property at every level to indicate the overall health of that subtree.
The Open_Problems target shows detailed descriptions of the faults in the system.
The /SYS and /Storage targets are made legacy: they continue to exist but are hidden by default. The legacy targets can be made visible by enabling the /SP/cli/legacy_targets property.
Example:
    Targets:
        CPUs
        CPU_0
        CPU_1
    Properties:
        health = OK
        health_details =
        architecture = x86 64-bit
        summary_description = Two Intel Xeon Processor E5 Series
        installed_cpus = 2
        max_cpus = 2
SDM Web
Difference from ILOM 3.0: the old components, sensors, and indicators pages have been removed and consolidated into the new System view.
SDM Storage
New location for storage information, replacing the previous Storage Viewer.
Covers all components, including non-RAID controller and expander information.
The PCI IDs provided allow the part number and device description to be determined.
Add-on components are based on PCIe slots on rackmounts and PEMs; unidentified devices show as Not Recognized.
The Open Problems UI contains the most in-depth information, including subsystem and component details.
SPARC Virtualization
Technologies for the T5
[Figure: SPARC virtualization technologies compared. Dynamic Domains (M-Series), Oracle VM Server for SPARC (T-Series, M5), and Oracle Solaris Zones each isolate Web, App, and DB (OLTP/DW) workloads into domains A, B, and C on Oracle Solaris.]
Virtualization on T5 Systems
High degree of virtualization
[Figure: zone stacking. An Oracle Solaris 11 host runs Solaris 11 zones, Solaris 10 branded zones, and Solaris legacy zones; an Oracle Solaris 10 host runs Solaris 10 zones and Solaris legacy zones.]
Mission-critical deployments
- The largest Sun financial and telco customers all run Oracle Solaris Zones
- In production on 25+% of installed Oracle Solaris systems
Ideal for a variety of scenarios
Built-in Virtualization
Oracle Solaris 11 Zones
Secure, light-weight virtualization
Scales to 100s of zones per node
Delegated administration
ZFS datasets, boot environments
Observability via zonestat
Solaris 10 Zones
NFS Server
Network stack isolation and resource management
Co-engineered with installation, security, ZFS, networking, IPS, SPARC and x86 hypervisors
Cloud-Scale Networking
Secure Isolation
Ease of Use
Availability
Integrated functionality
Isolated OS and applications in each logical (or virtual) domain
[Figure: Oracle VM Server for SPARC on a T5 server. The firmware-based SPARC hypervisor hosts general-purpose (GP) domains running Oracle Solaris 10 alongside database domains running Oracle Solaris 11.]
Hypervisor Support
The hypervisor software/firmware is responsible for maintaining separation between domains, which communicate via logical domain channel (LDC) messages.
Service domains use these channels and own I/O resources to provide bridged access.
Roles of Domains
Control domain: creates and manages other logical domains and services; usually also a service and I/O domain.
I/O domains: own a physical I/O bus or devices; may run apps using physical I/O for native performance.
Service domains: provide virtual network and disk devices; typically also I/O domains.
Guest domains: run applications on virtual I/O devices provided by a service domain.
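A sketch of creating a guest domain with the Logical Domains Manager, assuming the control domain already exports a virtual switch (primary-vsw0) and a disk volume (vol0 on primary-vds0); the domain name and sizes are illustrative:

    ldm add-domain ldg1
    ldm set-vcpu 16 ldg1                         <- 16 virtual CPUs
    ldm set-memory 8g ldg1                       <- 8 GB of memory
    ldm add-vnet vnet0 primary-vsw0 ldg1         <- virtual network device
    ldm add-vdisk vdisk0 vol0@primary-vds0 ldg1  <- virtual disk
    ldm bind ldg1
    ldm start ldg1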
Domain Components
[Figure: the control & service domain runs the virtual disk service (vds) and vswitch over its PCIe devices; a guest domain's vdisk and vnet connect to them over LDCs through the hypervisor.]
Hypervisor Basics
What is the hypervisor?
Primary roles:
Implements the software component of the sun4v virtual machine, providing a low-level interface to guests.
Hypervisor Basics
What the hypervisor isn't:
An operating system
The HV does not time-slice between guests running on strands and has a fixed memory footprint (no malloc).
The HV only executes in response to a specific subset of traps. Except where hardware access is involved, traps go directly to the guest for maximum performance; there are separate HV and guest trap tables for this reason.
A policy maker
The HV enforces boundaries, but does not define them.
The HV will do as requested, even if it may harm the guest, as long as it does not violate resource access restrictions.
The IO manager
Drivers in the guest manage the PCIe fabric and devices; the HV enforces access restrictions to IO resources and their grouping.
Each guest runs its own instance of Solaris.
Each guest can be created, destroyed, reconfigured, and rebooted independently.
The hypervisor enforces the partitioning of the server's resources and isolates the OS and applications running in those partitions (i.e., guests).
The hypervisor allocates a subset of the overall CPU, memory, and I/O resources of the server to a given logical domain.
Up to 128 guests per hypervisor.
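An illustrative ldm list from a control domain (names, sizes, and utilization figures are hypothetical):

    # ldm list
    NAME     STATE   FLAGS   CONS   VCPU  MEMORY  UTIL  UPTIME
    primary  active  -n-cv-  UART   16    16G     0.5%  2d
    ldg1     active  -n----  5000   16    8G      1.2%  1d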
[Sample ldm list-io output, flattened in the original slide: the TYPE column lists BUS, PCIE, and PF entries; the BUS column shows pci_0 through pci_3; every entry's DOMAIN is primary; PCIe slot STATUS is OCC (occupied) or EMP (empty).]
[Figure: SR-IOV domain stack. A root domain owns the PF and its device driver on a root complex (pci@400, pci@500); guest domains run apps on the Solaris I/O stack with multipathing across VFs. The hypervisor presents a virtual PCIe switch per root complex so each domain sees its VFs behind the physical PCIe switch.]
DRM policy
- Ensures that domains running the most important workloads get priority for CPU resources.
[Figure: software-virtualized I/O for comparison (Intel VT-x/VT-d example). Apps in each guest OS use virtual NICs that the VMM multiplexes onto one physical NIC; this provides scalability but has performance drawbacks.]
SR-IOV
IOV for PCI Express (PCIe) hardware.
An IOV solution that allows direct guest access to the device.
Usage model:
Individual NIC ports belong to different OSes.
Multiple guests share SR-IOV devices.
[Figure: an SR-IOV NIC exposes a physical function (PFn0) with the system device config space, plus virtual functions (VFn0-VFn2), each with a per-VM device config space; a guest OS drives its VF and DMAs directly, bypassing the VMM's virtual NIC path.]
IOV Benefits
Performance
- Fully utilize IO device resources, such as 10G NIC bandwidth
- Low latency
Cost reduction
- Capital and operational expenditure savings from power savings and reduced hardware
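A sketch of the corresponding LDoms workflow on SPARC: create a VF on a physical function and assign it to an I/O domain (the PF path and domain name are illustrative):

    ldm create-vf /SYS/MB/NET0/IOVNET.PF0        <- create a virtual function on the PF
    ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF0 ldg1  <- assign the VF to domain ldg1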
[Figure: SR-IOV with LDoms. The primary domain owns pci_0 and the SR-IOV card's PF behind the physical PCIe switch; I/O domains 1-3 each receive a VF (VF0-VF3) through a virtualized PCIe switch presented by the hypervisor, with an operating system in each domain.]
Supported systems: SPARC M5, SPARC T5, SPARC T4, SPARC T3, UltraSPARC T2 Plus, UltraSPARC T2.
[Figure: VM migration between systems across the network.]
Allows migration among systems of the same CPU architecture with different system clock frequencies.
Dependent on the guest domain running Solaris 11.
Solaris introduces a generic sun4v CPU module: a simulated 1 GHz system clock if HW support is not available, plus other changes.
LDoms Manager introduces the domain cpu-arch property.
The simulated clock is used in generic mode; in native mode, the boot frequency is emulated after migration to a system with a different clock frequency.
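Illustrative use of the cpu-arch property when preparing a domain for cross-CPU migration (the domain name is hypothetical; the property can only be changed while the domain is not running):

    ldm set-domain cpu-arch=generic ldg1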
Guest domain must be Solaris 11 FCS or newer.
Migration is for the most part unchanged.
At the start of the migration, domain capabilities and generic CPU support are checked.
Solaris Support
For RR, S10U9 and S10U10 + patch bundle will be qualified and supported in a guest domain.
Power Management
[Table, flattened in the original: a power-management feature comparison across T4, T5, and M5, covering cycle skipping, coherency link scaling, and power supplies. Recoverable values: coherency link scaling is T5-2 only; power supplies are Gold+ (T4), A261A (T5), and A254 (M5).]
Notes:
Other T5 servers do not have sufficient links to turn off.
Cannot be used in Performance Mode.
PM interfaces:
Solaris: 11.1 poweradm; 10 & 11: pmconfig and /etc/power.conf
ILOM: ssh CLI and HTTP (web) for setting the PM policy
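A minimal illustrative poweradm session on Solaris 11.1 (standard poweradm subcommands; the value shown hands PM authority to the platform/ILOM):

    # poweradm list
    # poweradm set administrative-authority=platform
    # poweradm update       <- commit the change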
[Figure: PM component stack. ILOM on the SP holds the PM policy and power capper (system and HW domain caps, communicated via the PRI/platform MD); the hypervisor's PM coordinates P-states, C-states, memory PM (mempm), DVFS, cycle skipping, coherency-link scaling, and PAD coordination; hostconfig initializes the coherency links, BoB links, and DIMM channels (PPFE); the FPGA enforces throttling when enabled.]
Elastic: unused or idle components are power managed (CPUs, cores, memory, and coherency links on the T5-2 only).
Prior versions of ILOM (< 3.2) had two policies: Performance (equivalent of the new Disabled policy) and Elastic.
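Illustrative ILOM CLI for selecting the policy (the property path follows standard ILOM power management; the value is one of the policies above):

    -> set /SP/powermgmt policy=Elastic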
Configure the pending power limit in watts, replacing 400 with a value appropriate for your consumption:

    -> set /SP/powermgmt/budget pendingpowerlimit=400
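In the standard ILOM budget workflow, the pending limit takes effect once committed:

    -> set /SP/powermgmt/budget commitpending=true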
Hard Cap:
T5: stays within the blade
Impacts that LDom/guest exclusively
Impacts shared resources such as memory
LDoms
Per-guest CPU power consumption based on CPU utilization
Per-guest memory power consumption based on memory allocation
Solaris
PowerTOP: P-state and C-state residency ratios
Logical Domains
Observability and monitoring: current consumption, history, graphing, and averages by group of servers.
Servers manage power according to utilization with:
Chip wide DVFS
Per core pair cycle skipping
SerDes power scaling
DIMM off-lining w/ Dynamic Reconfiguration
DRAM PPSE and PPFE support
PCI Express Power Management
Clock Gating
[Figure: power vs. utilization curve, approximately f(x) = x^2.82, normalized to 100% utilization.]
[Figure: coherency link scaling between M5/T5 sockets. The 4 links per socket pair can be scaled down to 3, 2, or 1 active links at low utilization, for roughly 25 W of savings.]
[Figure: hardware throttle loop. The FPGA (T5M) monitors the temperature sensors and VRM current (IWARN) and pulses the T5 CPU's throttle/resume pins into the Power Management Controller (PMC); VID controls the VRM voltage (VDD) and the PLL sets frequency. Frequency and voltage are raised only if all temperatures and currents are below the low-water mark; exceeding a high-water mark triggers throttling.]
DVFS Functionality
The M5/T5 CPU supports operation at multiple voltage/frequency operating points.
[Figure: the thermal/power control FPGA reads a 12 V current sensor and the chip's thermal diodes/sensors; it asserts Throttle to the M5/T5 chip when a current or temperature high-water mark is exceeded and Resume when readings fall back below the low-water mark.]
DVFS (Dynamic Voltage and Frequency Scaling)
Not all chips run at the same voltage/frequency.
Up to 32 P-states are defined in efuse per chip.
T5/M5 supports a HW-enabled mode (vs. disabled).
SW programs the T5 P-state tables from efuse and sets limits.
The FPGA pulses the Throttle pin in response to a thermal sensor alarm, and the Resume pin when the sensor clears.
Throttle instructs the Power Management Controller (PMC) to increase the P-state by 1 (a step of 200 MHz, 6.25 mV) to reduce power use.
Resume instructs the PMC to decrease the P-state to the lowest allowed.
The T5 transitions up/down the P-state table under FPGA control.
Efuse
Per-chip efuse data is set in the fab.
Used to control which components may be enabled in a processor node.
OpenBoot
OpenBoot - Introduction
OpenBoot(TM) is Oracle's trademark for boot firmware based on the open IEEE 1275 standard; it runs in the guest.
Initializes IO devices and option cards.
Builds the HW configuration in a device-tree format for OS clients.
Boots the OS from disk or network.
Provides boot-time services to the OS.
OpenBoot binary name: openboot.bin
OpenBoot component version: 4.35.x
The OpenBoot binary is common across all M5/T5 platforms, released as one binary.
[Figure: T5 software/firmware stack. Guest domains run user apps (e.g., SunVTS) on Solaris 11 and Solaris 10 (kernel/drivers, FMA agent) over OpenBoot and the sun4v API; the hypervisor, Hostconfig, OpenBoot, and POST run on the host hardware (SPARC T5 CPU, memory, IO); Linux, ILOM, and the Guest Manager run on the SP hardware (SP CPU, FPGA).]
Platform Management
SNMP
Oracle Enterprise Manager Ops Center
Sun Cluster 3.2 and Sun Cluster 4.0 are supported
SNMP Requirements
The SNMP agent, based on open-source Net-SNMP, runs on the SP and serves data from MIBs, including the Platform MIB and the Sun Fault Management MIB, to all interested third-party managers.
The agent supports SNMPv1, SNMPv2, and SNMPv3; SNMPv3 can be utilized for authentication and encryption.
SNMP Monitoring
Comprehensive SNMP support (v1, v2c, v3), out of band via ILOM.
Standard MIBs: RFC1213-MIB, SNMP-FRAMEWORK-MIB, SNMP-MPD-MIB, ENTITY-MIB, SNMP-USER-BASED-SM-MIB, SNMP-Control-MIB
Oracle MIBs: SUN-HW-CTRL-MIB, SUN-ILOM-CONTROL-MIB, SUN-PLATFORM-MIB, SUN-HW-TRAP-MIB, SUN-HW-MONITORING-MIB
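An illustrative query from a management station against the SP's agent, walking a standard ENTITY-MIB object (the address and community string are placeholders):

    $ snmpwalk -v2c -c public 192.0.2.10 entPhysicalName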
Integrated Infrastructure Management + Integrated Application-to-Disk Management + Integrated Lifecycle Management + Integrated Systems Management & Support
[Figure: Ops Center coverage. Operating systems (Solaris, Cluster), engineered systems (Exadata, Exalogic), virtualization (Containers; Dynamic Domains and OVM for SPARC), and storage systems.]
Key Features
DISCOVER: inventory, bare-metal discovery, VM auto discovery, advanced permission model, team sharing
PROVISION: firmware, Solaris and Linux, golden images, LDom hypervisor, provision OS in Zones/LDoms, hardware and OS
UPDATE: automation
MONITOR/MANAGE: resource optimization, reporting, audit log, historical monitoring
[Figure: hypervisor coverage - Dynamic Domains on M-Series (M5-32), Oracle VM for SPARC on T-Series, and Zones.]
T5-8 - Summary
T5-8 - Hardware
SPARC Roadmap
[Figure: SPARC roadmap, 2011-2016, showing T-Series and M-Series generations (delivered, in test, in the lab) with generation-over-generation gains such as +2x/+2.5x/+6x throughput and +1.2x/+1.5x/+5x thread strength; Oracle application accelerators (database query, compression, encryption, cluster interconnect, software quality); and the OS track (Solaris 10 U10, Solaris 11, and yearly Solaris 11/10 updates).]
Increased Performance
Higher core frequency
Multiple pipelines per core
Increased core counts per chip
Larger caches
More memory bandwidth
Summary
In simple terms, moving a binary from T4 to T5 gives at least double the performance.
2.7x the memory bandwidth and 2x the I/O bandwidth of T4.
2.4x the throughput of T4, for 128 threads.
3.6 GHz cores, inheriting all the advancements of T4.