[Figure: power scale from 1 mW to 10 W]
Outline
Background: Runtime PM of mobile SoC
Problems with driver-directed runtime PM
Centralized power management
Future work
[Figure: block diagram from TI's public-version OMAP4 manual (www.ti.com, intro_swpu141-001). It shows the MPU subsystem (dual Cortex-A9 with Neon and VFPv3), DSP, IVAHD, ISS, display, SGX540 graphics and ABE subsystems, the EMIF4D/LPDDR2 memory controllers, USB, HS-MMC and other peripherals, the PRM and CM power/clock managers, and the L3, L4_CFG, L4_PER, L4_WKUP and L4-ABE interconnects.]
TI OMAP4
Hardware states
Clock-gated
Powered-off
[Figure: modules (e.g. I2C, GPIO) grouped into clock domains, and clock domains grouped into power domains]
[Figure: power domains in the ON, RETENTION, and OFF states, each containing clock domains and modules]
Runtime PM is a collaboration
between software & hardware
[Figure sequence: an ON power domain with three clock domains and the I2C and GPIO modules. Software triggers are shown at the modules; hardware triggers are shown at the clock domains and then at the power domain.]
How Linux does it
User programs set QoS requirements
[Diagram: user programs pass a QoS requirement to the operating system, which delivers it to device drivers through a QoS callback]
Example QoS requirement:
Wakeup latency
How Linux does it
Drivers fulfill QoS requirements
[Diagram: user programs pass a QoS requirement to the Linux PM_QoS framework, which delivers it to device drivers through a QoS callback]
Drivers adjust PM parameters according to QoS requirements
Examples:
Idle timeout
Target low-power state
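To make the "target low-power state" example concrete, here is a minimal sketch of how a driver might map a wakeup-latency requirement to a state. The state table, its power and latency numbers, and the function name are all hypothetical; only the state names come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowPowerState:
    name: str
    power_mw: float           # hypothetical power draw in this state
    wakeup_latency_us: float  # hypothetical time needed to return to ON


# Hypothetical table, ordered from shallowest to deepest state.
STATES = [
    LowPowerState("ON", 100.0, 0.0),
    LowPowerState("CLOCK_GATED", 20.0, 10.0),
    LowPowerState("RETENTION", 5.0, 200.0),
    LowPowerState("OFF", 0.5, 2000.0),
]


def pick_target_state(max_wakeup_latency_us, states=STATES):
    """Return the lowest-power state whose wakeup latency meets the QoS bound."""
    allowed = [s for s in states if s.wakeup_latency_us <= max_wakeup_latency_us]
    return min(allowed, key=lambda s: s.power_mw)
```

With this table, a 500 us bound selects RETENTION, while a 5 us bound leaves the module ON.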
How Linux does it
Drivers track pending tasks
[Diagram: device drivers sit between the Linux PM_QoS framework (QoS callback) and the Linux runtime_PM framework, which enables/disables SoC modules: DSP, I2C, display controller, SPI, EHCI]
Drivers' role:
pm_runtime_get/put
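The get/put pattern amounts to usage counting: the module is enabled while at least one task is pending. The following is a toy Python model of that idea, not the kernel API; the class and its members are illustrative names, and it disables the module immediately on the last put, with no delay.

```python
class RuntimePmModel:
    """Toy model of driver-side usage counting (not the Linux kernel API)."""

    def __init__(self):
        self.usage_count = 0
        self.enabled = False
        self.log = []

    def get(self):
        """Driver calls this before starting a task on the module."""
        self.usage_count += 1
        if self.usage_count == 1:
            self.enabled = True
            self.log.append("enable")

    def put(self):
        """Driver calls this when a task on the module has finished."""
        if self.usage_count == 0:
            raise RuntimeError("put() without matching get()")
        self.usage_count -= 1
        if self.usage_count == 0:
            self.enabled = False
            self.log.append("disable")
```

A driver that forgets a put keeps the count above zero and the module stays enabled, which is the failure mode the later slides discuss.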
[Figure sequence: a power domain with three clock domains and the I2C and GPIO modules. The driver triggers enable/disable of its module, guided by QoS requirements and by whether tasks are pending; a hardware trigger is also shown.]
Outline
Background: Runtime PM of Mobile SoC
Problems with driver-directed runtime PM
Centralized runtime power management
Future work
Problem #1:
Drivers do no runtime PM
Boards: TI OMAP, Samsung Exynos, Freescale i.MX, Nvidia Tegra
Module    Delay (months)
UART      12
WDT       22
USB       45
Keypad    18
USB       14
SD        37
I2C       9
SPI       29
SD*       44
Problem #2:
Drivers can be very complex
OMAP display subsystem
565 pages in manual
22,000 lines of code
Tens of callbacks
Several asynchronous executions
Display controller is kept on when screen is on
Problem #3:
Hierarchical PM makes it worse
Bad PM for a module => Bad PM for the domain
UART
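The hierarchy effect can be stated in one line: a domain can only be powered down when every module in it is idle. The sketch below assumes a hypothetical domain holding UART, I2C, and GPIO; the function name and the dictionary layout are illustrative.

```python
def domain_can_power_off(module_idle):
    """module_idle maps a module name to True if its driver has disabled it."""
    return all(module_idle.values())


# Hypothetical domain: one driver (UART) never disables its module.
well_behaved = {"UART": True, "I2C": True, "GPIO": True}
bad_uart = {"UART": False, "I2C": True, "GPIO": True}
```

Here `domain_can_power_off(bad_uart)` is False even though two of the three drivers behave correctly.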
Solutions
Get more and better driver developers
Help driver developers with a tool
Relieve drivers from doing runtime PM
Outline
Background: Runtime PM of Mobile SoC
Problems with driver-directed runtime PM
Centralized runtime power management
Future work
Hardware-assisted solution
[Figure sequence: driver interactions with the I2C controller during configuration and IRQ handling. The controller raises an interrupt when a message has been transferred; the driver reads and acks the interrupt.]
No memory exception => No register access in the past period => No pending tasks
[Figure sequence: timeline of the Monitor and the Driver for an enabled module. Every Tthreshold the Monitor removes read/write permission on the module's registers, so the next register access by the driver raises a memory exception. Once all tasks have finished there is no register access, and so no memory exception, in a period. A later register access raises a memory exception, which signals a new pending task, and the module is enabled.]
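The monitoring rule (no memory exception in a period means no register access, hence no pending tasks) can be simulated as follows. This is a simplified sketch under stated assumptions: time is in milliseconds, periods stay on a fixed grid of length Tthreshold, only the first access in a period faults, and the function name and event format are illustrative rather than taken from the authors' implementation.

```python
def simulate_monitor(access_times_ms, t_threshold_ms, end_ms):
    """Return (events, exception_count) for a module that starts enabled.

    events is a list of (time_ms, "enable" | "disable") decisions.
    """
    events = []
    exceptions = 0
    enabled = True
    accesses = sorted(access_times_ms)
    i = 0
    period_start = 0
    while period_start < end_ms:
        period_end = period_start + t_threshold_ms
        # Collect register accesses that fall in [period_start, period_end).
        in_period = []
        while i < len(accesses) and accesses[i] < period_end:
            in_period.append(accesses[i])
            i += 1
        if in_period:
            # Only the first access faults; the handler restores permission.
            exceptions += 1
            if not enabled:
                enabled = True
                events.append((in_period[0], "enable"))
        elif enabled:
            # A whole period without a fault: assume no pending tasks.
            enabled = False
            events.append((period_end, "disable"))
        period_start = period_end
    return events, exceptions
```

For accesses at 10, 20, 150 and 460 ms with Tthreshold = 100 ms, the module is disabled at 300 ms, re-enabled by the access at 460 ms, and disabled again at 600 ms, at a cost of three exceptions.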
[Diagram: the Controller asks "Pending task?" and enables/disables the SoC module]
Evaluation
Pandaboard uses TI OMAP4460
Linaro Android 13.10 release
Kernel version 3.2
Evaluation setup
Tested modules
MMC controller (used by file system on SD card)
I2C, SDIO controllers (used by Wi-Fi NIC)
DISPC (Display Controller, part of display subsystem)
[Figures: two bar charts comparing the Stock Linux Driver with the Central PM Agent (Tthreshold = 100 ms) for MMC, I2C, SDIO, and DISPC; vertical axis from 0 to 100]
Low overhead
Memory exception handling adds about 2500 cycles of latency (2.5 µs at a CPU frequency of 1 GHz).
A memory exception occurs for each module at most once every Tthreshold = 100 ms.
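The overhead figures follow from simple arithmetic, worked through below. The function names are illustrative; the inputs (2500 cycles, 1 GHz, 100 ms) are the ones quoted on the slide.

```python
def exception_latency_s(cycles, cpu_hz):
    """Time spent handling one memory exception."""
    return cycles / cpu_hz


def worst_case_overhead(cycles, cpu_hz, t_threshold_s):
    """Upper bound on the fraction of CPU time spent on exceptions, per module."""
    return exception_latency_s(cycles, cpu_hz) / t_threshold_s


latency = exception_latency_s(2500, 1e9)        # about 2.5e-6 s, i.e. 2.5 microseconds
overhead = worst_case_overhead(2500, 1e9, 0.1)  # about 2.5e-5, i.e. 0.0025 %
```

Since each module faults at most once per period, the bound scales linearly with the number of monitored modules.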
[Figure: (a) SD card throughput (MB/s) for read and write; (b) Wi-Fi throughput (MB/s) for send and receive; Stock Linux vs. Central PM]
Central PM Agent (recap)
Relieving driver developers from PM
Comparable with hand-tuned PM
No hardware modification
[Diagram sequence: the Central PM Agent (Monitor and Controller), the Device Driver (PM callback), and the SoC Module, linked by "Pending task?" and "Enable/Disable". The later frames add a busy/idle register to the SoC Module.]
[Figure: timeline of the HW-assisted Monitor, the module activity, and the busy/idle register value. The Monitor reads the busy/idle register every period T (T can be as small as 1 ms); it reads 1 while the module is running a task and 0 once the module is idle.]
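Polling the busy/idle register reduces the monitor to a loop over sampled values. The sketch below assumes one sample per period T and adds a hypothetical `idle_polls_to_disable` parameter (not from the slides) for requiring several consecutive idle reads before disabling.

```python
def poll_busy_idle(samples, idle_polls_to_disable=1):
    """Return the index of the poll at which the module would be disabled.

    samples: busy/idle register values read once per period T (1 = busy, 0 = idle).
    Returns None if the module never looks idle for long enough.
    """
    idle_run = 0
    for k, value in enumerate(samples):
        idle_run = idle_run + 1 if value == 0 else 0
        if idle_run >= idle_polls_to_disable:
            return k
    return None
```

For the sequence read in the figure (1, 1, 1, 0), the module is disabled at the fourth poll, index 3.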
http://www.bdti.com/InsideDSP/2011/03/30/Xilinx
I2C slave:
MMA8452Q
Accelerometer
[Figure: an IP core whose existing state machine drives clocked flip-flops (Clk, Qf) for bits 0 to n]
                 Development Efforts     FPGA Resources             ASIC Resources
Module           LoC          Time*      LUTs          Registers    Gates
Xilinx I2C       93 (+1.2%)   12         16 (+3.8%)    8 (+2.4%)    N/A**
Opencores SPI    15 (+6%)     5          1 (+1.3%)     1 (+1.5%)    15 (+1.1%)
Opencores I2C    20 (+2%)     10         4 (1.8%)      1 (+0.6%)    34 (+1.6%)
Central PM Agent
with Busy/Idle Register (recap)
Small efforts to add a busy/idle register per module
Enabling aggressive PM; incurring less overhead
Works for all SoC modules
[Diagram: the Central PM Agent (HW-assisted Monitor and Controller), the Device Driver (PM callback), and the SoC Module with its busy/idle register, linked by "Pending task?" and "Enable/Disable"]
Moving forward
Software-only solution
Distinguish read/write access to registers
Exploit interrupts
Hardware modification
busy/idle register + polling => interrupt
[Diagram sequence: drivers and devices on the interconnect. Frames show, in turn, a driver with routines for runtime PM, Linux runtime PM, a Controller, a Controller with a Monitor, and a DSM.]
Summary
Driver-directed runtime PM harmful
A centralized architecture equally effective
Disengaging CPU possible with the
centralized architecture
All examples, source code & traces available at http://www.recg.org
Chao Xu, Xiaozhu Lin, Yuyang Wang, and Lin Zhong, "Automated OS-level device runtime power management," to appear in ACM
ASPLOS, March 2015.
Acknowledgement
http://recg.org
77