
Leveraging Flash Translation Layer for Application Acceleration

Ashish Batwara Fusion-io

Flash Memory Summit 2012 Santa Clara, CA

Traditional Storage Stack

User space: Application
Kernel space: Filesystem; Block Device Driver (LBA)
Hardware: Device
The LBA view is enforced by storage protocols (SCSI/SATA, etc.)

Flash is Different From Disk

Area | Hard Disk Drives | Flash Devices
Logical to physical blocks | Nearly 1:1 mapping | Remapped at every write
Read/write performance | Largely symmetrical | Heavily asymmetrical
Sequential vs. random performance | An order of magnitude difference | Minimal difference
Background operations | Rarely impact foreground | Regular occurrence; if unmanaged, can impact foreground
Wear out | Largely unlimited writes | Limited writes
IOPS | 100s to 1,000s | 100Ks to millions
Latency | 10s of ms | 10s to 100s of us
TRIM | Does not benefit | Improves performance

Flash Translation Layer 101

Input: logical block address (LBA)
Flash Translation Layer
Output: commands to NAND flash
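To make the remapping concrete, here is a minimal sketch (not any vendor's implementation) of the core idea: a logical-to-physical table that is updated on every write, so rewriting the same LBA lands on a fresh flash page.

```c
#include <stdint.h>

#define NUM_LBAS (1u << 20)             /* logical blocks exposed to the host */
#define INVALID  UINT32_MAX

static uint32_t l2p[NUM_LBAS];          /* logical-to-physical mapping table  */
static uint32_t next_free_page;         /* trivial log-structured allocator   */

/* A write never updates the old flash page in place: the FTL programs a
 * fresh page and remaps the table entry; the old page becomes garbage to
 * be reclaimed later by garbage collection.                                */
static uint32_t ftl_write(uint32_t lba)
{
    uint32_t ppa = next_free_page++;    /* a real FTL also wear-levels here  */
    /* nand_program(ppa, data); */      /* media write omitted in the sketch */
    l2p[lba] = ppa;
    return ppa;
}

static uint32_t ftl_read(uint32_t lba)
{
    return l2p[lba];                    /* INVALID: never written or trimmed */
}

int main(void)
{
    for (uint32_t i = 0; i < NUM_LBAS; i++)
        l2p[i] = INVALID;

    ftl_write(42);                      /* first write of LBA 42 -> page 0   */
    ftl_write(42);                      /* rewrite of LBA 42     -> page 1   */
    return ftl_read(42) == 1 ? 0 : 1;   /* the table now points at page 1    */
}
```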

Flash in Traditional Storage Stack

User space: Application
Kernel space: Filesystem; Block Device Driver (LBA)
Hardware: Flash Translation Layer (LBA → PBA); Device

Virtual Storage Layer

Fusion-io's host-based FTL: the Virtual Storage Layer (VSL)
Host: CPU and cores; DRAM holds operating system and application memory plus the ioMemory virtualization tables
ioDrive: ioMemory and its data-path controller, reached over PCIe (commands and data transfers), organized as banks across wide channels

Cut-through architecture avoids traditional storage protocols
Scales with multi-core
Provides a large virtual address space
HW/SW functional boundary defined as optimal for flash
Traditional block access methods for compatibility
New access methods, functionality, and primitives natively supported by flash
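Because the VSL keeps traditional block access methods for compatibility, existing applications can reach it with ordinary POSIX direct I/O. A minimal sketch follows; the device path /dev/fioa is assumed for illustration and should be a scratch device.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* O_DIRECT bypasses the page cache; buffers must be sector-aligned.   */
    int fd = open("/dev/fioa", O_RDWR | O_DIRECT);  /* device name assumed */
    if (fd < 0) return 1;

    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) return 1;
    memset(buf, 0xA5, 4096);

    /* Plain block reads/writes: no change needed to existing applications. */
    if (pwrite(fd, buf, 4096, 0) != 4096) return 1;
    if (pread(fd, buf, 4096, 0) != 4096) return 1;

    free(buf);
    close(fd);
    return 0;
}
```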

Fast Forward

Host-based FTLs integrate and scale with applications; examples include:
File systems
Caching
Databases

The power of the FTL is no longer restricted by traditional block interfaces
Opportunity for performance, simplicity, and reliability improvements

Flash Memory Evolution

Traditional SSDs
Flash as a drive: Application → File System → OS block I/O → Block Layer → SAS/SATA, network, or RAID controller (local or remote) → flash layer, accessed via read/write
Flash as a cache: Application → File System → OS block I/O → Block Layer → directCache → VSL, accessed via read/write

Native Access
Flash with direct I/O semantics: Application (plus open-source extensions) → direct-access I/O API family → directFS (native file system service) → VSL, accessed via read/write
Flash with memory semantics: Application (plus open-source extensions) → memory-access API family → VSL, accessed via load/store

Direct-access I/O API family

The same evolution picture, with the direct I/O primitives exposed by the VSL called out:
Sparse addressing
Atomic multi-block operations (write, PTRIM)
Exists and Range Exists
Conditional Write
Read Range
direct Key-Value Store
NVM optimized with transactional semantics

directCache, directFS (the native file system service), and the direct Key-Value Store sit on the direct-access I/O API family, alongside the memory-semantics API family.

Sparse Addressing


Excess work by conventional cache

Conventional block cache:
Application → cache, which maps HDD block → flash LBA and carries its own metadata, persistence, and recovery logic
Cache hit → flash device, whose FTL translates LBA → PBA
Cache miss → backend store (block device)

Two translations, plus additional metadata and logic in the cache layer
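A hedged sketch of the duplicated work: the cache keeps its own HDD-block-to-flash-LBA map (which it must also persist and recover), and the SSD's FTL then translates that LBA a second time. All structures below are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_ENTRIES (1u << 16)

/* Translation #1: the cache's own map from backend (HDD) block to flash LBA,
 * which the cache must also persist, log, and recover on its own.           */
struct cache_entry { uint64_t hdd_block; uint32_t flash_lba; bool valid; };
static struct cache_entry cache_map[CACHE_ENTRIES];

static bool cache_lookup(uint64_t hdd_block, uint32_t *flash_lba)
{
    struct cache_entry *e = &cache_map[hdd_block % CACHE_ENTRIES];
    if (e->valid && e->hdd_block == hdd_block) {
        *flash_lba = e->flash_lba;      /* hit: now hand this LBA to the SSD */
        return true;
    }
    return false;                       /* miss: go to the backend store     */
}

/* Translation #2 happens inside the SSD: its FTL remaps flash_lba -> PBA,
 * duplicating mapping metadata, allocation, and recovery logic that the
 * cache layer above has already paid for.                                   */

int main(void)
{
    uint32_t lba;
    return cache_lookup(123456, &lba) ? 1 : 0;  /* empty cache: miss, exit 0 */
}
```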

Sparse address mapping

Backend-store blocks are mapped directly into the VSL's sparse address space.

VSL based cache

VSL-based cache:
Application → VSL-based cache with minimal, fixed metadata; cache hits leverage the VSL primitives
The VSL's sparse mapping translates HDD LBA → PBA directly
Cache miss → backend store (block device)

Fewer translations, and minimal additional metadata and logic
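For contrast, a sketch of a cache built on a sparse address space: the backend (HDD) LBA is used directly as the flash virtual address, and presence is answered by the FTL itself. The vsl_* and backend_read calls are hypothetical stand-ins, stubbed with an in-memory table (fixed 512 B blocks) so the sketch runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical VSL primitives, stubbed so the sketch compiles; a real cache
 * would call into the VSL's sparse address space instead.                   */
#define SLOTS 1024
static struct { uint64_t vaddr; char data[512]; bool used; } store[SLOTS];

static bool vsl_exists(uint64_t vaddr)
{
    return store[vaddr % SLOTS].used && store[vaddr % SLOTS].vaddr == vaddr;
}
static int vsl_read(uint64_t vaddr, void *buf, size_t len)
{
    memcpy(buf, store[vaddr % SLOTS].data, len); return 0;
}
static int vsl_write(uint64_t vaddr, const void *buf, size_t len)
{
    store[vaddr % SLOTS].vaddr = vaddr;
    store[vaddr % SLOTS].used  = true;
    memcpy(store[vaddr % SLOTS].data, buf, len); return 0;
}
static int backend_read(uint64_t hdd_lba, void *buf, size_t len)
{
    (void)hdd_lba; memset(buf, 0, len); return 0;   /* pretend HDD read      */
}

/* The HDD LBA *is* the sparse flash virtual address: the VSL keeps the only
 * map (virtual address -> PBA), so the cache needs no translation table of
 * its own and no separate persistence or recovery logic.                    */
int cached_read(uint64_t hdd_lba, void *buf, size_t len)
{
    if (vsl_exists(hdd_lba))
        return vsl_read(hdd_lba, buf, len);         /* cache hit             */
    int rc = backend_read(hdd_lba, buf, len);       /* cache miss            */
    if (rc == 0)
        vsl_write(hdd_lba, buf, len);               /* populate the cache    */
    return rc;
}

int main(void)
{
    char buf[512];
    cached_read(123456, buf, sizeof buf);           /* miss, fills the cache */
    return cached_read(123456, buf, sizeof buf);    /* hit                   */
}
```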

Direct I/O - Atomic Operations

Traditional atomicity (with hard disks): applications and the DBMS provide atomicity through a transaction log; the file system adds metadata journaling or copy-on-write; the block I/O layer issues sector reads/writes to the disk drive.
Traditional atomicity (with SSDs): the same DBMS transaction log and file-system journaling/copy-on-write sit above the block I/O layer, while the SSD's flash translation layer separately performs re-mapping, wear-leveling, block erase, and page reads/writes on NAND flash.
Atomicity in ioMemory: a generalized ioMemory layer provides atomic operations alongside re-mapping, wear-leveling, and page reads/writes, implemented in the ioMemory controller over NAND flash.

Transactional Block Interface

Application issues a call to the transactional block interface:
Write all blocks atomically

A vector of buffers (iov[0] … iov[4]) is written onto ranges (Range[0], Range[1], … Range[n]) of the Virtual Storage Layer in a single atomic operation.

Transactional Block Interface

Application issues a call to the transactional block interface:
Write all blocks atomically
Trim all blocks atomically

The same ranges (Range[0] … Range[n]) in the Virtual Storage Layer are trimmed (marked X) as a single atomic operation.

Transactional Block Interface

Application issues a call to the transactional block interface:
Write all blocks atomically
Trim all blocks atomically
Write and trim atomically

Writes (iov[0] … iov[4] onto Range[0] … Range[n]) and trims (Range[0], Range[1], Range[m], Range[n]) are combined into transaction envelopes handled by the Virtual Storage Layer.
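From the application's side, such a call could look like the sketch below: a vector of write and trim ranges handed to the VSL as one transaction envelope and committed all-or-nothing. The vsl_iov structure and vsl_atomic_batch() entry point are illustrative, not the actual SDK signatures; a trivial stub stands in so the sketch runs.

```c
#include <stddef.h>
#include <stdint.h>

/* One element of a transaction envelope: either write this buffer at the
 * given virtual address, or trim (deallocate) the range.                  */
enum op_type { OP_WRITE, OP_TRIM };

struct vsl_iov {
    enum op_type  type;
    uint64_t      vaddr;   /* virtual block address in the sparse space    */
    const void   *buf;     /* data to write; NULL for a trim               */
    size_t        len;     /* length in bytes                              */
};

/* Hypothetical entry point: on success every element becomes visible; on
 * any failure (including power loss mid-way) none of them do.  Stubbed.   */
static int vsl_atomic_batch(int fd, const struct vsl_iov *iov, int count)
{
    (void)fd; (void)iov; (void)count;
    return 0;
}

/* Example: persist a new copy of a record and invalidate its old location
 * in one atomic step, without a write-ahead log or double write above.    */
static int update_record(int fd, uint64_t new_addr, uint64_t old_addr,
                         const void *rec, size_t len)
{
    struct vsl_iov ops[2] = {
        { OP_WRITE, new_addr, rec,  len },
        { OP_TRIM,  old_addr, NULL, len },
    };
    return vsl_atomic_batch(fd, ops, 2);
}

int main(void)
{
    char rec[512] = "new version of the record";
    return update_record(/*fd=*/-1, 4096, 0, rec, sizeof rec);
}
```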

Sysbench Performance With Atomic-Write

MySQL extension for Atomic-Write:
43% increase in transactions/sec
2x increase in endurance

Test setup: Xeon X5472 @ 3.00 GHz; 16 GB DDR3 DRAM (4x4 GB DIMMs); Fedora 14, Linux kernel 2.6.35; Sysbench config: 1 million inserts into 8 tables of 2 million entries each, using 16 threads

Native raw performance comparison

Significantly more functionality with NO additional performance impact

Test setup: 1U HP blade server with 16 GB RAM, 8 CPU cores (Intel Xeon X5472 @ 3.00 GHz), and a single 1.2 TB ioDrive2 Mono

Direct I/O Primitives - Persistent TRIM and Exists

Persistent TRIM (on a virtual address)
Has all the positive properties of TRIM:
Improves wear leveling
Improves write performance
However, it is well defined with respect to failures:
Deterministic return of zeros on read
Survives power failures (transactional)

EXISTS (on a virtual address)
Queries the existence of a particular element
Enables sparse stores with well-defined presence semantics
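A minimal usage sketch, with placeholder names (vsl_ptrim, vsl_exists) rather than the real SDK calls, showing how an application might maintain a sparse store with these two primitives; trivial stubs stand in so the sketch runs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder names for the two primitives; the real calls belong to the
 * device SDK.                                                             */
static int vsl_ptrim(int fd, uint64_t vaddr, uint64_t nblocks)
{                                     /* transactional: survives power loss */
    (void)fd; (void)vaddr; (void)nblocks;
    return 0;
}

static bool vsl_exists(int fd, uint64_t vaddr)
{                                     /* presence query for one element     */
    (void)fd; (void)vaddr;
    return false;
}

int main(void)
{
    int fd = -1;                      /* would be the opened VSL device      */
    uint64_t slot = 1000;             /* a virtual address in a sparse store */

    if (vsl_exists(fd, slot))         /* well-defined presence semantics     */
        printf("slot %llu holds data\n", (unsigned long long)slot);

    vsl_ptrim(fd, slot, 1);           /* deallocate; reads now return zeros  */
    return 0;
}
```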

Conventional Key-Value Store

Conventional KV store:
Application → KV store, which maintains key → block mapping (overhead per key), block allocation, metadata, persistence mechanisms, logging, and recovery logic
Block read/write → VSL, which already provides dynamic provisioning, block allocation, persistence mechanisms, logging, and recovery

VSL based Direct Key-Value Store

directKey-Value store:
Application → Key-Value API and library: fixed (zero per-key) metadata, leverages the VSL; atomic write/delete and coordinated garbage collection
→ VSL: dynamic provisioning, block allocation, persistence mechanisms, logging, and recovery

directKey-Value Store API Overview

KV store administration:
kv_create(), kv_destroy()
kv_open(), kv_close()
kv_pool_create(), kv_pool_delete()
kv_get_store_info(), kv_get_pool_info(), kv_get_key_info()
kv_register_notification_handler()

KV store data operations:
kv_put(), kv_batch_put()
kv_get(), kv_batch_get()
kv_delete(), kv_batch_delete(), kv_delete_all()
kv_begin(), kv_next(), kv_get_current()
kv_exists()
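A short usage sketch built from the call names above; the signatures, types, and device path are assumptions for illustration, with trivial stubs so it compiles and runs.

```c
#include <stdint.h>
#include <string.h>

/* Call names come from the API overview; signatures and types are assumed
 * and will differ from the real SDK.  Stub bodies keep the sketch runnable. */
typedef int kv_store_t;
typedef int kv_pool_t;

static kv_store_t kv_open(const char *device) { (void)device; return 1; }
static kv_pool_t  kv_pool_create(kv_store_t s, const char *name)
{ (void)s; (void)name; return 1; }
static int kv_put(kv_store_t s, kv_pool_t p, const void *key, uint32_t klen,
                  const void *val, uint32_t vlen)       /* atomic put        */
{ (void)s; (void)p; (void)key; (void)klen; (void)val; (void)vlen; return 0; }
static int kv_exists(kv_store_t s, kv_pool_t p, const void *key, uint32_t klen)
{ (void)s; (void)p; (void)key; (void)klen; return 0; }
static int kv_get(kv_store_t s, kv_pool_t p, const void *key, uint32_t klen,
                  void *val, uint32_t vlen)
{ (void)s; (void)p; (void)key; (void)klen; (void)val; (void)vlen; return 0; }
static int kv_close(kv_store_t s) { (void)s; return 0; }

int main(void)
{
    /* No key->block map or allocator in the application: keys land directly
     * in the VSL's sparse address space underneath.                         */
    kv_store_t store = kv_open("/dev/fioa");        /* device name assumed   */
    kv_pool_t  pool  = kv_pool_create(store, "users");

    const char *key = "user:42";
    const char *val = "{\"name\":\"ada\"}";
    kv_put(store, pool, key, (uint32_t)strlen(key), val, (uint32_t)strlen(val));

    char buf[64] = {0};
    if (kv_exists(store, pool, key, (uint32_t)strlen(key)))
        kv_get(store, pool, key, (uint32_t)strlen(key), buf, sizeof buf);

    return kv_close(store);
}
```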

directKey-Value Store Sample Performance

Sample performance - GET and PUT: GETs/s and PUTs/s plotted against the number of threads (up to ~128) for value sizes of 512 B, 4 KB, 16 KB, and 64 KB.

Significantly more functionality with NO additional performance impact

Performance relative to ioDrive: OPS/s plotted against the number of threads, comparing 512 B key GET with 1 KB FIO read, and 512 B key PUT with 1 KB FIO write.

Test setup: 1U HP blade server with 16 GB RAM, 8 CPU cores (Intel Xeon X5472 @ 3.00 GHz), and a single 1.2 TB ioDrive2 Mono

Advantages of Native Flash Access

1. Helps accelerate applications
2. Eliminates redundant functionality
3. Leverages FTL mapping and sparse addressing
4. Optimizes garbage collection
5. Delivers transactional properties
6. Provides direct I/O as well as memory semantics

Open Source Enablement and Standardization

MySQL InnoDB extension (GPLv2)
Standardization of primitives in T10 — current standards proposals:
SBC-4 / SPC-5 Atomic Write: http://www.t10.org/cgi-bin/ac.pl?t=d&f=11-229r5.pdf
SBC-4 / SPC-5 Scattered writes, optionally atomic: http://www.t10.org/cgi-bin/ac.pl?t=d&f=12-086r3.pdf
SBC-4 / SPC-5 Gathered reads, optionally atomic: http://www.t10.org/cgi-bin/ac.pl?t=d&f=12-087r3.pdf

Thank you!

Ashish Batwara, Fusion-io, abatwara@fusionio.com


Extended Memory Non-Persistence Path

Fusion-io has developed a subsystem technology called Extended Memory, which enables developers to take advantage of NAND flash memory as an extension of DRAM. The idea is to keep frequently accessed data pages in DRAM while rarely accessed pages are moved from DRAM to flash, indirectly increasing the capacity effectively available as memory. Fusion-io says the technology, created in collaboration with Princeton University researchers, allows software developers to simply assume that their entire data set is kept in memory all the time, since NAND is a much more cost-effective memory solution and can reach much greater capacities than DRAM. "The Fusion ioMemory architecture is uniquely suited to innovation like the Extended Memory subsystem," said Chris Mason, Fusion-io director of kernel engineering and principal author of the Btrfs file system for Linux, in a prepared statement. "Since Fusion ioMemory has moved beyond legacy disk-era protocols, we can integrate new features like the Extended Memory subsystem to truly advance application performance for enterprise computing in ways that are simply not possible with traditional SSDs." Developers can access the Extended Memory feature via Fusion-io's developer community.

http://www.tomshardware.com/news/dram-memory-flash-nand-fusion-io,16254.html?utm_source=dlvr.it&utm_medium=twitter#xtor=RSS-181
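A purely conceptual sketch of the programming model described above: the application allocates a region that may exceed DRAM and touches it with ordinary loads and stores. The xm_alloc name is hypothetical and is stubbed with malloc here; the real subsystem would keep hot pages in DRAM and demand-page cold pages to and from flash underneath.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator standing in for the Extended Memory subsystem.
 * The stub simply uses malloc so the sketch runs.                        */
static void *xm_alloc(size_t bytes) { return malloc(bytes); }

int main(void)
{
    size_t n = 1u << 20;                 /* stand-in for a DRAM-exceeding set */
    uint64_t *table = xm_alloc(n * sizeof *table);
    if (!table) return 1;

    for (size_t i = 0; i < n; i++)       /* ordinary stores: no read()/write() */
        table[i] = (uint64_t)i * i;      /* calls to move data to "storage"    */

    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)       /* ordinary loads                     */
        sum += table[i];

    free(table);
    return sum ? 0 : 1;
}
```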

Auto-Commit Memory: Cutting Latency by Eliminating Block I/O

Our recent demonstration of one billion IOPS showcased a new paradigm for storing data through Fusion-io Auto-Commit Memory (ACM). ACM isn't just about making NAND flash storage devices go faster, although it does that too. It's about introducing a much simpler and faster way for an application to guarantee data persistence.

When Simplicity Meets Speed

For decades, the industry norm for persisting data has been the same: an application manipulates data in memory, and when ready to persist the data, packages it (sometimes called a transaction) for storage. At this point, the application asks the operating system to route the transaction through the kernel block I/O layer. The kernel block layer was built to work with traditional disks. In order to minimize the effect of slow rotational-disk seek times, application I/O is packaged into blocks with sizes matching hard disk sector sizes and sequenced for delivery to the disk drive. As Linux block maintainer (and Fusion-io chief architect) Jens Axboe points out, most real-world I/O patterns are dominated by small, random I/O requests, but are force-fit into 4K block I/Os sequenced by the block layer to match the characteristics of rotating disks. Note the number of steps in this pathway: each one contributes to latency. Even more steps are introduced when the block storage device is at the other end of a network, behind various bus adapters and controllers. As long as memory is volatile, this type of I/O pathway will be the norm.

But what if an application could designate a certain area within its memory space as persistent, and know that data in this area would maintain its state across system reboots? This application would no longer have the burden of following the multi-step block I/O pathway to persist that data. It would no longer need small transactions to be packaged into 4K blocks for storage. It would just place data meant for persistence in this designated memory area, and then continue using it through normal memory-access semantics. If the application or system experienced a failure, the restarted application would find its data persisted in non-volatile memory exactly where it was left.

To illustrate: how much faster could real-world databases go if the in-memory tail of their transaction logs had guaranteed persistence without waiting on synchronous block I/O? How much faster could real-world key-value stores (typical in NoSQL databases) go if their indices could be updated in non-volatile memory without blocking on kernel I/O? That is the simplicity of Auto-Commit Memory: it reduces latency by eliminating entire steps in the persistence pathway.

Addressing Both Halves of the Problem

Block storage benchmarks such as throughput and IOPS are certainly important, but only address half of the problem. The other half is the work the application and kernel I/O subsystems must do to package and route data for storage. Applications can be accelerated by addressing either or both halves of this problem. However, at some point the overhead incurred by packaging and routing data through the kernel block storage subsystem becomes the bottleneck. Breaking through that barrier was the goal of this technology demonstration: give applications the software mechanisms to avoid this block-storage packaging and routing latency and complexity, and let them spend more time processing data in memory and less time packaging and waiting for that data to arrive at a block storage destination. Fusion-io does indeed make very fast block I/O devices. What's most exciting to us about last week's demonstration is looking beyond fast block I/O devices to show what is possible when you address this fast device as memory rather than block storage.

http://www.fusionio.com/blog/auto-commit-memory-cutting-latency-by-eliminating-block-i/o/
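A conceptual sketch of the ACM idea applied to the transaction-log example above: persisting a record becomes a memcpy into a designated memory region plus a barrier, with no block I/O on the critical path. The acm_map() and acm_barrier() names are hypothetical placeholders, stubbed here so the sketch runs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical placeholders: acm_map() would return a memory range whose
 * contents survive power loss, and acm_barrier() would order/complete
 * outstanding stores.  Stubbed with plain malloc and a compiler/CPU fence
 * so the sketch compiles and runs.                                        */
static void *acm_map(size_t bytes) { return malloc(bytes); }
static void  acm_barrier(void)     { __sync_synchronize(); }

/* The in-memory tail of a transaction log: persisting a record is a memcpy
 * plus a barrier — no packaging into 4K blocks, no syscall, no trip through
 * the kernel block layer.                                                  */
struct log_tail { uint64_t head; char buf[1 << 20]; };

static void log_append(struct log_tail *log, const void *rec, size_t len)
{
    memcpy(log->buf + log->head, rec, len);  /* ordinary store semantics    */
    acm_barrier();                           /* make the record durable     */
    log->head += len;                        /* publish the new tail        */
    acm_barrier();
}

int main(void)
{
    struct log_tail *log = acm_map(sizeof *log);
    if (!log) return 1;
    log->head = 0;

    const char rec[] = "COMMIT txn 17";
    log_append(log, rec, sizeof rec);
    free(log);
    return 0;
}
```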
