
TEC215

Technical Deep-Dive in a Column-Oriented In-Memory Database

Prof. Hasso Plattner, Stephan Müller
Research Group of Prof. Hasso Plattner, Hasso Plattner Institute for Software Engineering, University of Potsdam

Motivation

All areas have to be taken into account:

- Changed hardware
- Advances in data processing (software)
- Complex enterprise applications

Our focus is where all three meet.

Why a New Data Management?!

DBMS architecture has not changed for decades. A redesign is needed to handle the changes in:

- Hardware trends (CPU/cache/memory)
- Changed workloads
- Data characteristics/amount
- New application requirements

[Figure: traditional DBMS architecture - a query engine sitting on top of a buffer pool.]

Some academic prototypes: MonetDB, C-Store, HyPer, HYRISE. Several database vendors picked up the idea and have new databases in place (e.g., SAP, Vertica, Greenplum, Oracle).

Changes in Hardware

...give an opportunity to re-think the assumptions of yesterday because of what is possible today:

- Multi-core architecture (96 cores per server)
- One blade at ~$50,000 = 1 enterprise-class server
- Parallel scaling across blades
- A 64-bit address space; 2 TB in current servers
- 25 GB/s per core
- Cost-performance ratio rapidly declining
- Memory hierarchies

Main memory becomes cheaper and larger.

In the Meantime, Research Has Come Up With

...several advances in software for processing data:

- Column-oriented data organization (the column store)
  - Sequential scans allow the best bandwidth utilization between CPU cores and memory
  - Independence of tuples within columns allows easy partitioning and therefore parallel processing
- Lightweight compression
  - Reduces the data volume while increasing processing speed through late materialization
- And more, e.g., parallel scan/join/aggregation

Data Management for Enterprise Applications

Challenge: Diverse Applications

One data management layer (CPUs with multiple cores and caches, plus main memory) has to serve:

- Transactional data entry - sources: machines, transactional apps, user interaction, etc.
- Real-time analytics on structured data - sources: reporting, classical analytics, planning, simulation
- Event processing on stream data - sources: machines, sensors, high-volume systems
- Text analytics on unstructured data - sources: web, social, logs, support systems, etc.

OLTP vs. OLAP

Online Transaction Processing vs. Online Analytical Processing

Modern enterprise resource planning (ERP) systems are challenged by mixed workloads, including OLAP-style queries. For example:

- OLTP-style: create sales order, invoice, accounting documents; display customer master data or a sales order
- OLAP-style: dunning, available-to-promise, cross-selling, operational reporting (list open sales orders)

But: today's data management systems are optimized either for daily transactional or for analytical workloads, storing their data along rows or columns respectively.

Drawbacks of any Separation

- The OLAP subsystem does not have the latest data
- The OLAP subsystem only has a predefined subset of the data
- A cost-intensive ETL process has to sync both systems
- There is a lot of redundancy
- Different data schemas introduce complexity for applications combining both sources

Workarounds in OLTP

As in OLAP systems, OLTP systems rely on redundant data to overcome the shortcomings of today's data management:

- Materialized views
- Materialized aggregates
- Pre-computed and materialized result sets

Since the database has been the bottleneck, complex data processing is done on the application server:

- Simple SQL statements
- Nested-loop joins (SELECT ... SELECT SINGLE ... ENDSELECT)

Batch processing leads to:

- Long-running business processes
- Inflexibility (e.g., ATP rescheduling)

Enterprise Applications Have a Specific Database Footprint

Today's enterprise applications:

- Complex processes
- Increased data set (but driven by real-world events)
- Separated into transactional (OLTP) and analytical (OLAP) applications

Enterprise data management:

- Wide schemas
- Sparse data with limited domains

Workload characteristics:

- Complex analytical queries
- Set processing
- Read access
- Insert operations instead of updates

Enterprise Data Characteristics

Enterprise data is sparse data:

- Most tables are empty (~150 important tables)
- Many columns are not used even once (~50%)
- Many columns have a low cardinality of values
- NULL values/default values are dominant
- Sparse distribution facilitates high compression

[Chart: number of tables per record-count bucket (0, 1-100, 100-1000, 1000-10000, 10K-100K, 100K-1M, 1M-10M, >10M); the largest bucket by far (~46,400 tables) holds tables with 0 records.]

Enterprise Data Characteristics

Most columns have a low cardinality of distinct values, which facilitates high compression. Share of columns per number of distinct values:

Number of distinct values | Inventory Management | Financial Accounting
1 - 32                    | 78 %                 | 64 %
33 - 1023                 | 13 %                 | 24 %
1024 - 100,000,000        |  9 %                 | 12 %

Enterprise Workloads are Read-Mostly

Enterprise applications have evolved; it is not just OLAP vs. OLTP. The workload in enterprise applications consists of:

- Mainly read queries (OLTP 83 %, OLAP 94 %)
- Many queries accessing large sets of data

[Chart: workload composition. Both OLTP and OLAP enterprise workloads are dominated by reads (lookups, table scans, range selects) with a small share of writes (inserts, modifications, deletes); the TPC-C benchmark, in contrast, has a far higher write share.]

Approach

Change the overall data management system assumptions:

- In-memory only
- Start with read-optimized data structures
- Transactional features as needed
- Vertically partitioned (column store)
- CPU-cache optimized
- Only one optimization objective: main memory access

In-memory column + row store for OLTP + OLAP + text.

Rethink how enterprise application persistence is built:

- Single data management system
- No redundant data, no materialized views or cubes
- Computational application logic closer to the database (i.e., complex queries, stored procedures)


In-Memory Data Processing

Recap: Memory Hierarchy

Recap: Latency Numbers

L1 cache reference (cached data word)            0.5 ns
Branch mispredict                                  5 ns
L2 cache reference                                 7 ns
Main memory reference                            100 ns    (0.1 us)
Send 2K bytes over a 1 Gbps network           20,000 ns     (20 us)
SSD random read                              150,000 ns    (150 us)
Read 1 MB sequentially from memory           250,000 ns    (250 us)
Disk seek                                 10,000,000 ns     (10 ms)
Send packet CA -> Netherlands -> CA      150,000,000 ns    (150 ms)

In-Memory Data Processing

In a DBMS, on disk as well as in memory, data processing is often:

- Not CPU-bound, but bandwidth-bound (the I/O bottleneck): the CPU could process data faster than it arrives

Memory access:

- Not truly random (in the sense of constant latency)
- Data is read in blocks/cache lines, even if only parts of a block are requested - a potential waste of bandwidth

[Figure: ten values V1-V10 stored consecutively; a cache line spans several adjacent values (V1-V5 on cache line 1, V6-V10 on cache line 2), so requesting a single value always loads its entire cache line.]

Data Layout in Main Memory

Basics (1)

- Memory in today's computers has a linear address layout: addresses start at 0x0 and go up to 0xFFFFFFFFFFFFFFFF on a 64-bit system
- Not every system is fully 64-bit addressable; e.g., modern Intel systems use only 48 bits, which allows up to 256 TB of RAM on a single machine
- Virtual memory allocated by a program can be distributed over this space
- Each UNIX process has its own view of the address space
- Address translation is done in hardware by the CPU

Basics (2)

The memory layout is only linear; every higher-dimensional access is mapped onto this linear band.
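
To make this concrete, here is a minimal sketch (illustrative, not from the lecture) of how a two-dimensional table access is mapped onto the linear address band. Choosing which index varies fastest is exactly the row-store vs. column-store decision discussed next:

    #include <cstddef>

    // Row-major: all attributes of one row are adjacent in memory.
    std::size_t rowMajorOffset(std::size_t row, std::size_t col,
                               std::size_t numCols) {
        return row * numCols + col;
    }

    // Column-major: all values of one column are adjacent in memory.
    std::size_t columnMajorOffset(std::size_t row, std::size_t col,
                                  std::size_t numRows) {
        return col * numRows + row;
    }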

Physical Data Representation

- Row store: rows are stored consecutively; optimal for row-wise access (e.g., SELECT *)
- Column store: columns are stored consecutively; optimal for attribute-focused access (e.g., SUM, GROUP BY)
- Note: the concept is independent of the storage type, but only an in-memory implementation allows fast tuple reconstruction in the case of a column store

[Figure: the same table (Doc Num, Doc Date, Sold-To, Value, Sales Org, Status) stored row-wise (Row 1, Row 2, Row 3, Row 4 consecutively) and column-wise (one contiguous block per attribute).]

Row Data Layout

- Data is stored tuple-wise
- Leverages co-location of the attributes of a single tuple
- Low cost for tuple reconstruction, but higher cost for a sequential scan of a single attribute

Columnar Data Layout

- Data is stored attribute-wise
- Leverages sequential scan speed in main memory for predicate evaluation
- Tuple reconstruction is more expensive

Row-oriented storage

A table with attributes A, B, C and four rows is linearized row by row:

A1 B1 C1 | A2 B2 C2 | A3 B3 C3 | A4 B4 C4

Column-oriented storage

The same table linearized column by column:

A1 A2 A3 A4 | B1 B2 B3 B4 | C1 C2 C3 C4

Example: OLTP-Style Query

Accessing one complete tuple:

    struct Tuple { int a, b, c; };
    Tuple data[4];
    fill(data);
    Tuple third = data[3];

- Row-oriented storage: the attributes of the tuple are adjacent; the access touches 2 cache lines in the example
- Column-oriented storage: each attribute lives in a different column, so 3 cache lines are touched (one per attribute)

Example: OLAP-Style Query

Summing a single attribute over all tuples:

    struct Tuple { int a, b, c; };
    Tuple data[4];
    fill(data);

    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += data[i].a;

- Row-oriented storage: the a-values are interleaved with b and c; the scan touches 3 cache lines in the example
- Column-oriented storage: all a-values are adjacent; the scan touches only 1 cache line
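
As a runnable variant of the example above (a sketch; the slide's fill() stands for real initialization), the same aggregation can be written against both layouts. On large arrays the column version reads only the bytes it needs, while the row version drags b and c through the cache as well:

    #include <vector>

    struct Tuple { int a, b, c; };

    // Row layout: consecutive a-values are sizeof(Tuple) bytes apart,
    // so each 64-byte cache line carries only a few useful values.
    long sumA(const std::vector<Tuple>& rows) {
        long sum = 0;
        for (const Tuple& t : rows) sum += t.a;
        return sum;
    }

    // Column layout: all a-values are contiguous; the scan exploits
    // the full memory bandwidth.
    long sumA(const std::vector<int>& columnA) {
        long sum = 0;
        for (int a : columnA) sum += a;
        return sum;
    }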

Mixed Workloads

Mixed workloads involve attribute-focused (OLAP-style) and entity-focused (OLTP-style) queries on the same data.

Mixed Workloads: Choosing the Layout

Cache misses per layout for the examples above:

Layout | OLTP misses | OLAP misses | Mixed
Row    | 2           | 3           | 5
Column | 3           | 1           | 4

Dictionary Encoding

Motivation

Main memory access is the new bottleneck. Idea: trade CPU time to compress and decompress data.

- Compression reduces the number of I/O operations to main memory
- Fewer cache misses, because more information fits on a cache line
- Operating directly on compressed data: offsetting with bit-encoded fixed-length data types, based on a limited value domain

Dictionary Encoding Example

8 billion humans, with the attributes first name, last name, gender, country, city, and birthday: 200 bytes per tuple. Each attribute is dictionary-encoded.

Sample Data

rec ID | fname | lname   | gender | city      | country | birthday
39     | John  | Smith   | m      | Chicago   | USA     | 12.03.1964
40     | Mary  | Brown   | f      | London    | UK      | 12.05.1964
41     | Jane  | Doe     | f      | Palo Alto | USA     | 23.04.1976
42     | John  | Doe     | m      | Palo Alto | USA     | 17.06.1952
43     | Peter | Schmidt | m      | Potsdam   | GER     | 11.11.1975

Dictionary Encoding a Column

- A column is split into a dictionary and an attribute vector
- The dictionary stores all distinct values with an implicit value ID
- The attribute vector stores value IDs for all entries in the column; the position is stored implicitly
- This enables offsetting with bit-encoded fixed-length data types

Dictionary for fname        Attribute vector for fname
Value ID | Value            Position | Value ID
23       | John             39       | 23
24       | Mary             40       | 24
25       | Jane             41       | 25
26       | Peter            42       | 23
                            43       | 26
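
A minimal sketch of this split (value IDs here are dense and start at 0, unlike the 23-26 in the example; names are illustrative):

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct EncodedColumn {
        std::vector<std::string> dictionary;         // value ID -> value
        std::vector<std::uint32_t> attributeVector;  // position -> value ID
    };

    EncodedColumn encode(const std::vector<std::string>& column) {
        EncodedColumn enc;
        std::unordered_map<std::string, std::uint32_t> ids;
        for (const std::string& value : column) {
            // Known values reuse their ID; new values extend the dictionary.
            auto [it, isNew] = ids.try_emplace(
                value, static_cast<std::uint32_t>(enc.dictionary.size()));
            if (isNew) enc.dictionary.push_back(value);
            enc.attributeVector.push_back(it->second);
        }
        return enc;
    }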

Querying Data Using Dictionaries

Search for an attribute value:

1. Search the dictionary for the value ID of the requested value
2. Scan the attribute vector for that value ID
3. Replace the value IDs in the result with the corresponding dictionary values

Dictionary for fname        Attribute vector for fname
Value ID | Value            Position | Value ID
23       | John             39       | 23
24       | Mary             40       | 24
25       | Jane             41       | 25
26       | Peter            42       | 23
                            43       | 26
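
The three steps translate directly into code; a sketch reusing EncodedColumn from above (a linear dictionary lookup here; a sorted dictionary would allow binary search, as the next slide shows):

    std::vector<std::size_t> scanEquals(const EncodedColumn& col,
                                        const std::string& needle) {
        std::vector<std::size_t> positions;
        // 1. Find the value ID of the requested value in the dictionary.
        std::uint32_t id = 0;
        while (id < col.dictionary.size() && col.dictionary[id] != needle) ++id;
        if (id == col.dictionary.size()) return positions;  // value not present
        // 2. Scan the fixed-width value IDs instead of the strings.
        for (std::size_t pos = 0; pos < col.attributeVector.size(); ++pos)
            if (col.attributeVector[pos] == id) positions.push_back(pos);
        // 3. Materialization would map positions back through the dictionary.
        return positions;
    }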

Sorted Dictionary

- Dictionary entries are sorted either by their numeric value or lexicographically
- Dictionary lookup complexity: O(log(n)) instead of O(n)
- Dictionary entries can be compressed to reduce the amount of required storage
- Selection criteria with ranges are less expensive (order-preserving dictionary)
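
With a sorted, order-preserving dictionary, a range predicate reduces to two binary searches; a sketch (std::lower_bound provides the O(log n) lookup):

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    // Returns the half-open value-ID interval [first, last) whose
    // dictionary values v satisfy lo <= v < hi; the attribute vector
    // can then be scanned with two integer comparisons per entry.
    std::pair<std::size_t, std::size_t>
    idRange(const std::vector<std::string>& sortedDict,
            const std::string& lo, const std::string& hi) {
        auto first = std::lower_bound(sortedDict.begin(), sortedDict.end(), lo);
        auto last  = std::lower_bound(sortedDict.begin(), sortedDict.end(), hi);
        return { std::size_t(first - sortedDict.begin()),
                 std::size_t(last - sortedDict.begin()) };
    }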

Compression Rate

Depends on cardinality/entropy:

- Table cardinality: number of tuples in a relation
- Column cardinality: number of distinct values in a column
- Entropy, a measure of information density: entropy = column cardinality / table cardinality

Data Size Examples

Column      | Cardinality | Bits   | Entropy        | Item size | Plain size | Size with dictionary (dictionary + column)
First names | 5 million   | 23 bit | 6.25 * 10^-4   | 49 Byte   | 365.10 GB  | 234 MB + 21.42 GB
Last names  | 8 million   | 23 bit | 1 * 10^-3      | 50 Byte   | 372.5 GB   | 381 MB + 21.42 GB
Gender      | 2           | 1 bit  | 2.5 * 10^-10   | 1 Byte    | 7.45 GB    | 2 Byte + 0.93 GB
City        | 1 million   | 20 bit | 1.25 * 10^-4   | 49 Byte   | 365.08 GB  | 46.73 MB + 18.62 GB
Country     | 200         | 8 bit  | 2.5 * 10^-8    | 49 Byte   | 365.08 GB  | 6.09 KB + 7.45 GB
Birthday    | 40,000      | 16 bit | 5 * 10^-6      | 2 Byte    | 14.90 GB   | 76.29 KB + 14.90 GB
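
As a cross-check on the first row, using only the table's own assumptions: 8 * 10^9 tuples x 49 bytes ~ 392 * 10^9 bytes ~ 365.10 GB plain. Dictionary-encoded, the dictionary costs 5 * 10^6 distinct values x 49 bytes ~ 234 MB, and the attribute vector needs ceil(log2(5 * 10^6)) = 23 bits per tuple, i.e. 8 * 10^9 x 23 bits = 23 * 10^9 bytes ~ 21.42 GB: a compression factor of roughly 17, which recurs in the scan-performance estimates below.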

In-Memory Database Operations

Scan Performance

8 billion humans with the attributes first name, last name, gender, country, city, birthday: 200 bytes per tuple.

Question: how many men/women? Assumed scan speed: 2 MB/ms/core.

Row Store: Full Table Scan

- Table size = 8 billion tuples x 200 bytes per tuple ~ 1.6 TB
- Scanning through all rows at 2 MB/ms/core takes 800 seconds with 1 core

Row Store: Stride Access on Gender

- 8 billion cache-line accesses x 64 bytes ~ 512 GB
- Reading at 2 MB/ms/core takes 256 seconds with 1 core

Column Store: Layout

- Attribute vectors: 91 GB
- Dictionaries: 700 MB
- Total: ~92 GB (compression factor: 17)

Column Store: Full Column Scan on Gender

- Size of the gender attribute vector = 8 billion tuples x 1 bit per tuple ~ 1 GB
- Scanning the attribute vector at 2 MB/ms/core takes 0.5 seconds with 1 core

Column Store: Full Column Scan on Birthday

- Size of the birthday attribute vector = 8 billion tuples x 2 bytes per tuple = 16 GB
- Scanning the column at 2 MB/ms/core takes 8 seconds with 1 core

Scan Performance Summary

How many women, how many men? Time with 1 core:

Storage                          | Time in seconds | vs. column store
Column store, full column scan   | 0.5             | -
Row store, stride access         | 256             | 512x slower
Row store, full table scan       | 800             | 1,600x slower
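
A sketch of what the winning 0.5-second scan does: with gender packed at 1 bit per person (1 = m is an assumption here), counting men is a pure sequential read plus one population count per 64-bit word:

    #include <bit>       // std::popcount, C++20
    #include <cstdint>
    #include <vector>

    // genderBits packs 64 persons per word; men are counted by summing
    // the set bits - the scan is bandwidth-bound, not CPU-bound.
    std::uint64_t countMen(const std::vector<std::uint64_t>& genderBits) {
        std::uint64_t men = 0;
        for (std::uint64_t word : genderBits)
            men += static_cast<std::uint64_t>(std::popcount(word));
        return men;
    }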

SELECT Example

    SELECT first_name, last_name
    FROM world_population
    WHERE country = 'Italy' AND gender = 'm'

id    | fname     | lname     | country     | gender
2394  | Gianluigi | Buffon    | Italy       | m
3010  | Lena      | Gercke    | Germany     | f
3040  | Mario     | Balotelli | Italy       | m
3949  | Manuel    | Neuer     | Germany     | m
4902  | Lukas     | Podolski  | Germany     | m
20102 | Klaas-Jan | Huntelaar | Netherlands | m

Query Plan

Two selection operators feed into a logical AND:

- Predicates are evaluated and generate position lists
- Intermediate position lists are logically combined
- The final position list is used for materialization

Query Execution

Dictionary for country: 0 Algeria, 1 France, 2 Germany, 3 Italy, 4 Netherlands
Dictionary for gender: 0 f, 1 m

Encoded table:

id    | fname     | lname     | country | gender
2394  | Gianluigi | Buffon    | 3       | 1
3010  | Lena      | Gercke    | 2       | 0
3040  | Mario     | Balotelli | 3       | 1
3949  | Manuel    | Neuer     | 2       | 1
4902  | Lukas     | Podolski  | 2       | 1
20102 | Klaas-Jan | Huntelaar | 4       | 1

- The scan for gender = 1 (m) yields positions {0, 2, 3, 4, 5}
- The scan for country = 3 (Italy) yields positions {0, 2}
- AND combines them to {0, 2}, which materialize to Gianluigi Buffon and Mario Balotelli
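
A sketch of the plan's building blocks on the encoded columns above (scanForId emits sorted position lists; the AND is a sorted-list intersection):

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <vector>

    // Predicate scan: positions where the attribute vector holds valueId.
    std::vector<std::size_t> scanForId(const std::vector<std::uint32_t>& av,
                                       std::uint32_t valueId) {
        std::vector<std::size_t> positions;
        for (std::size_t pos = 0; pos < av.size(); ++pos)
            if (av[pos] == valueId) positions.push_back(pos);
        return positions;  // ascending by construction
    }

    // Logical AND of two predicates = intersection of their position lists.
    std::vector<std::size_t> positionListAnd(const std::vector<std::size_t>& a,
                                             const std::vector<std::size_t>& b) {
        std::vector<std::size_t> out;
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                              std::back_inserter(out));
        return out;
    }
    // positionListAnd(scanForId(country, 3), scanForId(gender, 1)) -> {0, 2}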

Tuple Reconstruction

Tuple Reconstruction in a Row Store

- All attributes of a record are stored consecutively: a 200-byte tuple spans 4 cache lines (256 bytes)
- Read at 2 MB/ms/core: 0.128 microseconds with 1 core

Tuple Reconstruction in a Column Store (Virtual Record IDs)

- All attributes are stored in separate columns; implicit record IDs are used to reconstruct rows
- One cache access per attribute: 6 cache lines (384 bytes) for the six attributes
- Read at 2 MB/ms/core: 0.192 microseconds with 1 core
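
A sketch of tuple reconstruction over virtual record IDs (columns are shown decoded for brevity; with dictionary encoding each access would add one dictionary lookup):

    #include <array>
    #include <string>
    #include <vector>

    struct Person {
        std::string fname, lname, gender, country, city, birthday;
    };

    // The row is reassembled by probing the same position in every
    // column - one cache access per attribute, as estimated above.
    Person reconstruct(const std::array<std::vector<std::string>, 6>& cols,
                       std::size_t recId) {
        return Person{ cols[0][recId], cols[1][recId], cols[2][recId],
                       cols[3][recId], cols[4][recId], cols[5][recId] };
    }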

Insert

Example

Table world_population:

rowID | fname   | lname    | gender | country | city      | birthday
0     | Martin  | Albrecht | m      | GER     | Berlin    | 08-05-1955
1     | Michael | Berg     | m      | GER     | Berlin    | 03-05-1970
2     | Hanna   | Schulze  | f      | GER     | Hamburg   | 04-04-1968
3     | Anton   | Meyer    | m      | AUT     | Innsbruck | 10-20-1992
4     | Sophie  | Schulze  | f      | GER     | Potsdam   | 09-03-1977
...

    INSERT INTO world_population
    VALUES ('Karen', 'Schulze', 'w', 'GER', 'Rostock', '06-20-2012')

INSERT (1): Without a New Dictionary Entry

The lname column already contains 'Schulze'. Dictionary (D) and attribute vector (AV) for lname before the insert:

D:  0 Albrecht, 1 Berg, 2 Meyer, 3 Schulze
AV: 0 1 3 2 3   (Albrecht, Berg, Schulze, Meyer, Schulze)

1. Look up 'Schulze' in D: entry found, value ID 3
2. Append the value ID to the AV: AV becomes 0 1 3 2 3 3

INSERT (2): With a New Dictionary Entry I/II

The city column does not yet contain 'Rostock'. Dictionary and attribute vector for city before the insert:

D:  0 Berlin, 1 Hamburg, 2 Innsbruck, 3 Potsdam
AV: 0 0 1 2 3   (Berlin, Berlin, Hamburg, Innsbruck, Potsdam)

1. Look up 'Rostock' in D: no entry found
2. Append the new value to D: 4 Rostock (no re-sorting necessary, since 'Rostock' sorts last)
3. Append the value ID to the AV: AV becomes 0 0 1 2 3 4

INSERT (2): With a New Dictionary Entry II/II

The fname column does not yet contain 'Karen', and 'Karen' does not sort last. Dictionary and attribute vector for fname before the insert:

D:  0 Anton, 1 Hanna, 2 Martin, 3 Michael, 4 Sophie
AV: 2 3 1 0 4   (Martin, Michael, Hanna, Anton, Sophie)

1. Look up 'Karen' in D: no entry found
2. Append the new value to D: 0 Anton, 1 Hanna, 2 Martin, 3 Michael, 4 Sophie, 5 Karen
3. Sort D: 0 Anton, 1 Hanna, 2 Karen, 3 Martin, 4 Michael, 5 Sophie
4. Change the value IDs in the AV accordingly: AV becomes 3 4 1 0 5
5. Append the new value ID to the AV: AV becomes 3 4 1 0 5 2

RESULT

Table world_population after the insert:

rowID | fname   | lname    | gender | country | city      | birthday
0     | Martin  | Albrecht | m      | GER     | Berlin    | 08-05-1955
1     | Michael | Berg     | m      | GER     | Berlin    | 03-05-1970
2     | Hanna   | Schulze  | f      | GER     | Hamburg   | 04-04-1968
3     | Anton   | Meyer    | m      | AUT     | Innsbruck | 10-20-1992
4     | Ulrike  | Schulze  | f      | GER     | Potsdam   | 09-03-1977
5     | Karen   | Schulze  | f      | GER     | Rostock   | 06-20-2012
...
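
All three insert cases collapse into one routine per column; a minimal sketch over a sorted dictionary (re-sorting is realized as a positioned insert plus value-ID remapping, matching steps 1-5 above):

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    void insertValue(std::vector<std::string>& dict,   // sorted dictionary
                     std::vector<std::uint32_t>& av,   // attribute vector
                     const std::string& value) {
        auto it = std::lower_bound(dict.begin(), dict.end(), value);
        auto id = static_cast<std::uint32_t>(it - dict.begin());
        if (it == dict.end() || *it != value) {
            dict.insert(it, value);        // new entry at its sorted position
            for (std::uint32_t& v : av)    // "change value IDs in the AV"
                if (v >= id) ++v;
        }
        av.push_back(id);                  // "append value ID to the AV"
    }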

Insert-Only

Facts about Insert-Only

Principles:

- Never delete any data; invalidate outdated tuples instead
- A logical update becomes a technical insert/append

Advantages:

- Gap-less time travel is possible
- Legal requirements, e.g., auditability, can easily be met
- Implicit logging
- Snapshot isolation and locking are simplified

Disadvantage: increased memory consumption, though the applied compression reduces the overhead.

Implementation Possibilities

1. Point representation
   - Store the complete tuple on every attribute change
   - Save the insert timestamp in a column valid_from
   - Writes are faster, reads are slower
2. Interval representation
   - Store the complete tuple on every attribute change
   - Update the replaced tuple, storing the current timestamp in valid_to; the same timestamp is stored in valid_from of the new tuple
   - Reads are faster, writes are slower
3. History reconstruction on demand
   - Update the existing tuple on changes (not insert-only any longer)
   - Store outdated values in a separate history table

Status updates can be done in-place with timestamps; timestamps are not compressed.
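
A sketch of why reads are slower under the point representation: finding "the current tuple" means searching for the maximum valid_from per id (the nested lookup the slides mention; types and names are illustrative):

    #include <cstdint>
    #include <optional>
    #include <string>
    #include <vector>

    struct VersionedTuple {
        int id;
        std::string city;          // stand-in for the payload attributes
        std::int64_t valid_from;   // insert timestamp
    };

    // Latest version = the tuple with the maximum valid_from for this id.
    std::optional<VersionedTuple>
    currentVersion(const std::vector<VersionedTuple>& table, int id) {
        std::optional<VersionedTuple> best;
        for (const VersionedTuple& t : table)
            if (t.id == id && (!best || t.valid_from > best->valid_from))
                best = t;
        return best;
    }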

Insert-Only 1: Point Representation

Introduce one additional column: valid_from.

id       | fname     | lname      | gender | country | city      | birthday   | valid_from
0        | Martin    | Albrecht   | m      | GER     | Berlin    | 08-05-1955 | 10-11-2011
1        | Michael   | Berg       | m      | GER     | Berlin    | 03-05-1970 | 10-11-2011
2        | Hanna     | Schulze    | f      | GER     | Hamburg   | 04-04-1968 | 10-11-2011
3        | Anton     | Meyer      | m      | AUT     | Innsbruck | 10-20-1992 | 10-11-2011
4        | Ulrike    | Schulze    | f      | GER     | Potsdam   | 09-03-1977 | 10-11-2011
5        | Sophie    | Schulze    | f      | GER     | Rostock   | 06-20-2012 | 10-11-2011
1        | Michael   | Berg       | m      | GER     | Potsdam   | 03-05-1970 | 07-02-2012
...      | ...       | ...        | ...    | ...     | ...       | ...        | ...
8 * 10^9 | Zacharias | Perdopolus | m      | GRE     | Athen     | 03-12-1979 | 10-11-2011

- The primary key is composed of id and valid_from
- Insert is easy: valid_from = current timestamp
- Select, group by, join: require a nested join to determine the current valid_from timestamp for each object

Insert-Only 2: Interval Representation

Introduce two additional columns: valid_from and valid_to.

id       | fname     | lname      | gender | country | city      | birthday   | valid_from | valid_to
0        | Martin    | Albrecht   | m      | GER     | Berlin    | 08-05-1955 | 10-11-2011 |
1        | Michael   | Berg       | m      | GER     | Berlin    | 03-05-1970 | 10-11-2011 | 07-02-2012
2        | Hanna     | Schulze    | f      | GER     | Hamburg   | 04-04-1968 | 10-11-2011 |
3        | Anton     | Meyer      | m      | AUT     | Innsbruck | 10-20-1992 | 10-11-2011 |
4        | Ulrike    | Schulze    | f      | GER     | Potsdam   | 09-03-1977 | 10-11-2011 |
5        | Sophie    | Schulze    | f      | GER     | Rostock   | 06-20-2012 | 10-11-2011 |
1        | Michael   | Berg       | m      | GER     | Potsdam   | 03-05-1970 | 07-02-2012 |
...      | ...       | ...        | ...    | ...     | ...       | ...        | ...        | ...
8 * 10^9 | Zacharias | Perdopolus | m      | GRE     | Athen     | 03-12-1979 | 10-11-2011 |

- The primary key is composed of id and valid_from
- Insert requires an update of the formerly current tuple
- Select, group by, join are easy: the WHERE clause eliminates tuples out of range
- Finding up-to-date entries can be supported by an additional bit vector on the column valid_to

Snapshot Isolation

- Snapshot isolation guarantees consistent reads: during a transaction, all reads retrieve the values that were active at the moment the transaction started
- Conflicts like lost updates may happen in theory, but are prevented through pre-write checks in the database or application-level locks

[Figure: two concurrent transactions on a timeline; when transaction 2 wants to insert after transaction 1 has already inserted, an alert is raised since the values transaction 2 read are no longer valid.]

Status Updates

When status fields are changed by replacement, do we need to insert a new version of the tuple?

- Insert-only would lead to overhead (e.g., clearing in FI)
- Most status fields are binary
- Therefore: uncompressed in-place updates with a row timestamp

[Figure: an invoice status flips from Unpaid (t = NULL) to Paid (t = 2009/06/30); the timestamp doubles as the status flag.]

Handling Data Modifications

Motivation

Inserting new tuples directly into a compressed structure can be expensive:

- New values can require reorganizing the dictionary
- The number of bits required to encode all dictionary values can change, in which case the whole attribute vector has to be reorganized

Deletion of tuples is expensive as well: all attribute vectors have to be reorganized, and the value IDs of all following tuples have to be moved.

Differential Buffer

- New values are written to a dedicated differential buffer (the delta)
- A cache-sensitive B+ tree (CSB+) is used for faster search on the delta
- Reads go against both structures; writes go only to the delta

[Figure: the main store (compressed attribute vector plus sorted dictionary, 8 billion entries) next to the differential buffer (uncompressed attribute vector plus unsorted dictionary indexed by a CSB+ tree, up to 50,000 entries).]

- Inserts of new values are faster, because neither dictionary nor attribute vector needs to be re-sorted
- Range selects on the differential buffer are expensive, because its dictionary is unsorted
- The differential buffer requires more memory: no attribute vector compression, plus an additional CSB+ tree for the dictionary

Tuple Lifetime

Michael moves from Berlin to Potsdam:

    UPDATE world_population
    SET city = 'Potsdam'
    WHERE fname = 'Michael' AND lname = 'Berg'

Main store (table world_population):

recId    | fname     | lname      | gender | country | city      | birthday
0        | Martin    | Albrecht   | m      | GER     | Berlin    | 08-05-1955
1        | Michael   | Berg       | m      | GER     | Berlin    | 03-05-1970
2        | Hanna     | Schulze    | f      | GER     | Hamburg   | 04-04-1968
3        | Anton     | Meyer      | m      | AUT     | Innsbruck | 10-20-1992
4        | Ulrike    | Schulze    | f      | GER     | Potsdam   | 09-03-1977
5        | Sophie    | Schulze    | f      | GER     | Rostock   | 06-20-2012
...      | ...       | ...        | ...    | ...     | ...       | ...
8 * 10^9 | Zacharias | Perdopolus | m      | GRE     | Athen     | 03-12-1979

The update appends the new version of the tuple to the differential buffer:

recId | fname   | lname | gender | country | city    | birthday
0     | Michael | Berg  | m      | GER     | Potsdam | 03-05-1970

Problem: the tuple is now present in both the main store and the differential buffer. Tuples of a table are therefore marked by a validity vector, which reduces the required amount of reorganization:

- Like an attribute vector, but for validity; 1 bit required per database tuple
- Invalidated tuples stay in the database table until the next reorganization takes place
- Search results are filtered using the validity vector

With the validity vector, the old version of Michael's tuple in the main store (recId 1) is marked invalid (valid = 0), while all other main-store tuples and the new tuple in the differential buffer remain valid (valid = 1).
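
Putting differential buffer and validity vector together, a point-query sketch (std::map stands in for the CSB+ tree, an assumption made for brevity; delta positions are numbered after the main store):

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    struct MainColumn {                              // compressed, read-optimized
        std::vector<std::string> dict;               // sorted dictionary
        std::vector<std::uint32_t> av;               // attribute vector
    };
    struct DeltaColumn {                             // write-optimized
        std::map<std::string, std::uint32_t> dict;   // stand-in for the CSB+ tree
        std::vector<std::uint32_t> av;
    };

    std::vector<std::size_t> scan(const MainColumn& main, const DeltaColumn& delta,
                                  const std::vector<bool>& valid,  // main-store bits
                                  const std::string& needle) {
        std::vector<std::size_t> hits;
        auto it = std::lower_bound(main.dict.begin(), main.dict.end(), needle);
        if (it != main.dict.end() && *it == needle) {
            auto id = static_cast<std::uint32_t>(it - main.dict.begin());
            for (std::size_t pos = 0; pos < main.av.size(); ++pos)
                if (main.av[pos] == id && valid[pos])   // skip invalidated tuples
                    hits.push_back(pos);
        }
        if (auto d = delta.dict.find(needle); d != delta.dict.end())
            for (std::size_t pos = 0; pos < delta.av.size(); ++pos)
                if (delta.av[pos] == d->second)
                    hits.push_back(main.av.size() + pos);
        return hits;
    }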

Stored Procedures

Facts about Stored Procedures

- Basically a procedural program stored in the database
- Written in a (vendor-)specific language (e.g., PL/SQL, Java, ...)
- Usually supports constructs like loops and conditions
- Takes parameters as input and returns a result set
- Main usage: set operations that cannot be expressed in SQL (or are very hard to express)
- Additional usages: access control, data validation, data conversion

Advantages

- Performance: no data transfer between database and application server
- Code reduction: stored procedures written in a generic way can be shared; less code in the application layer
- Improved security: no SQL injection possible inside stored procedures

Implications on Application Development

How does it all come together?

1. Mixed workload combining OLTP and analytic-style queries
   - Column stores are best suited for analytic-style queries
   - An in-memory database enables fast tuple reconstruction
   - An in-memory column store allows aggregation on the fly
2. Sparse enterprise data
   - Lightweight compression schemes are optimal
   - Speeds up query execution
   - Improves the feasibility of an in-memory database
3. Mostly-read workload
   - Read-optimized stores provide the best throughput, i.e., a compressed in-memory column store
   - A write-optimized store as a delta partition is sufficient to handle data changes

An In-Memory Database for Enterprise Applications

In-memory database (IMDB):

- Data resides permanently in main memory; main memory is the primary persistence
- Still: logging to disk and recovery from disk
- Main memory access is the new bottleneck
- Cache-conscious algorithms and data structures are crucial (locality is king)

[Figure: architecture overview - interface services and session management; query execution, metadata, and transaction manager; a distribution layer per blade; per blade, active data in a main store and a differential store (with combined columns), inverted indexes, and an object data guide, connected by a merge process; data aging moves passive data (history) to non-volatile memory; time travel, logging to a log volume, recovery, and snapshots complete the picture.]

Simplified Application Development

Compared to the traditional stack (application cache, database cache, prebuilt aggregates on top of the raw data), a column-oriented in-memory database needs:

- Fewer caches
- No redundant data (OLAP/OLTP, LiveCache)
- No maintenance of materialized views or aggregates
- Minimal index maintenance

Enterprise Application PoCs

SAP ERP Financials on In-Memory Technology

An in-memory column database for an ERP system with a combined workload (parallel OLTP/OLAP queries). Leverage in-memory capabilities to:

- Reduce the amount of data
- Aggregate on the fly
- Run analytic-style queries (to replace materialized views)
- Execute stored procedures

Use case: the SAP ERP Financials solution

- Post and change documents
- Display open items
- Run the dunning job
- Analytical queries, such as the balance sheet

Current Financials Solutions

[Figure: today's Financials architecture, with base tables plus a hierarchy of materialized aggregates, sum tables, and secondary indexes.]

The Target Financials Solution

Only base tables, algorithms, and some indexes.

Feasibility of Financials on In-Memory Technology in 2009

Modifications to SAP Financials:

- Removed secondary indexes, sum tables, and pre-calculated/materialized tables
- Reduced code complexity and simplified locks
- Insert-only to enable history (replacing change documents)
- Added stored procedures with business functionality

Test data: the European division of a retailer

- ERP 2005 ECC 6.0 EhP3, 5.5 TB system database size
- Financials: 23 million headers / 8 GB in main memory; 252 million items / 50 GB in main memory (including inverted indexes for join attributes and the insert-only extension)

In-Memory Financials on SAP ERP

Tables in the classic system:

- Accounting documents: BKPF, BSEG
- Dunning data: MHNK, MHND
- Sum tables: GLT0, LFC1, KNC1
- Secondary indexes: BSAD, BSAK, BSAS, BSID, BSIK, BSIS
- Change documents: CDHDR, CDPOS

With in-memory Financials, only the accounting documents BKPF and BSEG remain.

Reduction by a Factor 10

                  | Traditional DBMS | IMDB
BKPF              | 8.7 GB           | 1.5 GB
BSEG              | 255 GB           | 50 GB
Secondary indexes | 255 GB           | -
Sum tables        | 0.55 GB          | -
Complete          | 519.25 GB        | 51.5 GB

Booking an Accounting Document

- Insert into BKPF and BSEG only
- The lack of updates reduces locks

Dunning Run

The dunning run determines all open and due invoices (customer-defined queries on 250 million records):

- Current system: 20 min
- New logic: 1.5 sec, thanks to the in-memory column store, parallelized stored procedures, and the simplified Financials schema

Why?

Being able to perform the dunning run in such a short time lowers TCO: add more functionality, run other jobs in the meantime. In a multi-tenant cloud setup, hardware must be used wisely.

Bring Application Logic Closer to the Storage Layer

The classic dunning implementation issues a flood of queries from the application server:

- Select the accounts to be dunned (1 SELECT); for each:
  - Select the open account items from BSID (10,000 SELECTs); for each:
    - Calculate the due date
    - Select the dunning procedure, level, and area (10,000 SELECTs)
    - Create MHNK entries
- Create and write the dunning item tables (31,000 entries)

Instead, the whole logic becomes one single stored procedure executed within the database, and the dunning item tables are calculated on the fly.

Factor: 800x Acceleration

Quantity: 250 million items, 380k open, 200k due. Hardware: 4 CPUs x 6 cores, 256 GB RAM.

Operations measured: select open items (in the later variants including the T047 & KNB5 joins, e.g. 0.63 s, 1.01 s, and 0.6 s across the variants); determine due date and dunning level; filter 1 (verify dunning levels); filter 2 (check last dunning); generate MHNK (aggregate); generate MHND (execute filters). Across the variants, more and more steps are folded into the initial select or deferred to the aggregation, and the filters are executed in parallel.

Total runtimes:

Original version                            ~20 minutes
Variant 1                                   ~1 minute
Variant 2 (#3, #4 executed in parallel)     ~3.0 s
Variant 3 (#2, #3, #4 executed in parallel) ~1.5 s

Dunning Application

[Screenshots: the dunning application prototype.]

Available-to-Promise Check

Can I get enough quantities of a requested product on a desired delivery date?

Goal: analyze and validate the potential of in-memory, highly parallel data processing for available-to-promise (ATP) checks.

Challenges:

- Dynamic aggregation
- Instant rescheduling in minutes vs. nightly batch runs
- Real-time and historical analytics

Outcome:

- Real-time ATP checks without materialized views
- Ad-hoc rescheduling
- No materialized aggregates

In-Memory Available-to-Promise

[Screenshot: the in-memory ATP prototype.]

Demand Planning

Flexible analysis of demand planning data:

- Zooming to choose the granularity
- Filtering by certain products or customers
- Browsing through time spans
- Combination of location-based geo data with planning data in an in-memory database
- External factors, such as temperature or the level of cloudiness, can be overlaid to incorporate them into planning decisions

GORFID

HANA for streaming data processing. Use case: in-memory RFID data management, evaluating SAP OER. Prototypical implementation of:

- An RFID read-event repository on HANA
- A discovery service on HANA (10 billion data records with ~3 seconds response time)
- Frontends for iPhone and iPad 2

Key findings:

- HANA is suited for streaming data (using bulk inserts)
- Analytics on streaming data is now possible

GORFID: Near Real-Time as a Concept

Bulk load every 2-3 seconds: more than 50,000 inserts/s.

Thanks!
Questions?

Online lecture (starting Sept 3rd): http://openhpi.de

Stephan Müller
Hasso Plattner Institute
stephan.mueller@hpi.uni-potsdam.de
