
SAP HANA - An Introduction for Beginners

Written by Saurav Mitra


SAP HANA (High-Performance Analytic Appliance) is an in-memory database from SAP
that stores data and analyzes large volumes of non-aggregated transactional data in
real time, which makes it well suited to decision support and predictive analysis.
The In-Memory Computing Engine uses cache-conscious data structures and algorithms
that build on hardware innovations as well as SAP software technology innovations.
It is designed for real-time OLTP and OLAP in one appliance, i.e. an end-to-end
solution from transactional processing to high-performance analytics. SAP HANA can
also be used as a secondary database to accelerate analytics on existing
applications.

Hardware Innovations - Leading to HANA


In the real world there is a wide variety of data sources, e.g. unstructured data,
operational data stores, data marts, data warehouses, online analytical stores, etc.
Doing analytics or information mining on this Big Data in real time runs into hurdles
such as latency, high cost and complexity.
Disk I/O was the performance bottleneck in the past, and in-memory computing was
always much faster. Earlier, however, the cost of in-memory computing was prohibitive
for any large-scale implementation. Now, with multi-core CPUs and high-capacity RAM,
the entire database can be hosted in memory. The CPU now waits for data to be loaded
from main memory into the CPU cache, and that is today's performance bottleneck.
This is a paradigm shift: tape is dead, disk is tape, main memory is disk, and CPU
cache is main memory. HANA is optimized to exploit the parallel processing
capabilities of modern multi-core CPU architectures. With this architecture, SAP
applications can benefit from current hardware technologies.

Memory Overview - Where we stand


Let us take a quick look at multi-core CPU caches, main memory (RAM) and the
traditional hard disk with respect to response time. Note that all three cache
levels are built from SRAM, while main memory is DRAM.

Level       | Description                              | Response time  | Typical size
L1 cache    | Primary, within core; fastest cache      | ~ 1 ns         | 64 KB
L2 cache    | Intermediate, within core; slower        | ~ 5 ns         | 256 KB
L3 cache    | Shared across all cores; slowest cache   | ~ 20 ns        | 8 MB
Main memory | RAM                                      | ~ 100 ns       | TBs
Hard disk   | Traditional disk storage                 | > 1,000,000 ns | TBs
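
To put these figures in proportion, the short Python sketch below expresses each
level's response time relative to L1 cache. The values are the approximate ones
listed above; the names LATENCY_NS and slowdown_vs_l1 are made up for this
illustration.

```python
# Approximate response times in nanoseconds, as listed above.
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 5,
    "L3 cache": 20,
    "Main memory": 100,
    "Hard disk": 1_000_000,  # given as a lower bound in the listing
}


def slowdown_vs_l1(level: str) -> float:
    """Return how many times slower a level is than L1 cache."""
    return LATENCY_NS[level] / LATENCY_NS["L1 cache"]


for level in LATENCY_NS:
    print(f"{level:12s} ~{slowdown_vs_l1(level):>12,.0f}x L1 latency")

# The ratio the argument rests on: disk versus main memory.
disk_vs_ram = LATENCY_NS["Hard disk"] / LATENCY_NS["Main memory"]
print(f"Disk is at least ~{disk_vs_ram:,.0f}x slower than main memory")
```

With these numbers, a main memory access is roughly four orders of magnitude
cheaper than a disk access, which is why moving the database into RAM shifts the
bottleneck to the memory-to-cache transfer.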

HANA Hardware Requirement


HANA can be installed on certified hardware from SAP partners such as Hewlett
Packard, IBM, Fujitsu, Cisco Systems and Dell.
Currently, SUSE Linux Enterprise Server (SLES) 11 SP1 for x86-64 is the operating
system supported by SAP HANA.
A typical configuration is 4 Intel E7-4870 CPUs (40 cores in total) with 512 GB of
RAM. SAP recommends a dedicated 10 Gbit/s network connection between the SAP HANA
landscape and the source system for efficient data replication.

HANA Database Features


Important database features of HANA include OLTP & OLAP capabilities, Extreme
Performance, In-Memory, Massively Parallel Processing, Hybrid Database, Column
Store, Row Store, Complex Event Processing, Calculation Engine, Compression, Virtual
Views, Partitioning and No Aggregates. The HANA in-memory architecture includes the
In-Memory Computing Engine and the In-Memory Computing Studio for modeling and
administration. Each of these properties needs a detailed explanation, followed by
the SAP HANA architecture.

Basic Concepts behind HANA Database


Extreme Hardware Innovations:
Main memory is no longer a limited resource: modern servers can have 2 TB of
system memory, which allows complete databases to be held in RAM. Current
processors have up to 64 cores, and 128 cores will soon be available. With the
increasing number of cores, CPUs can process more data per time interval. This
shifts the performance bottleneck from disk I/O to the data transfer between main
memory and CPU cache.
In-Memory Database:
HANA fully leverages hardware innovations such as multi-core CPUs and
high-capacity RAM. The basic concept is to hold the entire database in fast main
memory close to the CPU, for faster execution and to avoid disk I/O. Disk storage
is still required for persistence since main memory is volatile. SAP HANA holds
the bulk of its data in memory for maximum performance, but still uses persistent
storage to provide a fallback in case of failure. Data and log are automatically
saved to disk at regular savepoints; the log is also saved to disk after each
COMMIT of a database transaction. Disk write operations happen asynchronously as
a background task. Generally, on system start-up HANA loads the tables into
memory.
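
The interplay of in-memory data, a log written at COMMIT, and periodic savepoints
can be sketched with a toy key-value store in Python. This is only an
illustration of the general idea, not how HANA implements persistence: the class
ToyInMemoryStore, its file names and its JSON format are all invented for this
example, and the disk writes here are synchronous for simplicity.

```python
import json
import os
import tempfile


class ToyInMemoryStore:
    """Toy store: data lives in a dict, durability comes from two disk files."""

    def __init__(self, directory):
        self.data = {}      # the in-memory "database"
        self.pending = []   # changes of the open transaction
        self.log_path = os.path.join(directory, "redo.log")
        self.snapshot_path = os.path.join(directory, "savepoint.json")

    def put(self, key, value):
        self.pending.append((key, value))

    def commit(self):
        # Write the log entries to disk first, then apply them in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            for key, value in self.pending:
                log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data.update(self.pending)
        self.pending.clear()

    def savepoint(self):
        # Persist a full image of the data; the log can then start afresh.
        with open(self.snapshot_path, "w", encoding="utf-8") as snap:
            json.dump(self.data, snap)
        open(self.log_path, "w", encoding="utf-8").close()

    @classmethod
    def recover(cls, directory):
        # Start-up: load the last savepoint, then replay the log on top of it.
        store = cls(directory)
        if os.path.exists(store.snapshot_path):
            with open(store.snapshot_path, encoding="utf-8") as snap:
                store.data = json.load(snap)
        if os.path.exists(store.log_path):
            with open(store.log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    store.data[key] = value
        return store


with tempfile.TemporaryDirectory() as tmp:
    store = ToyInMemoryStore(tmp)
    store.put("a", 1)
    store.commit()
    store.savepoint()
    store.put("b", 2)
    store.commit()  # recorded in the log only, not yet in a savepoint
    restored = ToyInMemoryStore.recover(tmp)  # simulate a restart
    print(restored.data)
```

After the simulated restart, the restored store contains both the change captured
by the savepoint and the later change that existed only in the log.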

Massively Parallel Processing:


With the availability of multi-core CPUs, higher CPU execution speeds can be
achieved. Multiple CPUs call for new parallel algorithms to be used in databases
in order to fully utilize the available computing resources. HANA's column-based
storage makes it easy to execute operations in parallel using multiple processor
cores. In a column store, data is already vertically partitioned. This means that
operations on different columns can easily be processed in parallel. If multiple
columns need to be searched or aggregated, each of these operations can be
assigned to a different processor core. In addition, operations on one column can
be parallelized by partitioning the column into multiple sections that are
processed by different processor cores. With the SAP HANA database, queries can
be executed rapidly and in parallel.
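
Both forms of parallelism described above (different columns on different
workers, and one column split into sections) can be sketched in Python. This is a
structural illustration only: the table, the function names and the worker counts
are assumptions, and because of CPython's global interpreter lock the threads
used here do not deliver the speed-up that real processor cores would.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sections(column, sections):
    """Split a column (a list) into roughly equal contiguous sections."""
    size = max(1, -(-len(column) // sections))  # ceiling division, at least 1
    return [column[i:i + size] for i in range(0, len(column), size)]


def parallel_sum(column, workers=4):
    """Sum one column by aggregating its sections on separate workers."""
    sections = split_into_sections(column, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, sections))
    return sum(partial_sums)


# Hypothetical column-store table: one list per column.
table = {
    "quantity": list(range(1, 1001)),
    "price": [2] * 1000,
}

# Operations on different columns can also run side by side.
with ThreadPoolExecutor(max_workers=2) as pool:
    totals = dict(zip(table, pool.map(parallel_sum, table.values())))

print(totals)
```

Each column is aggregated independently, and within a column every section
produces a partial sum that is combined at the end.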
Hybrid Data Store:
Common databases store tabular data row-wise, i.e. all data for a record is stored
adjacent to each other in memory. Row store tables are linked lists of memory
pages. Conceptually, a database table is a two-dimensional data structure with
cells organized in rows and columns. Computer memory, however, is organized as a
linear structure. To store a table in linear memory, two options exist:

A row-oriented storage stores a table as a sequence of records, each of which
contains the fields of one row.
A column-oriented storage stores all the values of a column in contiguous
memory locations.

Using a column store avoids scanning unnecessary columns when searching or
aggregating the values of a single column, because those values are stored in
contiguous memory locations. Such an operation has high spatial locality and can
be executed efficiently in the CPU cache. With row-oriented storage, the same
operation would be much slower because data of the same column is distributed
across memory and the CPU is slowed down by cache misses. The column store is
optimized for high read performance and efficient data compression. This
combination of classical and innovative data storage and access technologies
allows developers to choose the best technology for their application and, where
necessary, use both in parallel.
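
The two ways of laying out a table in linear memory can be made concrete with a
small Python sketch. The sample table and the function names are invented for
illustration. Python lists hold references rather than raw values, so this shows
the ordering of the cells, not real cache behaviour.

```python
# A small table with three columns: (id, country, amount).
rows = [
    (1, "DE", 100),
    (2, "US", 250),
    (3, "DE", 75),
]


def to_row_store(records):
    """Row-oriented: the fields of each record, one record after another."""
    return [field for record in records for field in record]


def to_column_store(records):
    """Column-oriented: all values of column 0, then column 1, and so on."""
    return [field for column in zip(*records) for field in column]


row_store = to_row_store(rows)
column_store = to_column_store(rows)
print(row_store)
print(column_store)

n_rows, n_cols = len(rows), len(rows[0])

# Summing "amount" (column index 2) in the row store touches every
# n_cols-th cell, so the reads are strided across the whole table.
amount_from_rows = sum(row_store[2::n_cols])

# In the column store the same values form one contiguous slice.
amount_from_columns = sum(column_store[2 * n_rows:3 * n_rows])

print(amount_from_rows, amount_from_columns)
```

Both layouts give the same total, but only the column layout reads the amounts
from adjacent positions, which is the property the spatial locality argument
relies on.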
OLTP and OLAP Database:
HANA is a hybrid database, with both a read-optimized column store ideally suited
for OLAP and a write-optimized row store best suited for OLTP systems. Both
stores are in-memory. Using column stores in OLTP applications requires a
balanced approach to insertion and indexing of column data to minimize cache
misses. The SAP HANA database allows the developer to specify whether a table is
to be stored column-wise or row-wise. It is also possible to alter an existing
table from columnar to row-based and vice versa.
Higher Data Compression:
The goal of keeping all relevant data in main memory can be achieved at lower
cost if data compression is used. Columnar data storage allows highly efficient
compression. If a column is sorted, equal values will normally be placed adjacent
to each other in memory. In this case compression methods such as run-length
encoding, cluster coding or dictionary coding can be used. In column stores a
compression factor of 10 can typically be achieved compared to traditional
row-oriented storage systems.
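
Two of the methods named above, dictionary coding and run-length encoding, can be
sketched in a few lines of Python. This is a textbook-style illustration on an
invented sample column, not HANA's actual encoding, and it makes no claim about
the compression factor achieved in practice.

```python
def dictionary_encode(column):
    """Replace each value by a small integer ID into a sorted dictionary."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in column]


def run_length_encode(values):
    """Collapse runs of equal neighbours into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# A sorted column: equal values sit next to each other.
country = ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"]

dictionary, value_ids = dictionary_encode(country)
runs = run_length_encode(value_ids)
print(dictionary)
print(value_ids)
print(runs)

# Decoding reverses both steps and restores the original column.
decoded = [dictionary[i] for i in run_length_decode(runs)]
print(decoded == country)
```

Nine string values shrink to a three-entry dictionary plus three runs, and the
benefit grows with the length of the runs, which is why sorted columns compress
particularly well.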
