HBase:

Master: One of the daemons in HBase; it performs the administrative tasks for the cluster. A few of the
HMaster's responsibilities are listed below.

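a) Load balancing: When a region grows too large, it is split into two regions at roughly the middle of its
row-key range, so that each half holds about half of the data. The split itself is performed by the region
server hosting the region; HMaster then assigns the resulting regions across region servers to keep the
load balanced.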

b) Region server failures: HMaster monitors all the region servers in the cluster. If one of them fails,
the master reassigns the regions that were on the failed server to other region servers.
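
The reassignment described in b) can be pictured with a small toy model. This is only a sketch, not the real HMaster logic: the class ToyMaster, its method names, and the "give each orphaned region to the least-loaded surviving server" rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of region assignment (hypothetical; not the real HMaster code).
final class ToyMaster {
    // server name -> regions currently hosted there (TreeMap keeps a fixed order)
    private final Map<String, List<String>> assignments = new TreeMap<>();

    void addServer(String server) {
        assignments.putIfAbsent(server, new ArrayList<>());
    }

    void assign(String server, String region) {
        assignments.computeIfAbsent(server, s -> new ArrayList<>()).add(region);
    }

    // Called when a server is found dead: move each of its regions to the
    // surviving server that currently hosts the fewest regions.
    void handleServerFailure(String failed) {
        List<String> orphaned = assignments.remove(failed);
        if (orphaned == null || assignments.isEmpty()) {
            return; // unknown server, or no survivor to take the regions
        }
        for (String region : orphaned) {
            String target = null;
            for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
                if (target == null
                        || e.getValue().size() < assignments.get(target).size()) {
                    target = e.getKey();
                }
            }
            assignments.get(target).add(region);
        }
    }

    // Number of regions on a server, or -1 if the server is not known.
    int regionCount(String server) {
        List<String> regions = assignments.get(server);
        return regions == null ? -1 : regions.size();
    }
}
```

For example, if rs1 hosts three regions, rs2 hosts one and rs3 hosts none, then after handleServerFailure("rs1") the three orphaned regions are spread so that rs2 and rs3 each end up with two.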

Region server: Another daemon in the HBase cluster; each region server hosts one or more regions. Clients
that want to read data contact the region servers directly (there is no need to go through HMaster), and
the region server is responsible for serving the requested data.

Region: A region is a subset of an HBase table's data; it holds a contiguous range of row keys.

Row Key: Every row in an HBase table is identified and indexed by a single key called the row key. The
row key is not declared when the table is created, and HBase does not generate it or offer an
auto-increment property; the client supplies it with every write. Each HBase table is distributed across
regions based on the row key, so it is mandatory for every row and plays a role similar to the primary
key in an RDBMS.
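
Because each region holds a contiguous range of row keys, finding the region for a row comes down to finding the region with the greatest start key that is not larger than the row key. A minimal sketch of that idea follows; the class RegionLookup and the region names are hypothetical, and it compares Java strings, whereas HBase compares row keys as raw bytes (the two orders agree for plain ASCII keys).

```java
import java.util.Map;
import java.util.TreeMap;

// Toy lookup from row key to region (hypothetical; not HBase code).
final class RegionLookup {
    // region start key -> region name; "" is the start key of the first region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }
}
```

With regions starting at "", "g" and "p", the row key "mango" falls in the region that starts at "g", and "zebra" falls in the region that starts at "p".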

MemStore: An in-memory area that holds recent write/update operations on HBase tables that have not yet
been persisted to disk. Over time the data in the MemStore is flushed to HFiles.

Java Client API: An API provided by Apache HBase for working with HBase tables from Java. The common
operations are Put, Get, Scan, and Delete on table data, and Disable and Drop on tables.

ZooKeeper: ZooKeeper is a distributed coordination service. HBase uses it to coordinate the cluster and
track its state; all the HBase daemons register in ZooKeeper.

WAL: Write-ahead log. Recent modifications to the data in HBase are held in the MemStore for a while.
Based on the configuration, the data is persisted to the file system (as HFiles) once it reaches a specific
size. If something happens to the region server's main memory in the meantime (the MemStore lives in the
main memory of the region server), the recent updates would be lost. To avoid this, every change is
written to the WAL before it is stored in the MemStore, so that after a crash HBase can restore the recent
activity by replaying the WAL. Each region server keeps its own WAL, shared by the regions it hosts; the
log is stored in HDFS, so it is replicated by HDFS like any other file.
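
The "log first, then memory" ordering and the replay after a crash can be sketched as follows. This is a toy model under stated assumptions: ToyRegionServer is a hypothetical class, the WAL is an in-memory list standing in for a durable file, and crash() simply throws away the in-memory buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy write path: WAL append, then MemStore update (hypothetical; not HBase code).
final class ToyRegionServer {
    // Stands in for the durable log file; it survives a crash in this model.
    private final List<String[]> wal = new ArrayList<>();
    // In-memory sorted buffer; its contents are lost on a crash.
    private TreeMap<String, String> memStore = new TreeMap<>();

    void put(String rowKey, String value) {
        wal.add(new String[] {rowKey, value}); // 1. record the change in the log
        memStore.put(rowKey, value);           // 2. then apply it in memory
    }

    String get(String rowKey) {
        return memStore.get(rowKey);
    }

    // Simulates losing main memory.
    void crash() {
        memStore = new TreeMap<>();
    }

    // Replays the log in order, so the latest write to each row wins.
    void recover() {
        for (String[] entry : wal) {
            memStore.put(entry[0], entry[1]);
        }
    }
}
```

After crash() every get returns null; after recover() the values written before the crash are visible again, because they were logged before they were buffered.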

HFile: When the HBase framework decides that data in the MemStore should be persisted to the file system,
that data is stored in HFiles (these files reside in HDFS).

META: A system table that stores region location information. To perform any operation on HBase table
data, the client queries the META table to find which region server holds the region of the requested
table that covers the row key (each region is recorded with its starting row key). The client caches the
region locations returned from META to avoid scanning the META table repeatedly. If a region is split or
moved while the client is working, the META table is updated but the client's cached information is not,
so a request sent to the old location fails with an exception; the client then has to refresh its cache
from META and retry. ZooKeeper holds the location of the META table in the HBase cluster.
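
The caching behaviour can be sketched with two maps, one playing the META table and one playing the client cache. Everything here is an assumption made for illustration: the class LocationCacheDemo, the counter metaLookups, and the way a stale entry is detected (by comparing against the toy META map, where a real client would learn it from the exception raised by the contacted server). The sketch also assumes the requested region exists in META.

```java
import java.util.HashMap;
import java.util.Map;

// Toy client-side cache of region locations (hypothetical; not HBase code).
final class LocationCacheDemo {
    // Stand-in for the META table: region name -> hosting server.
    final Map<String, String> meta = new HashMap<>();
    // Client-side cache of earlier META lookups.
    private final Map<String, String> cache = new HashMap<>();
    // How many times the client had to read META.
    int metaLookups = 0;

    private String locate(String region) {
        String server = cache.get(region);
        if (server == null) {
            metaLookups++;
            server = meta.get(region);
            cache.put(region, server);
        }
        return server;
    }

    // Returns the server that finally serves the request for the region.
    String request(String region) {
        String server = locate(region);
        if (!server.equals(meta.get(region))) {
            // Simulated failure: the cached server no longer hosts the region.
            cache.remove(region);
            server = locate(region); // refresh from META and retry
        }
        return server;
    }
}
```

Repeated requests for the same region read META only once; after the region moves to another server, the next request fails against the cached location, refreshes from META, and succeeds on the retry.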

Versions: HBase can store several versions of the data in each column. Older releases keep 3 versions by
default, while releases from 0.96 onward keep 1; the number is configurable. The versions of a column are
stored sorted by update timestamp, newest first. The number of versions is configured per column family,
for example with the command below.
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
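
The effect of a VERSIONS limit on a single cell can be sketched as a map sorted by timestamp, newest first. This is only an illustration: VersionedCell is a hypothetical class, and it trims old versions eagerly on every put for simplicity, which is an assumption of the sketch rather than a description of when HBase physically drops them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy cell that keeps at most maxVersions values (hypothetical; not HBase code).
final class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry holds the oldest timestamp
        }
    }

    // Value with the newest timestamp, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // All retained values, newest first.
    List<String> all() {
        return new ArrayList<>(versions.values());
    }
}
```

With a limit of 2, writing values at timestamps 100, 300 and 200 leaves only the values written at 300 and 200, in that order.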

Data Deletion: An explicit delete in HBase does not physically remove the data. Instead, a tombstone
marker is written for that data. The tombstone hides the data from read operations, so deleted data is
never returned in query results.

Compaction: Compaction is the process of merging multiple smaller HFiles into a single HFile. There are
two kinds.

1) Minor Compaction: A minor compaction usually takes a small number of smaller, adjacent HFiles and
merges them into one larger HFile. Minor compaction does not drop deleted data or expired versions, which
avoids side effects.

2) Major Compaction: Major compaction is time-based. Unlike minor compaction, it does not happen
frequently: by default it runs once a week in current versions of HBase, whereas in older versions it ran
daily. In a major compaction all the HFiles are merged into one large HFile to improve read performance
and, unlike in minor compaction, all deleted data is physically removed from the file during the merge.

Because compaction can cause a heavy I/O load, HBase runs compactions according to a compaction policy.
The policy can use different algorithms to decide which files to compact.
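
The interplay of tombstones, minor compaction and major compaction can be sketched with a toy store. All names here (ToyStore, the "<deleted>" marker value, the rule that a minor compaction merges the two newest files) are assumptions made for illustration. The sketch also shows one plausible reading of the "side effects" mentioned above: a tombstone has to survive a minor compaction because an older file outside the merge may still contain the deleted value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy store with flushes, tombstones and compaction (hypothetical; not HBase code).
final class ToyStore {
    static final String TOMBSTONE = "<deleted>"; // hypothetical marker value
    // Flushed files, oldest first; each maps row key -> value or tombstone.
    private final List<TreeMap<String, String>> files = new ArrayList<>();
    private TreeMap<String, String> memStore = new TreeMap<>();

    void put(String row, String value) { memStore.put(row, value); }

    // A delete only writes a marker; older files are left untouched.
    void delete(String row) { memStore.put(row, TOMBSTONE); }

    void flush() {
        if (!memStore.isEmpty()) {
            files.add(memStore);
            memStore = new TreeMap<>();
        }
    }

    // Newest data wins; a tombstone hides any older value.
    String get(String row) {
        String v = memStore.get(row);
        for (int i = files.size() - 1; v == null && i >= 0; i--) {
            v = files.get(i).get(row);
        }
        return TOMBSTONE.equals(v) ? null : v;
    }

    // Merge the two newest files and keep tombstones.
    void minorCompact() {
        if (files.size() < 2) return;
        TreeMap<String, String> newer = files.remove(files.size() - 1);
        TreeMap<String, String> older = files.remove(files.size() - 1);
        older.putAll(newer); // entries from the newer file overwrite older ones
        files.add(older);
    }

    // Merge every file and physically drop deleted rows.
    void majorCompact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> f : files) merged.putAll(f);
        merged.values().removeIf(TOMBSTONE::equals);
        files.clear();
        if (!merged.isEmpty()) files.add(merged);
    }

    int fileCount() { return files.size(); }

    int storedCells() {
        int n = 0;
        for (TreeMap<String, String> f : files) n += f.size();
        return n;
    }
}
```

In this model a deleted row disappears from reads as soon as the tombstone is written, but the number of stored cells only shrinks after majorCompact().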

Java API Classes to perform operations on HBase:

ConnectionFactory
(A non-instantiable class that manages creation of Connections)

Connection
(A cluster connection encapsulating lower level individual connections to
actual servers and a connection to ZooKeeper. Connections are instantiated
through the ConnectionFactory class. The lifecycle of the connection is
managed by the caller, who has to close() the connection to release the
resources)

ConnectionFactory: This class has a static method that returns a Connection object.

public static Connection createConnection() --> Returns the Connection object.

Connection: This interface has various methods for performing operations on HBase tables.

1) Admin getAdmin() --> Returns the Admin object. Admin can be used to create, drop, list, enable and
disable tables, to add and drop table column families, and for other administrative operations (DDL
operations).

2) Table getTable(TableName tbname) --> Returns the Table object. Table can be used to get, put, delete
or scan data in a table (DML operations).

3) RegionLocator getRegionLocator(TableName tbname) --> Returns a RegionLocator, which gives the region
information of the given HBase table.

HBase Client Example:

HBaseClientExample.docx

TableMapReduceUtil.initTableMapperJob(

TableMapReduceUtil.initTableReducerJob(

ImmutableBytesWritable