
Impliance: A Next-Generation Information Management Appliance

Submitted by:
Mayur Kulshreshtha – 09609159
Mayank Talwar – 09609042
Monika Rathour – 09609125
Tuesday, December 07, 2021 IMPLIANCES 1
Abstract
• Though database technology has been remarkably
successful in building a large market, its impact
on the broader market of information management
is surprisingly limited.

• We introduce Impliance, a next-generation
information management system consisting of
hardware and software components integrated to
form an easy-to-administer appliance that can store,
retrieve, and analyse all types of structured,
semi-structured, and unstructured information.
Some Questions
Why is most (>80%) of the world’s data still not in databases?

Didn’t we “solve” this problem in the 1980s with object-relational systems?

Do you use a database to store your data on your laptop?

Why not? (You are a database bigot, aren’t you?)

Have you ever tried to query (with SQL) a database that:

• You didn’t create, and…
• Had more than 500 tables?

Just how easy is it to incrementally add DB capacity beyond 1 machine? Can we
do it with 100 machines?

Have “self-managing” databases significantly simplified administration?


Introduction
Though database architectures have endured several orders of
magnitude changes in the relative speeds of hardware over the last
30 years, the many tiers and layers of components in today’s
information management systems are exceedingly complicated.

It is time to re-examine not only the architecture of database
systems as we know them today, but also their place in the overall
stack of software employed in modern data-intensive applications.

This process has resulted in a radical new architecture that integrates
software and hardware into a high-function and easy-to-manage
information management appliance that we call Impliance, currently
being designed and prototyped at the IBM Almaden Research Center.

IBM progressing for SMARTER INTELLIGENCE

Impliances
Impliance is an ambitious, long-term effort to define simpler,
more robust, and more scalable information systems for
tomorrow’s enterprises.

The goal of the Impliance project is to build a next-generation
information management system that stores all structured,
unstructured, and semi-structured data, is easy to manage, and
analyses data in a scalable way.

Impliance will be capable of storing, retrieving, and analysing all
of this structured, semi-structured, and unstructured information
with low total cost of ownership (TCO), while supporting needs that
range from those of small businesses to those of the largest global
enterprises.
Requirements For Impliance
Tomorrow’s information management systems must uniformly manage not just
structured, homogeneous tables but huge amounts of structured, semi-structured,
and unstructured information of various data types (PDF, XML, text, audio, and
video).

Customers want systems that can seamlessly and scalably expand as an
enterprise and its systems’ needs grow.

Customers want systems that are easy to install, deploy, use, and manage, and
that minimize the need for human intervention.

Current information management software products are largely based on the
hardware architectures of several decades ago. While those products have shown
remarkable resiliency to order-of-magnitude changes in hardware speeds and
capacities, re-examining the software architecture in light of tomorrow’s
hardware seems overdue.
Use Of Impliances For Problem Solving

Exploiting Customer Relationship Management

Integrating Content and Data

Legal Compliance

Functionality Overview
We have identified four major areas of functionality that span information
management and that Impliance needs to unify:

SEMANTICS: Even tagged data such as XML may not have sufficient semantic
information behind the tags to properly relate it, e.g., the units of measurement
or how that data was collected.

SEARCH/QUERY: The more semantic information we can extract from the data, the
more we can improve the utility of search.

COMPOSITION: Composition includes integrating items from heterogeneous data
“silos” to form higher-level objects, or “quick and dirty” mash-ups that beneficially
merge public web data sources with each other or with enterprise data sources.

AGGREGATION: Aggregation is fundamental to today’s Business Intelligence and
On-Line Analytic Processing (OLAP), data mining, and visualization.
IMPLIANCES OVERVIEW

The Appliance Model
First and foremost, Impliance is an appliance that prepackages storage,
servers, and software into a turnkey information management system that
is operational “out of the box”.

The necessary software is pre-installed, automatically detecting which
hardware components are available and reconfiguring itself if there are
changes. Pre-installation and pre-configuration of the system significantly
reduce the “time to value” (TTV), that is, the time between the decision to
purchase a system and the point at which its deployment actually realizes
benefit for the enterprise. According to a recent Winter Corp. survey, TTV is
the top priority of data warehouse managers.

Another benefit of the appliance model is better integration of different
software components. Such integration enables better collaboration
among those components.
Uniform Management of All Data
Today’s systems manage different types of data in very different ways, with
very different user interfaces, despite the common requirement to reliably
store, accurately search the content of, and rapidly retrieve both data and
metadata about that content.

Databases have typically been limited to managing highly-structured data with
a common format and relatively small attributes, which conforms nicely to the
tables of relational database systems.

A recent Java content repository standard (JSR 170) [48] allows querying of
metadata. However, all metadata must match a predefined JSR schema; hence
schema chaos (diversity) is not supported, as in a conventional content
management system. Lastly, the ultra-simple “bag of bytes” model of file
systems provides a “repository of last resort” that can manage unstructured as
well as structured data, but without the powerful querying capability (e.g., joins
and aggregations) we take for granted in databases.
Impliance unifies the management of all data under one umbrella, providing
interfaces to search structured and unstructured content and metadata alike.

Much of the previous research in information discovery can be applied here. First,
additional metadata will be extracted for each document by running different
kinds of annotators [40][53]. This will identify not only entities such as person
names and locations, but also relationships among them. Second, using schema
mapping technologies [9][31], structures from different sources can be
consolidated.

The second, more powerful query interface supported by Impliance is intended
for building applications that access information through more structured search.
We are still in an early phase of understanding the requirements of such a query
language.

Simple, Massive Parallelism for Query Processing
A typical Impliance installation will consist of several instances of
Impliance deployed in geographically separated locations for
disaster recovery as well as load balancing.

In order for a single instance of Impliance to be able to scale from
a one-terabyte small business to a multi-petabyte enterprise,
the storage and data processing capabilities must be scalable
over three orders of magnitude and a wide variety of workloads.

Each Impliance instance consists of a number of nodes,
topologically differentiated into three flavors, each optimized for
a particular style of computation based on its connectivity, but
each supporting the same execution environment.
The node types correspond to the most popular distributed computing paradigms in use today, but
are novel in their use in tightly-coupled combination.

• Data nodes have direct ownership of a subset of the persistent storage, and are the most efficient
when performing operations on that storage. Data nodes are sized to balance their computing
capability and their I/O bandwidth, but they can be a bottleneck if the data stored on a data node is
heavily used.

• Grid nodes perform analytic computations. They may be pulled into a “work crew” to perform
longer short-term operations, and have no long-term state. Grid nodes may offer specialized
computing capabilities, such as a hardware accelerator, and have the lowest cost per cycle.

• Cluster nodes are responsible for making consistent locking and caching decisions on data within
data consistency groups. Such nodes are good at scalably performing many small consistent updates
over a large set of data, but being part of a consistency group requires overhead for heartbeats and
for reacting to nodes joining or leaving the group.

For example, a query can be parallelized by performing full-text index search on a set of data nodes,
which then send the reduced data to a set of grid nodes for joining, sorting, and group-wise
aggregation, the results of which are sent to a set of cluster nodes to drive a set of updates. For better
resource utilization, each operation could be executed on any of the node types.
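
The three-stage example above can be sketched as a toy, single-process pipeline. This is only an illustration under stated assumptions, not Impliance code: the function names (`data_node_search`, `grid_node_aggregate`, `cluster_node_apply`), the document fields, and the in-memory lists standing in for data nodes are all hypothetical, and a plain keyword scan stands in for a real full-text index.

```python
from collections import defaultdict

# Hypothetical partitions: each inner list stands in for the storage owned by one data node.
DATA_NODES = [
    [{"id": 1, "region": "east", "text": "late delivery complaint", "amount": 40},
     {"id": 2, "region": "west", "text": "great service", "amount": 15}],
    [{"id": 3, "region": "east", "text": "complaint about billing", "amount": 25},
     {"id": 4, "region": "west", "text": "billing complaint resolved", "amount": 10}],
]


def data_node_search(partition, keyword):
    """Stage 1 (data node): filter local storage and return only matching rows.

    A linear keyword scan stands in for a full-text index lookup.
    """
    return [doc for doc in partition if keyword in doc["text"].split()]


def grid_node_aggregate(rows):
    """Stage 2 (grid node): stateless group-wise aggregation over the reduced data."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(sorted(totals.items()))


def cluster_node_apply(store, totals):
    """Stage 3 (cluster node): apply the aggregated results as a set of small updates."""
    for region, total in totals.items():
        store[region] = total
    return store


def run_query(keyword):
    # Each data node filters its own partition; only the reduced rows move on.
    reduced = [row for part in DATA_NODES for row in data_node_search(part, keyword)]
    totals = grid_node_aggregate(reduced)
    return cluster_node_apply({}, totals)


if __name__ == "__main__":
    print(run_query("complaint"))
```

The point of the sketch is the data flow: filtering happens where the data lives, so only the reduced rows travel to the aggregation stage.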

Compute and Storage Resource Virtualization
In order to achieve its scalability goals, enterprise deployments of Impliance will be organized
as potentially thousands of interconnected nodes constructed from commodity hardware
components.
In order to unify and simplify its management, Impliance will virtualize this diverse set of
compute and storage resources by introducing the notion of a resource group: a group of
tightly-coupled nodes (together with their attached storage) that can be assigned the role of
cluster, grid, or data storage service.
The cost-effective autonomic management of these resource groups is a key factor in meeting
the goal of reducing both the TCO per byte of data stored and the time-to-value. There are
two aspects to this management:
a. execution management, and
b. storage management.
Execution management is the task of assigning parts of any task to resource groups,
depending on the availability of those groups’ resources. For example, it may make sense to
execute part of a query such as predicate application on the storage nodes, in order to obtain
highest performance and avoid affecting grid nodes.
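
As a minimal sketch of the execution-management idea just described, the following assigns each step of a task to a resource group that has spare capacity, preferring storage-side execution for predicate application. Everything here is an assumption made for illustration: the `ResourceGroup` class, the `free_slots` capacity measure, the `PREFERRED_ROLES` table, and the `assign` function are hypothetical names, not part of Impliance.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    name: str
    role: str          # "data", "grid", or "cluster"
    free_slots: int    # hypothetical measure of available capacity


# Hypothetical preference order: which roles may run each kind of task step.
PREFERRED_ROLES = {
    "predicate": ["data", "grid"],   # filter close to the storage when possible
    "join": ["grid", "cluster"],
    "update": ["cluster"],
}


def assign(step, groups):
    """Pick a group from the first preferred role that still has spare capacity."""
    for role in PREFERRED_ROLES[step]:
        candidates = [g for g in groups if g.role == role and g.free_slots > 0]
        if candidates:
            chosen = max(candidates, key=lambda g: g.free_slots)
            chosen.free_slots -= 1   # reserve one slot for this step
            return chosen.name
    raise RuntimeError(f"no resource group available for step {step!r}")


if __name__ == "__main__":
    groups = [
        ResourceGroup("rg-data-1", "data", 1),
        ResourceGroup("rg-grid-1", "grid", 2),
        ResourceGroup("rg-cluster-1", "cluster", 1),
    ]
    for step in ["predicate", "predicate", "join", "update"]:
        print(step, "->", assign(step, groups))
```

When the data group runs out of slots, the second predicate step falls back to a grid group, which mirrors the trade-off between storage-side performance and grid load mentioned above.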
Storage management is the task of determining how and where to store the system’s data,
including how much to replicate the data for reliability. Some data, especially data users have
added, will require high reliability, and some will require regulatory protection.
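
A storage-management policy of this kind might be sketched as follows. The replication factors, the `"user"` and `"derived"` data classes, and the `place` function are assumptions made for illustration; the source does not specify how Impliance decides on replication or placement.

```python
# Hypothetical replication factors per class of data: user-added data is
# replicated more heavily than data that could be recomputed.
REPLICAS = {"user": 3, "derived": 1}


def place(item_id, data_class, nodes):
    """Return the distinct nodes that should hold copies of one item."""
    copies = min(REPLICAS[data_class], len(nodes))
    # Deterministic starting position derived from the item's bytes
    # (the built-in hash() of a str varies between interpreter runs).
    start = sum(item_id.encode()) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3"]
    print(place("doc-1", "user", nodes))
    print(place("idx-1", "derived", nodes))
```

Placing copies on consecutive positions of the node list guarantees that replicas of one item land on distinct nodes, which is the property reliability depends on.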
Other Issues

Security and versioning are important to Impliance.

Security: Since Impliance is designed for enterprise information management, it needs to
support policy-driven access controls in such a way that information is provided to the right
people, and only to the right people. Another aspect of security is monitoring and auditing.
Impliance should be able to trace the lineage of a piece of data as well as the queries that have
accessed it.

Security is the focus of recent enterprise search offerings from both Oracle and IBM.

Another important issue is versioning. Because of auditing requirements and the
abundance of low-cost storage capacity, Impliance does not update data in-place. Instead,
changes are implemented as the addition of a new version. We are still investigating
whether we should only support a simple sequential versioning primitive and let various
other versioning schemes be built on top of it, or directly support more complex ones,
allowing branching and merging of versions, as in typical source-code management
systems.
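
The simple sequential versioning primitive mentioned above can be illustrated with a small append-only store. This is a sketch under assumptions, not the Impliance design: the `VersionedStore` class and its method names are hypothetical, and branching and merging are deliberately left out.

```python
class VersionedStore:
    """Append-only store: every write adds a new version; nothing is overwritten."""

    def __init__(self):
        self._versions = {}   # key -> list of values, oldest first

    def put(self, key, value):
        """Append a new version and return its number (numbering starts at 1)."""
        history = self._versions.setdefault(key, [])
        history.append(value)
        return len(history)

    def get(self, key, version=None):
        """Return the latest version, or a specific earlier one if requested."""
        history = self._versions[key]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{key!r} has no version {version}")
        return history[version - 1]

    def versions(self, key):
        """Return a copy of the full history, oldest first, e.g. for auditing."""
        return list(self._versions[key])


if __name__ == "__main__":
    store = VersionedStore()
    store.put("contract-17", "draft")
    store.put("contract-17", "signed")
    print(store.get("contract-17"))
    print(store.get("contract-17", version=1))
```

Because earlier versions are never modified, an audit can always reconstruct what a piece of data looked like at any point in its history.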
RECAP
We summarized the trends that will shape information management for the
foreseeable future. Those trends imply three major requirements for Impliance:
(1) to be able to store, manage, and uniformly query and transform all data, not just structured
records;
(2) to be able to scale out as the volume of this data grows; and
(3) to be simple and robust in operation.

We then described four key ideas that are uniquely combined in Impliance to address
these requirements, namely the ideas of:
(a) integrating software and off-the-shelf hardware into a generic information
appliance;
(b) automatically discovering, organizing, and managing all data – unstructured as
well as structured – in a uniform way;
(c) achieving scale-out by exploiting simple, massive parallel processing; and
(d) virtualizing compute and storage resources to unify, simplify, and streamline the
management of Impliance.
THANK YOU
May I have your questions, please, if any?

