Impliance: A Next Generation Information Management Appliance

Submitted by:
Mayur Kulshreshtha – 09609159
Mayank Talwar – 09609042
Monika Rathour – 09609125
Tuesday, December 07, 2021 IMPLIANCES 1
Abstract
• Although database technology has been remarkably
successful in building a large market, its impact
on the broader market of information management
is surprisingly limited.
• We introduce Impliance, a next-generation
information management system consisting of
hardware and software components integrated to
form an easy-to-administer appliance that can store,
retrieve, and analyse all types of structured,
semi-structured, and unstructured information.
Some Questions
Why is most (>80%) of the world’s data still not in databases?
Didn’t we “solve” this problem in the 1980s with object-relational systems?
Do you use a database to store your data on your laptop?
Why not? (You are a database bigot, aren’t you?)
Have you ever tried to query (with SQL) a database that:
• You didn’t create, and…
• Had more than 500 tables?
Just how easy is it to incrementally add DB capacity beyond 1 machine? Can we
do it with 100 machines?
Have “self-managing” databases significantly simplified administration?

Introduction
Though database architectures have endured several orders of
magnitude changes in the relative speeds of hardware over the last
30 years, the many tiers and layers of components in today’s
information management systems are exceedingly complicated.
It is time to re-examine not only the architecture of database
systems as we know them today, but also their place in the overall
stack of software employed in modern data-intensive applications.
This process has resulted in a radical new architecture that integrates
software and hardware into a high-function and easy-to-manage
information management appliance that we call Impliance, currently
being designed and prototyped at the IBM Almaden Research Center.
IBM progressing for SMARTER INTELLIGENCE
Impliances
Impliance is an ambitious, long-term effort to define simpler,
more robust, and more scalable information systems for
tomorrow’s enterprises.
The goal of the Impliance project is to build a next-generation
information management system that stores all structured,
unstructured, and semi-structured data, is easy to manage, and
analyses data in a scalable way.
Impliance will be capable of storing, retrieving, and analysing all
of this structured, semi-structured, and unstructured information
with low total cost of ownership (TCO), supporting the needs of
small businesses as well as those of the largest global enterprises.
Requirements For Impliance
Tomorrow’s information management systems must uniformly manage not just
structured, homogeneous tables but huge amounts of structured, semi-structured,
and unstructured information of various data types (PDF, XML, text, audio, and
video).
Customers want systems that can seamlessly and scalably expand as an
enterprise and its systems’ needs grow.
Customers want systems that are easy to install, deploy, use, and manage, and
that minimize the need for human intervention.
Current information management software products are largely based on the
hardware architectures of several decades ago. While those products have shown
remarkable resiliency to order-of-magnitude changes in hardware speeds and
capacities, re-examining the software architecture in light of tomorrow’s
hardware seems overdue.
Use Of Impliances For Problem Solving
Exploiting Customer Relationship Management
Integrating Content and Data
Legal Compliance
Functionality Overview
We have identified four major areas of functionality that span information
management and that Impliance needs to unify:
SEMANTICS: Even tagged data such as XML may not have sufficient semantic
information behind the tags to properly relate it, e.g., the units of measurement or
how the data was collected.
SEARCH/QUERY: The more semantic information we can extract from the data, the
more we can improve the utility of search.
COMPOSITION: Composition includes integrating items from heterogeneous data
“silos” to form higher-level objects, or “quick and dirty” mash-ups that beneficially
merge public web data sources with each other or with enterprise data sources.
AGGREGATION: Aggregation is fundamental to today’s Business Intelligence and
On-Line Analytical Processing (OLAP), data mining, and visualization.
IMPLIANCES OVERVIEW
The Appliance Model
First and foremost, Impliance is an appliance that prepackages storage,
servers, and software into a turnkey information management system that
is operational “out of the box”.
The necessary software is pre-installed, automatically detecting which
hardware components are available and reconfiguring itself if there are
changes. The pre-installation and pre-configuration of the system
significantly reduce the “time to value” (TTV), that is, the time between the
decision to purchase a system and when its deployment actually realizes
benefit for the enterprise. According to a recent Winter Corp. survey, TTV is
the top priority of data warehouse managers.
Another benefit of the appliance model is better integration of different
software components. Such integration enables better collaboration
among those components.
Uniform Management of All Data
Today’s systems manage different types of data in very different ways, with
very different user interfaces, despite the common requirement to reliably
store, accurately search the content of, and rapidly retrieve both data and
metadata about that content.
Databases have typically been limited to managing highly structured data with
a common format and relatively small attributes, which conforms nicely to the
tables of relational database systems.
A recent Java content repository standard (JSR 170) [48] allows querying of
metadata. However, all metadata must match a predefined JSR schema; hence
schema chaos (diversity) is not supported, as in a conventional content
management system. Lastly, the ultra-simple “bag of bytes” model of file
systems provides a “repository of last resort” that can manage unstructured as
well as structured data, but without the powerful querying capability (e.g., joins
and aggregations) we take for granted in databases.
Impliance unifies the management of all data under one umbrella, providing
interfaces to search structured and unstructured content and metadata alike.
Much of the previous research in information discovery can be applied here. First,
additional metadata will be extracted for each document by running different
kinds of annotators [40][53]. This will identify not only entities such as person
names and locations, but also relationships among them. Second, using schema
mapping technologies [9][31], structures from different sources can be
consolidated.
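The annotation step above can be pictured with a minimal sketch. It assumes toy regex-based annotators for e-mail addresses and dates as stand-ins for the person-name and location annotators mentioned; all function names and patterns here are hypothetical and are not part of Impliance.

```python
import re

# Hypothetical annotators: each maps raw text to (entity_type, value) pairs.
def annotate_emails(text):
    return [("email", m) for m in re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)]

def annotate_dates(text):
    return [("date", m) for m in re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)]

def extract_metadata(text, annotators):
    """Run every annotator over one document and collect its annotations."""
    metadata = {}
    for annotator in annotators:
        for entity_type, value in annotator(text):
            metadata.setdefault(entity_type, []).append(value)
    return metadata

doc = "Contract signed 2007-01-07; contact alice@example.com for details."
meta = extract_metadata(doc, [annotate_emails, annotate_dates])
print(meta)
```

The extracted metadata can then be stored and searched alongside the original document, which is the point of running annotators at ingestion time.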
The second, more powerful query interface supported by Impliance is intended
for building applications that access information through more structured search.
We are still in an early phase of understanding the requirements of such a query
language.
Simple, Massive Parallelism for Query Processing
A typical Impliance installation will consist of several instances of
Impliance deployed in geographically separated locations for
disaster recovery as well as load balancing.
In order for a single instance of Impliance to be able to scale from
a one-terabyte small business to a multiple-petabyte enterprise,
the storage and data processing capabilities must be scalable
over three orders of magnitude and a wide variety of workloads.
Each Impliance instance consists of a number of nodes,
topologically differentiated into three flavors, each optimized for
a particular style of computation based on their connectivity, but
each supporting the same execution environment.
The node types correspond to the most popular distributed computing paradigms in use today, but
are novel in their use in tightly-coupled combination.
• Data nodes have direct ownership of a subset of the persistent storage, and are the most efficient
when performing operations on that storage. Data nodes are sized to balance their computing
capability and their I/O bandwidth, but they can be a bottleneck if the data stored on a data node is
heavily used.
• Grid nodes perform analytic computations. They may be pulled into a “work crew” to perform
longer short-term operations, and have no long-term state. Grid nodes may offer specialized
computing capabilities, such as a hardware accelerator, and have the lowest cost per cycle.
• Cluster nodes are responsible for making consistent locking and caching decisions on data within
data consistency groups. Such nodes are good at scalably performing many small consistent updates
over a large set of data, but being a part of a consistency group requires overhead for heartbeats and
for reacting to nodes joining or leaving the group.
For example, a query can be parallelized by performing full-text index search on a set of data nodes,
which then send the reduced data to a set of grid nodes for joining, sorting, and group-wise
aggregation, the results of which are sent to a set of cluster nodes to drive a set of updates. For better
resource utilization, each operation could be executed on any of the node types.
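The example query flow can be sketched in a few lines. This is only an illustration under stated assumptions: the three node types are modelled as plain functions running in one process, keyword matching stands in for full-text index search, and the data, function names, and fields are all hypothetical.

```python
from collections import defaultdict

# Hypothetical partitions, one list of documents per data node.
DATA_NODES = [
    [{"id": 1, "region": "EU", "text": "late delivery complaint", "amount": 120},
     {"id": 2, "region": "US", "text": "invoice paid on time", "amount": 300}],
    [{"id": 3, "region": "EU", "text": "complaint about billing", "amount": 80},
     {"id": 4, "region": "US", "text": "complaint resolved", "amount": 50}],
]

def data_node_search(partition, keyword):
    """Data node: filter locally so only matching rows leave the node."""
    return [doc for doc in partition if keyword in doc["text"].split()]

def grid_node_aggregate(rows):
    """Grid node: stateless group-wise aggregation over the reduced data."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def cluster_node_update(state, totals):
    """Cluster node: apply the resulting updates to shared state."""
    for region, total in totals.items():
        state[region] = state.get(region, 0) + total
    return state

# Search on every data node, aggregate on a grid node, update on a cluster node.
reduced = [row for part in DATA_NODES for row in data_node_search(part, "complaint")]
totals = grid_node_aggregate(reduced)
state = cluster_node_update({}, totals)
print(totals)
```

In a real deployment each stage would run in parallel on many nodes; the sketch only shows how data shrinks as it moves from data nodes to grid nodes to cluster nodes.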
Compute and Storage Resource Virtualization
In order to achieve its scalability goals, enterprise deployments of Impliance will be organized
as potentially thousands of interconnected nodes constructed from commodity hardware
components.
In order to unify and simplify its management, Impliance will virtualize this diverse set of
compute and storage resources by introducing the notion of a resource group: a group of
tightly coupled nodes (together with their attached storage) that can be assigned the role of
cluster, grid, or data storage service.
The cost-effective autonomic management of these resource groups is a key factor in meeting
the goal of reducing both the TCO per byte of data stored and the time-to-value. There are
two aspects to this management:
a. execution management, and
b. storage management.
Execution management is the task of assigning parts of any task to resource groups,
depending on the availability of those groups’ resources. For example, it may make sense to
execute part of a query such as predicate application on the storage nodes, in order to obtain
highest performance and avoid affecting grid nodes.
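One way to picture execution management is a small availability-aware scheduler. The sketch below is an assumption-laden illustration, not Impliance's scheduler: the group names, the `capacity` field, and the "prefer the matching role, else fall back to any free group" rule are all hypothetical.

```python
def assign_tasks(tasks, groups):
    """Assign each task to a resource group that still has free capacity.

    A task goes to a group of its preferred role when one is available,
    and otherwise falls back to any group with capacity left.
    """
    free = {name: g["capacity"] for name, g in groups.items()}
    plan = {}
    for task, preferred_role in tasks:
        candidates = [n for n, g in groups.items()
                      if g["role"] == preferred_role and free[n] > 0]
        if not candidates:
            candidates = [n for n in groups if free[n] > 0]
        if not candidates:
            raise RuntimeError(f"no capacity left for {task}")
        chosen = max(candidates, key=lambda n: free[n])  # least-loaded group
        free[chosen] -= 1
        plan[task] = chosen
    return plan

groups = {
    "data-1": {"role": "data", "capacity": 1},
    "grid-1": {"role": "grid", "capacity": 2},
}
tasks = [("apply_predicate", "data"), ("scan_index", "data"), ("sort", "grid")]
plan = assign_tasks(tasks, groups)
print(plan)
```

Here predicate application lands on the storage group, as in the example above, while the second data-oriented task spills over to the grid group once the data group is full.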
Storage management is the task of determining how and where to store the system’s data,
including how much to replicate the data for reliability. Some data, especially data users have
added, will require high reliability, and some will require regulatory protection.
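A replication policy of this kind can be sketched as follows. The replica counts, the data classes, and the round-robin placement rule are illustrative assumptions only; the source does not specify how Impliance chooses them.

```python
# Hypothetical policy: user-added data is replicated more than derived data.
REPLICAS = {"user": 3, "derived": 1}
DEFAULT_REPLICAS = 2

def place_replicas(item_id, data_class, nodes):
    """Pick distinct data nodes for the replicas of one item."""
    count = min(REPLICAS.get(data_class, DEFAULT_REPLICAS), len(nodes))
    start = sum(item_id.encode()) % len(nodes)  # deterministic starting node
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

nodes = ["data-0", "data-1", "data-2", "data-3"]
user_copy = place_replicas("contract.pdf", "user", nodes)
index_copy = place_replicas("contract.idx", "derived", nodes)
print(user_copy, index_copy)
```

The design choice illustrated is simply that the replication level is a per-class policy decision made by the system rather than by an administrator.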
Other Issues
Security and versioning are important to Impliance.
Security: Since Impliance is designed for enterprise information management, it needs to
support policy-driven access controls in such a way that information is provided to the right
people, and only to the right people. Another aspect of security is monitoring and auditing.
Impliance should be able to trace the lineage of a piece of data as well as the queries that have
accessed it.
Security is the focus of recent enterprise search offerings from both Oracle and IBM.
Another important issue is versioning. Because of auditing requirements and the
abundance of low-cost storage capacity, Impliance does not update data in-place. Instead,
changes are implemented as the addition of a new version. We are still investigating
whether we should only support a simple sequential versioning primitive and let various
other versioning schemes be built on top of it, or directly support more complex ones,
allowing branching and merging of versions, as in typical source-code management
systems.
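The simple sequential versioning primitive can be sketched as an append-only store. This is a minimal illustration under assumptions: the class and method names are hypothetical, everything is kept in memory, and branching and merging are deliberately left out.

```python
class VersionedStore:
    """Append-only store: a write adds a new version instead of overwriting."""

    def __init__(self):
        self._versions = {}  # key -> list of values, oldest first

    def put(self, key, value):
        history = self._versions.setdefault(key, [])
        history.append(value)
        return len(history)  # 1-based number of the version just written

    def get(self, key, version=None):
        history = self._versions[key]
        if version is None:
            return history[-1]  # latest version
        if not 1 <= version <= len(history):
            raise KeyError(f"{key!r} has no version {version}")
        return history[version - 1]

store = VersionedStore()
store.put("doc", "draft")
store.put("doc", "final")
print(store.get("doc"), store.get("doc", 1))
```

Because old versions are never overwritten, an auditor can always read back what a piece of data looked like at any earlier version.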
RECAP…..
We summarized the trends that will shape information management for the
foreseeable future. Those trends imply three major requirements for Impliance:
(1) to be able to store, manage, and uniformly query and transform all data, not just structured
records;
(2) to be able to scale out as the volume of this data grows; and
(3) to be simple and robust in operation.
We then described four key ideas that are uniquely combined in Impliance to address
these requirements, namely the ideas of:
(a) integrating software and off-the-shelf hardware into a generic information
appliance;
(b) automatically discovering, organizing, and managing all data, unstructured as
well as structured, in a uniform way;
(c) achieving scale-out by exploiting simple, massively parallel
processing; and
(d) virtualizing compute and storage resources to unify, simplify, and streamline the
management of Impliance.
THANK YOU……
May I have your questions, please, if any?
