
Thinking Inside the Box

Four Fundamental Differences Between T…
4 August 2010

Today Netezza is launching a new eBook entitled "Oracle Exadata and Netezza TwinFin™ Compared". As the name implies, this eBook provides a comparison of the Netezza TwinFin data warehouse appliance and Oracle's "appliance-like" database machine offering.

Certainly Netezza is not the first company to compare and contrast its flagship system with Oracle's most recent entry. Richard Burns, a consultant over at Teradata, did a laudable job exposing the technical shortcomings of the Exadata v2 machine as they pertain to data warehousing in a May 2010 whitepaper. And there have been several recent pieces written on Oracle's apparent success, although the publicly named customer list has struck some as a bit underwhelming.

Netezza continues to compete (and win) against Oracle regularly in the marketplace, including against the Exadata v2 product, and so we felt it was high time to put our own comparison story together with today's eBook and with this little blog posting. Let me know what you think.

So where to begin? Let's start with the fact that the Netezza TwinFin is built to excel at a specific purpose: to be the best price/performance platform for Data Warehousing and Analytics in the market. Conversely, Oracle has tried to "kill two birds with one stone" with Exadata v2, aiming it primarily at the On-Line Transaction Processing application space while also making bold claims about its performance as a Data Warehouse with its Sun-based Oracle Database Machine (DBM) and Exadata Storage Server, version 2 (Exadata).

So why does it matter that Oracle is aiming to do both OLTP and DW in the same system – apart, that is, from at least two decades of people trying and failing to do exactly that with the likes of Oracle in previous software and hardware instantiations? Let's start with the workload requirements of the two application areas:

OLTP systems execute many short transactions, typically of extremely small scope (touching only a handful of records) and with extremely predictable, well-understood access and query patterns. They need to excel at handling these small transactions in very high volume, combined with equally small writes to the database in the form of updates, insertions and deletions. This limited scope, high throughput and "regularity" of access patterns make OLTP systems great candidates for intelligent caching and (multiple) secondary data structures, such as indices, to speed their processing.

Conversely, DW systems are typically asked to perform "read-heavy" queries and operations against current and deep historical data sets. Rather than analyzing just a few records, a DW query might look at millions, even billions, of rows from a single table, joined with multiple other tables. Data warehouse systems are used by company analysts and managers to find the "needle in the haystack" that guides enterprise decision-making in a more comprehensive and often ad hoc manner, which frequently undermines "tricks of the trade" such as result caching and indices.
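
To make that contrast concrete, here is a minimal sketch of the two query shapes; the table and column names are invented for illustration and are not taken from the eBook:

    -- OLTP: touches a handful of rows through a predictable key lookup,
    -- so an index and a result cache pay off handsomely.
    SELECT order_status
    FROM   orders
    WHERE  order_id = 1234567;

    -- DW: scans and aggregates millions of rows, joins other large tables,
    -- and the predicate changes with every analyst question, so indices and
    -- caches help far less than raw scan bandwidth.
    SELECT d.region,
           SUM(f.sale_amount) AS revenue
    FROM   sales_fact f
    JOIN   store_dim  d ON d.store_id = f.store_id
    WHERE  f.sale_date BETWEEN '2009-01-01' AND '2009-12-31'
    GROUP  BY d.region;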

So the two applications lead to very different system and platform requirements. No special "news" there – as I said earlier, people have been trying and failing to use a single system for both applications for years.

Without stealing any more of the thunder of our electronic publication today, let me just lay out what I believe are
the fundamental differences between Netezzaʼs TwinFin and the Oracle Database Machine/Exadata as simply
and plainly as I can:

Netezza TwinFin                           Oracle Database Machine / Exadata v2
----------------------------------------  ------------------------------------------------
True MPP                                  Hybrid "SMP-plus" Approach
Data Streaming with a Hardware Assist     CPU-intensive Processing for Basic DB Operations
Deep Analytics Processing                 Central Cluster-based Approach
No-Tuning-Required Simplicity             Complex Array of Knobs and Levers

In my view, these are "big deal" differences. They're not the result of a simple feature gap to be closed in an
upcoming point-release, but rather go directly to limitations at the heart of the Oracle DBM/Exadata system
architecture and/or business culture. To address them would require a major rearchitecting, or at least
refactoring, of Oracle's decades-old DBMS code base. They also happen to be highly visible to customers and
prospects, which makes for some interesting comparisons in head-to-head on-site Proofs of Concept (POCs).

1) True MPP vs. a Hybrid "SMP-plus" Approach

Netezzaʼs TwinFin uses a full MPP approach to data warehousing, pushing all of the processing down as close
as possible to where the data is stored and maximizing the processing horsepower of MPP for scalability,
throughput and performance – for even the most complex workloads. Using the MPP method of dividing the
workload and attacking query problems in parallel, Netezza has been able to demonstrate market-leading data
warehouse price-performance across four generations of data warehouse appliances.

Oracle's DBM/Exadata takes a hybrid approach, adding Exadata Storage nodes largely to handle data decompression and predicate filtering, but still relying primarily on the SMP cluster of Oracle RAC to handle most of the data warehouse work, including complex joins. In addition, the SMP cluster must also act as the central distribution point for any data that needs to be redistributed between and across Exadata nodes. To minimize this, Oracle and Sun's solution was to "throw hardware at the problem" (quoting Teradata's Mr. Burns): over-engineering interconnects, processor rates and the other elements required by all of this data movement, rather than refactoring and solving a fundamental software architecture issue.

The difference between the two is akin to an 8-lane continuous streaming superhighway in the TwinFin instance
versus multiple freeways converging on and necking down to a two-lane country road via a “traffic roundabout”. I
live in Massachusetts and can attest to the negative impact of taking multiple highways down to a single road – it
happens every weekend at the gateway to and from Route 6 on Cape Cod.

2) Data Streaming with a Hardware Assist vs. CPU-intensive Work for Basic DB Operations

In addition to the advantages of the MPP architecture for data warehousing, the TwinFin system makes use of hardware acceleration for increased query and analytics performance. This acceleration comes in the form of the "DB Accelerator" that is part of each S-Blade in the TwinFin architecture, each providing four dual-core Field-Programmable Gate Arrays (FPGAs). The FPGAs take care of fundamental processing steps such as decompression, predicate filtering and ACID-compliant data visibility at the full scan rate of the data coming off disk. Placing this device so close to the disks it serves gives the TwinFin system much more performance leverage, because data can be filtered, processed and value-added before consuming any unnecessary CPU cycles or being transported across an expensive network.

And the fact that it is a field-programmable device means Netezza can introduce additional features and performance through a simple upgrade to our NPS software/firmware. Netezza has done exactly that with two phases of hybrid column/row-level compression technology, first introduced in 2005 and scaling as high as 32:1 compression in Release 6.0 depending on data patterns, and with our high-performance implementation of row-level security. Because decompression is performed in the FPGA in TwinFin, "Compression = Performance": if a customer's data is compressed by a 4:1 factor, the effective data streaming rate for processing queries increases four-fold.
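
A quick back-of-the-envelope illustration of that claim, using hypothetical numbers rather than figures from the eBook:

    effective scan rate = physical disk scan rate x compression ratio
    e.g., 500 MB/s off disk x 4 (from 4:1 compression) = 2,000 MB/s of logical, query-ready data per stream

The physical I/O stays the same; the FPGA simply hands the downstream CPUs several times as much usable data per second.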

Conversely, the DBM/Exadata system relies entirely on CPU processing. In fact, the great majority of the functionality provided by the Exadata nodes in the DBM/Exadata system replicates what each FPGA core in TwinFin does: data decompression and predicate filtering. Because decompressing data is so CPU-intensive in the DBM/Exadata system, Oracle "strongly suggests" lighter compression for data that must serve high-performance data warehousing, reserving heavier compression for "cooler" queryable-archive purposes. Again, the heavy lifting for query processing and analytics is left to the central SMP cluster nodes rather than to parallel Exadata nodes, forcing Oracle to "throw hardware at the problem".

3) Deep Analytics Processing vs. Central Cluster Analytics

Netezza brings analytics to where the data is stored, processing as close to the data as possible: not just to decompress it and do predicate filtering, but to complete as much of the complex analytics as possible, in parallel. That's as true of the "traditional" OLAP analytics of SQL-based data warehousing as it is of the advanced and predictive analytics enabled by the new i-Class capabilities in the "Second Wave of TwinFin".

With i-Class, Netezza introduces a comprehensive, scalable and high-performance approach to advanced analytics for both our customers and partners, spanning linear algebra and matrix manipulation, engines for R and Hadoop, and several programming languages including C, C++, Java, Python and even Fortran. The i-Class functionality also offers plug-ins and packages for the Eclipse IDE and the R GUI, plus pre-built analytic functions engineered to deliver performance at scale across data preparation, mining, predictive analytics and spatial processing, together with access to analytic functions from the GNU Scientific Library and the R CRAN repository. Extended by the i-Class embedded analytics capabilities, TwinFin allows our partners and customers to push applications, functions and algorithms that go well beyond standard set-based SQL down into the appliance, at scale and with high performance, freeing them of the latency and sampling requirements imposed by off-board processing platforms for advanced analytics.
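
To illustrate the push-down principle in its simplest form, here is a hedged sketch using invented table and column names (generic SQL, not the i-Class API itself):

    -- Pushed down: the scan, filter and aggregation run in parallel on the
    -- S-Blades; only a few thousand summary rows ever leave the appliance.
    SELECT customer_id,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_spend
    FROM   transactions
    WHERE  txn_date >= '2009-01-01'
    GROUP  BY customer_id;

    -- Off-board alternative: ship the raw detail (SELECT * FROM transactions)
    -- to an external analytics server, which must then sample or batch the
    -- data before any modeling can begin.
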
The Oracle DBM/Exadata performs the majority of its OLAP analytics in the central cluster (RAC) nodes, after traversing the "traffic roundabout". And apart from basic scoring functionality, virtually ALL of the advanced analytics are performed in the cluster nodes as well. Placing the bulk of the processing in the central SMP cluster means that both the functionality and the scale of the analytics are bounded by the capacity and performance the SMP cluster can provide, and are typically restricted to the elements included in Oracle's own "Data Mining" package.

The DBM/Exadata's requirement to ship data from the storage arrays to the central cluster for analytics is akin to backhauling massive truckloads of material from a mining site to pick out the gold at headquarters, rather than sifting out the most valuable nuggets in parallel and sending back only those, as TwinFin does.

4) No-Tuning-Required Simplicity vs. a Complex Array of Knobs and Levers

For a long time, the simplicity of the Netezza data warehouse appliance has shone through most strongly in the extremely limited tuning requirements it imposes on administrators of the system, particularly as compared to Oracle-based systems. Simplifying system management is core to Netezza's "appliantization" of the data warehouse and analytics platform. Rather than managing a "coordinated collection" of technology assets, the system and database administrators of TwinFin interact with a single appliance and use the redundant Linux-based SMP host nodes as the interaction point for all activities. Everything from database configuration, data distribution, data mirroring, monitoring and software upgrades to day-to-day management is simplified (in the words of one TwinFin customer, "It's Netezza-easy – it just works.").

No indexing is necessary (or even supported) in TwinFin to achieve high performance. Just about the only
requisite “tuning” of the system is the definition of the distribution key for spreading data across all the S-Blades –
typically the primary keys of the tables. Even in TwinFin's internal management structure, the system software has been configured to get the maximum performance from the commodity subsystems (blades, chassis, disk arrays and network) by connecting them in novel ways and then managing them at the system level, rather than at the subsystem or rack level.
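
As a small illustration of what that single piece of "tuning" looks like in practice, here is a sketch with invented table and column names; the exact DDL options available can vary by NPS release:

    -- Distribute the fact table on its join key so rows spread evenly across
    -- all S-Blades and co-locate with matching rows for joins.
    CREATE TABLE sales_fact (
        sale_id     BIGINT,
        customer_id INTEGER,
        sale_date   DATE,
        amount      NUMERIC(12,2)
    )
    DISTRIBUTE ON (customer_id);

    -- When no natural key dominates, round-robin placement is an option:
    -- DISTRIBUTE ON RANDOM;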

While it is true that Oracle has simplified some of the tuning knobs and levers in the DBM/Exadata, prospective customers should ask whether Oracle really has moved into the domain of requiring only a small handful of tuning knobs and settings, or whether it still requires, or more colloquially "strongly suggests", the use of dozens or even hundreds of settings (depending upon the number of objects being maintained and optimized). How many dozens of IP addresses are needed to configure and manage the DBM/Exadata (TwinFin requires only two)? Oracle even has a special service to help DBM/Exadata customers migrate and tune their systems and databases for performance, and some of its leading Performance Architects talk about tools like the Oracle SQL Tuning Advisor as an inevitable fait accompli.

By Oracle's own admission, the time customers can expect to spend managing and tuning the DBM/Exadata system under Oracle 11g R2 is only 26% less than under Oracle 11g. Contrast that with installation after installation of Netezza appliances, where hundreds of terabytes of data under management in one or more data warehouses are maintained by two, or even fewer than one, full-time equivalents rather than a team of Oracle specialists. It all depends on one's perspective and philosophy in building a real appliance for the data warehouse market. Where others may see the need to tune, partition, index and sub-index data sets for performance as an inevitability, Netezza sees that same need as a reason to enhance TwinFin's capabilities in order to obviate it.

All of this adds up quickly to a significant price-performance advantage for customers of TwinFin, and with our limited tuning and simplified operations it also translates into much more rapid time-to-value. So that's it: four simple, fundamental differences that really set the TwinFin appliance apart from the DBM/Exadata. Agree? Disagree? Let me know what you're thinking. And now, go have a look at today's eBook release for the rest of the story.

