
Oracle touts RAC as a platform for multiterabyte data warehouses and DSS (decision support system) solutions.

It's not the Oracle technology that is in question; it's the notion of clustering machines to achieve a very high scale of data warehousing support. The acronym RAC stands for Real Application Clusters, and clustering is a form of linking hardware together such that all the nodes (machines) in the same cluster look and act like one big machine. Unfortunately, this is where organizations are misled, thinking they can scale clusters by buying many smaller machines rather than investing up front in the big-iron boxes needed to supply the horsepower of very large data warehouses. Anything at or above 45 terabytes in a single data warehouse is deemed a very large data warehouse (VLDW), as long as the loads are large and the data is actually accessed; 45 terabytes of storage holding data that is never accessed is not considered a VLDW. To its credit, Oracle has established VLDW systems, but most of these run on single large hardware instances (even in RAC installations): SMPs (symmetric multiprocessors) or LPAR (logical partitioning) / VPAR (virtual partitioning) on a mainframe.

A Basic Look at Competing Architectures

Before diving into specifics, let's define some basic terms and basic architectural differences:

SMP (symmetric multiprocessing): single-node machines that scale up by adding CPUs; high-speed bus with fast bandwidth; no need to share disk.

Figure 1: SMP Processing

Clustered: requires shared disk across all nodes; memory and processing must be synchronized across all nodes; a high-speed interconnect is required.

Figure 2: Clustered Processing

MPP (massively parallel processing): usually independent SMP components, capable of scaling out and up (within a single SMP node); doesn't share disk; splits processing into parallel components across the architecture.

Figure 3: Massively Parallel Processing

The Pros and Cons

Based on this author's experience, these pros and cons are bound to appear. All of them pertain to solutions of 15TB to 45TB or more; below 45TB, most of these issues do not become deal breakers.

SMP Processing Pros and Cons

Pros:
- Large scalability
- Extremely fast access
- Logical partitioning
- Single unit
- Single point of management

Cons:
- Upper size (processing and data) limit due to hardware bus size.
- All CPUs and all RAM must be of the same make, speed and size.
- The number of I/O channels must be kept in concert with the CPUs in order to avoid bottlenecking.
- The amount of RAM should be at least 1.3x the number of CPUs once 32 CPUs are reached (costly).
- Expensive solution to purchase outright.
- Does not appear possible to scale to MPP today (without special hardware).

Clustering Pros and Cons

Pros:
- Mid-tier scalability.
- Large clustered SMP machines work much like large MPP nodes.
- Single point of management.
- Cheaper entry point than a single large SMP or an MPP system.

Cons:
- Hot disk spots with large batch loads.
- Cannot effectively control slices of data via partitioning, making it a challenge to balance data sets across available bandwidth.
- Hot disk spots with large queries (a query that aggregates many hundreds of thousands of rows has to synchronize all of that data in RAM before aggregating it).
- Once a mixed workload is put on the cluster and large data sets are being written and collected, the engine spends more time synchronizing across the network than it does answering the needs of either the load or the query.
- As volumes increase, network traffic increases, which in turn increases CPU utilization. This produces a law of diminishing returns: the higher the volume (both query and load), the closer the system runs to the top of its performance and responsiveness curve.

MPP Pros and Cons

Pros:
- Scale-out solution.
- Doesn't require big-iron nodes.
- Single point of management.
- Everything runs in parallel, so supporting a mixed workload is easy.

Cons:
- To add another node, it is necessary to repartition and redistribute data sets (see the sketch following this list).
- Costly to engineer; the high up-front costs appear to be an inhibitor to entry.
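To make that last MPP con concrete, here is a minimal sketch (illustrative Python, not any vendor's actual redistribution algorithm) of why adding a node to a hash-distributed system forces data to move: rows are assigned to nodes by hashing a distribution key modulo the node count, so changing the node count changes almost every assignment.

```python
# Illustrative sketch: why adding an MPP node forces data redistribution.
# Rows land on nodes by hashing a distribution key modulo the node count,
# so changing the node count reassigns most rows.

from zlib import crc32

def node_for(key: str, num_nodes: int) -> int:
    """Map a distribution key to a node via hash modulo node count."""
    return crc32(key.encode()) % num_nodes

keys = [f"customer-{i}" for i in range(100_000)]

before = [node_for(k, 4) for k in keys]   # 4-node system
after  = [node_for(k, 5) for k in keys]   # one node added

moved = sum(1 for b, a in zip(before, after) if b != a)
print(f"{moved / len(keys):.0%} of rows change nodes")  # roughly 80%
```

Techniques such as consistent hashing can shrink the fraction of rows that move, but MPP platforms of this era typically rebalance by redistributing rows across all nodes, which is exactly the repartitioning cost noted above.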
Blurry Lines Between the Technologies

If you think about VLDWs, then you must first be thinking big iron, whether SMP, cluster or MPP (SMP nodes hooked together). Big iron (lots of CPU, lots of RAM and high-speed disk I/O) is just the ticket, and it blurs the lines between the architectures. Machines of different sizes often cause problems with different-sized data sets, meaning the data actually used (loaded and/or queried). When we look at a VLDW, let's consider today's machine sizes and speeds (as of the writing of this article, which of course will be surpassed shortly thereafter). Requirements for MPP nodes are much smaller; the SMP node sizes below are considerations for big iron running single SMP instances, or running as part of a clustered environment.

SMP or Node Machine Sizes

Small:
- 16 or fewer CPUs, non-dual core
- 18GB RAM or less
- 3 or fewer dedicated I/O fiber channels
- 8GB or less RAM cache on disk units
- Less than 300MB per second throughput on each I/O channel

Medium:
- 16 to 24 CPUs, non-dual core
- 18 to 48GB RAM
- 5 to 8 dedicated I/O fiber channels
- 12 to 24GB RAM cache on disk units
- 400MB to 2GB per second throughput on each I/O channel

Large:
- 24 to 64 CPUs, dual-core or not
- 48 to 128GB RAM (or more)
- 12 to 64 dedicated I/O fiber channels
- 24 to 168GB RAM cache on disk units
- 2GB per second or better throughput on each I/O channel
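To see why channel count and per-channel throughput matter at VLDW scale, here is a back-of-envelope sketch (my own illustration; it assumes a perfectly I/O-bound scan and that aggregate bandwidth is simply channels times per-channel throughput, both generous assumptions):

```python
# Back-of-envelope: time for a full scan of a 45TB table, assuming
# aggregate I/O bandwidth = channels * per-channel throughput and a
# perfectly I/O-bound scan.

def full_scan_hours(data_tb: float, channels: int, gb_per_sec_each: float) -> float:
    aggregate_gb_per_sec = channels * gb_per_sec_each
    return (data_tb * 1024) / aggregate_gb_per_sec / 3600

# "Medium" node from the table above: 8 channels at ~0.4GB/s each.
print(f"Medium: {full_scan_hours(45, 8, 0.4):.1f} hours")   # ~4.0 hours
# "Large" node: 32 channels at 2GB/s each.
print(f"Large:  {full_scan_hours(45, 32, 2.0):.2f} hours")  # ~0.20 hours (~12 min)
```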
When we look at Winter Corporation's reports on VLDW, they all specify large-style machines for 15 to 48TB+ of information. In fact, in Winter Corporation's Sun Enterprise Systems: A Scalable Platform for Very Large Data Warehousing with Oracle (page 9, paragraph 3), the Sun Fire SMP used is 72 dual-core CPUs, 576GB RAM and 72 hot-swap I/O channels. This is a very large SMP machine on which to run Oracle, but it's not just Oracle: any relational database engine needs that kind of power to handle big data problems.

Keep in mind that a VLDW isn't measured in dead data simply sitting in a table somewhere. A VLDW is measured by several aspects, including: the amount of data loaded over the course of any given hour (or batch cycle), the amount of data queried or utilized from within the database, and the mix of the workload (for example, how fast can queries return while large amounts of data are being loaded simultaneously?).

Big machines have lots of horsepower, and because of this, synchronization across two large machines in an Oracle RAC environment is not the same as synchronization across four or six small machines made up to look like two big machines. Oracle RAC can scale, there's no doubt about it, but scaling it within large SMPs is just like putting Oracle on a single SMP big-iron box: the operator gets the chance to partition, parallelize and synchronize much as in an MPP environment. In other words, the sum of the parts will not necessarily equal the whole (when comparing small or medium-sized boxes to one or two large environments).

What Oracle says: "Oracle's key innovation is a technology called cache fusion, originally developed for Oracle9i Real Application Clusters. Cache fusion enables nodes on a cluster to synchronize their memory caches efficiently using a high-speed cluster interconnect so that disk I/O is minimized. The key, though, is that cache fusion enables shared access to all the data on disk by all the nodes on the cluster. Data does not need to be partitioned among the nodes." (Source: Oracle Real Application Clusters 10g: The Fourth Generation by Angelo Pruscino and Gordon Smith, Oracle)

In other words, cache fusion is a huge success with smaller numbers of large nodes, and it can be deadly with larger numbers of small nodes because of the network traffic required to keep all the caches synchronized.
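As a deliberately crude way to see why node count matters, consider this toy model (the author's illustration, not Oracle's actual cache fusion protocol): if a hot block is equally likely to be cached on any node, the fraction of requests that must cross the interconnect grows with the number of nodes.

```python
# Toy model of cache synchronization pressure: if requests for a hot
# block land on nodes uniformly at random, the chance the requesting
# node already holds the block is 1/n, so the remote-fetch fraction
# grows toward 100% as nodes are added.

def remote_fetch_fraction(num_nodes: int) -> float:
    """Fraction of hot-block requests served from a remote cache."""
    return 1.0 - 1.0 / num_nodes

for n in (2, 4, 8, 16):
    print(f"{n:>2} nodes: {remote_fetch_fraction(n):.0%} of hot-block "
          f"requests cross the interconnect")
# 2 nodes: 50% ... 16 nodes: 94% -- the interconnect does ever more work.
```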
If you're not lucky enough to afford large nodes, then you must take into account the next bottleneck: I/O across shared disk. Even the experts agree that RAC with traditional disk is not for the fainthearted. The book Oracle RAC & Grid Tuning with Solid State Disk: Expert Secrets for High Performance Clustered Grid Computing by Mike Ault and Donald K. Burleson was the subject of a blog review, and both the book and the review provide more information on the use of SSD (solid state disk) to overcome the performance issues of the RAC solution. Solid state disk (RAM-type disk) is usually too expensive, and it frequently doesn't scale well compared with investing in a large SMP node for RAC. RAC works best when synchronization doesn't have to happen across many nodes, or is taken care of within a single large SMP node. Sun has had tremendous success here (as have the HP Superdome and the mainframe with LPARs), as noted in the previously mentioned Winter Corporation report: "Key to Sun's capability is its large processor count symmetric multiprocessor (SMP) architecture. Sun's SMP architecture relies on high bandwidth connectivity with reasonable latency in the few hundreds of nanoseconds." (Winter Corporation, Sun Enterprise Systems: A Scalable Platform for Very Large Data Warehousing with Oracle, page 9, paragraph 4)

Before we comment any further on SMP clustering, let's see what Teradata says about MPP versus SMP:

"Why does MPP scale differently than the alternatives? Every other architecture sacrifices scalability for ease of programming. Each additional CPU that has to share memory increases the potential contention for access to a memory location. The hardware has to keep track of which CPU is using which memory locations so that errors and old data do not creep in. The operating system (OS) also has more sharing and resource contention to manage, thus consuming more system resources as it handles more CPUs concurrently. These liabilities cause both SMP and NUMA to have a diminishing-returns curve: each CPU added to the configuration delivers less computing power than the previous one, until additional CPUs do not provide any more power to the application. This diminishing-returns curve violates both perfect scale-up and speed-up; neither can ever be delivered by an SMP or NUMA architecture. The MPP architecture eliminates the diminishing-returns curve (with software that knows how to take advantage of it) and is the only platform physically capable of delivering perfect scale-up and speed-up. Each additional node adds system resources that are not shared with those already in the configuration. None of the new resources are used to manage sharing, so they are all applied directly to the application."

Now, that's not to say that MPP is for everybody. MPP has its issues of balancing data across each of the nodes, and part of the up-front cost includes Ph.D.-level engineering for parallel algorithms, high-speed interconnect hardware and high-speed disk access.

My Point

It takes more than just a pretty RDBMS software engine to finish the job nicely. Richard Winter drives this point home when discussing Sun's total package and what it means to Oracle RAC: "The best cluster performance can be obtained by using the scalable coherent interface (SCI) interconnect that provides 200MB per second bandwidth and exploits remote shared memory to insure low latency (4 milliseconds)." In other words, making RAC truly work for any VLDW environment requires incredibly fast and incredibly large hardware in order to make clustering a reality. From an MPP standpoint, it's pay me now or pay me later, because of the law of diminishing returns within both clustered and big-iron SMP architectures: eventually the volume of data can (and will) overwhelm the size of the machine, and scaling out (adding nodes) in any of these solutions (except MPP) will cause bottlenecks to appear, bottlenecks that only get worse as additional nodes are added.
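Teradata's diminishing-returns curve is essentially Amdahl's law applied to shared resources: if some fraction of the work serializes on shared memory or coherency traffic, each added CPU buys less. A small worked illustration (the 5 percent serial fraction is an arbitrary assumption, not a measured figure):

```python
# Amdahl's-law illustration of the diminishing-returns curve: with a
# serial (shared/contended) fraction s, speedup on n CPUs is
#   speedup(n) = 1 / (s + (1 - s) / n)
# and can never exceed 1/s, no matter how many CPUs are added.

def speedup(n_cpus: int, serial_fraction: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

s = 0.05  # assume 5% of the workload contends on shared resources
for n in (1, 8, 16, 32, 64, 128):
    print(f"{n:>3} CPUs -> speedup {speedup(n, s):5.1f}")
# 128 CPUs yield only ~17x, and the ceiling is 1/0.05 = 20x.
# A shared-nothing (MPP) design aims to drive s toward zero per node.
```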
Where has Oracle RAC Succeeded?

There is a case study cited in Emerging Trends in Business Intelligence (see slide #15) by Robert Stackowiak, Senior Director of Business Intelligence for Oracle. In this case, the facts are plainly visible:
- 61TB query DW on Oracle at Amazon.com
  - 100,000 queries/week (mostly complex ad hoc)
  - Amazon runs two identical 61TB+ query DWs, loaded concurrently. Configuration for each is:
    - 16-node RAC/Linux cluster
    - Oracle10gR1 RAC using ASM on Red Hat Enterprise Linux 3
    - 16 HP DL580s, each with 4 3GHz CPUs
    - 71 HP StorageWorks MSA1000s
    - 8 32-port Brocade switches
    - 1 Gigabit interconnect
  - DW metrics:
    - Each holds 51TB raw data (growing at 2x per year)
    - Each is 61TB total database size, with only 2TB of indexes
    - 71TB total disk for each
Please notice the hyper-threaded fast CPUs, the extremely large disk array and the super-fast interconnect (across a large number of Brocade switches). Amazon has mitigated the network bottleneck caused by synchronization with a fast network and a high number of parallel interconnects, and has mitigated the disk bottleneck with a very large storage array and a large RAM contingency. Of course, the Oracle installation has been tuned extensively; installing and configuring Oracle RAC to perform takes a lot of knowledge and highly trained professionals. Amazon's RAC environment also fails over across an identical network.

What About Scalability at a Later Date?

Keep in mind that adding nodes to a clustered environment requires exactly the same hardware (usually, in order to avoid the "slowest hardware dictates overall performance" problem) and exactly the same software release. In order to scale up a clustered environment, all hardware nodes need to be scaled at the same time. Most MPP architectures, by contrast, can be scaled across multiple machines of varying size and shape, depending on the vendor's OS and RDBMS engine.

Conclusions

Oracle RAC is not a bad solution; however, it appears that to meet true VLDW needs it must be housed on an inordinately large set of SMP platforms. The cost of two such large platforms, along with the per-CPU licensing that Oracle charges, may be a prohibitive barrier to entry. Scaling smaller clustered SMP units seems doable (as the Amazon case points out); however, upgrading all units at the same time to go beyond the 70TB mark may prove to be another cost nightmare, not to mention that the migration path to get there may become a technical mud-bog. Oracle (non-RAC) continues to make huge strides on big-iron SMP machines, as do Oracle's competitors on MPP-based hardware (IBM, Teradata, Netezza, DATAllegro, etc.). Your corporation's choice of architecture and cost should be weighed carefully against how large your environment will grow in the next three to five years. Then the vendor (no matter which one) should sign a performance-based SLA in which it agrees to deliver the performance required from the selected hardware or architecture platform. In case you're wondering, my recommendation has been to select an MPP-based platform for limitless scale-out, because it allows (if necessary, though not desired) multi-versioned, multi-configured SMP hardware underneath.
