Infrastructure
Elements that enable the architecture to be implemented.
Operational
Helps keep the data warehouse (DW) running.
People, procedures, training, management software
Physical
Hardware components, operating system, network, network software
Physical Infrastructure
OS
NT Servers
Medium-sized data warehouses
Limited parallel processing
Cost-effective for small or medium DW
Platform Options
A computing platform is the set of hardware components, operating system, network, and network software. Both online transaction processing (OLTP) systems and decision support systems (DSS) need a computing platform.
Manual Methods
External medium
A practically feasible solution is a minimum configuration on an appropriate platform that supports a standard set of information delivery tools in the DW.
Parallel processing
Symmetric multiprocessing (SMP)
Clusters
Massively parallel processing (MPP)
Cache-coherent non-uniform memory architecture (ccNUMA)
Symmetric Multiprocessing
Features:
This is a shared-everything architecture, the simplest parallel processing machine.
Each processor has full access to the shared memory through a common bus.
Communication between processors occurs through common memory.
Benefits:
Provides high concurrency; you can run many concurrent queries.
Balances workload very well.
Gives scalable performance; simply add more processors to the system bus.
Being a simple design, the server is easy to administer.
Limitations:
Available memory may be limited.
Performance may be limited by the bandwidth available for processor-to-processor communication, I/O, and bus communication.
Availability is limited, because the system behaves like a single computer with many processors.
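The shared-everything idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for processors, the in-memory list `SALES` stands in for shared memory, and the names `SALES`, `total_for`, and `run_concurrent_queries` are hypothetical. Because of Python's global interpreter lock, the sketch shows the memory model (every worker reads the same data directly), not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared memory: one table visible to every worker (hypothetical sample data).
SALES = [("east", 120), ("west", 80), ("east", 40), ("north", 60)]


def total_for(region):
    # Each worker reads the same shared list directly; nothing is copied.
    return sum(amount for r, amount in SALES if r == region)


def run_concurrent_queries(regions, workers=4):
    # Several queries run at the same time against the same memory.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(regions, pool.map(total_for, regions)))


results = run_concurrent_queries(["east", "west", "north"])
print(results)
```

Every query touches the same `SALES` list, which mirrors both the benefit (no data movement between workers) and the limitation (all workers contend for one memory and one bus).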
Clusters
Features:
Each node consists of one or more processors and associated memory.
Memory is not shared among the nodes; it is shared only within each node.
Communication occurs over a high-speed bus.
Each node has access to the common set of disks.
This architecture is a cluster of nodes.
Benefits:
Provides high availability; all data is accessible even if one node fails.
Preserves the concept of one database.
Good for incremental growth.
Limitations: Bandwidth of the bus could limit the scalability of the system. This option comes with a high operating system overhead. Each node has a data cache; the architecture needs to maintain cache consistency for internode synchronization. Main memory is like a big file cabinet stretching across the entire room.
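The cache-consistency requirement mentioned above can be illustrated with a toy model. This is a minimal sketch, not how any particular cluster product works: `SharedDisk`, `Node`, and the invalidate-on-write rule are assumptions chosen for brevity.

```python
class SharedDisk:
    """The common set of disks that every node can reach."""

    def __init__(self):
        self.blocks = {}


class Node:
    """A cluster node with its own private data cache."""

    def __init__(self, name, disk, cluster):
        self.name = name
        self.disk = disk
        self.cache = {}
        self.cluster = cluster
        cluster.append(self)

    def read(self, key):
        # Serve from the local cache; fall back to the shared disk on a miss.
        if key not in self.cache:
            self.cache[key] = self.disk.blocks[key]
        return self.cache[key]

    def write(self, key, value):
        # Write through to the shared disk, then invalidate stale copies
        # held by the other nodes (the internode synchronization cost).
        self.disk.blocks[key] = value
        self.cache[key] = value
        for other in self.cluster:
            if other is not self:
                other.cache.pop(key, None)


disk, cluster = SharedDisk(), []
a = Node("a", disk, cluster)
b = Node("b", disk, cluster)

a.write("row1", 10)
first = b.read("row1")   # b caches the value 10
a.write("row1", 20)      # b's cached copy is invalidated
second = b.read("row1")  # b re-reads from the shared disk

cluster.remove(a)        # simulate node a failing
still_available = b.read("row1")
```

Without the invalidation loop in `write`, node `b` would keep returning the stale value, which is why each write carries extra overhead. The last two lines show the availability benefit: the data remains reachable through node `b` after node `a` is gone.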
NUMA
Features:
This is the newest of the four architectures.
The NUMA architecture is like a big SMP broken into smaller SMPs that are easier to build.
Hardware considers all memory units as one giant memory.
The system has a single real memory address space over the entire machine; memory addresses begin with 1 on the first node and continue on the following nodes.
Each node contains a directory of memory addresses within that node.
The time needed to retrieve a memory value varies, because the first node may need a value that resides in the memory of the third node. That is why this architecture is called non-uniform memory access architecture.
Benefits:
Provides maximum flexibility.
Overcomes the memory limitations of SMP.
Better scalability than SMP.
Limitations:
Programming for the NUMA architecture is more complex than even for MPP.
Software support for NUMA is fairly limited.
The technology is still maturing.
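The single address space and the non-uniform access time described above can be sketched as follows. The node size and the cost figures are hypothetical numbers picked for illustration; only the idea that addresses start at 1 on the first node and that remote access is slower than local access comes from the text.

```python
NODE_SIZE = 1000   # hypothetical: number of addresses held by each node
LOCAL_COST = 1     # hypothetical relative cost of a local memory access
REMOTE_COST = 3    # hypothetical relative cost of a remote memory access


def home_node(address, node_size=NODE_SIZE):
    """Return the zero-based index of the node that holds this address.

    Addresses start at 1 on the first node and continue on later nodes.
    """
    if address < 1:
        raise ValueError("addresses start at 1")
    return (address - 1) // node_size


def access_cost(requesting_node, address):
    """Relative cost for a node to read one address."""
    if home_node(address) == requesting_node:
        return LOCAL_COST
    return REMOTE_COST


# The first node (index 0) reads a local value and a value on the third node.
local = access_cost(0, 500)
remote = access_cost(0, 2500)
```

Here `home_node(2500)` is 2, the third node, so the first node pays the remote cost for that read. This difference between `local` and `remote` is the non-uniformity that software on NUMA machines has to account for.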
Database Software
Many operations can be parallelized
Mass loading of data
Full table scans
Queries with exclusion conditions
Queries with grouping
Selection with distinct values
Aggregation
Sorting
Creation of tables using subqueries
Creating and rebuilding indexes
Inserting rows into a table from other tables
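As one illustration of how an operation such as grouping with aggregation can be parallelized, the sketch below splits the rows into partitions, aggregates each partition independently, and merges the partial results. It is an assumption-laden toy, not how any database engine implements it; `partial_sum`, `parallel_group_sum`, and the sample rows are hypothetical, and threads are used only to keep the example self-contained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    # Aggregate one partition on its own: SUM(amount) GROUP BY key.
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return totals


def parallel_group_sum(rows, partitions=3):
    # Split the rows round-robin into partitions.
    chunks = [rows[i::partitions] for i in range(partitions)]
    # Aggregate every partition concurrently.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Merge the partial totals into the final answer.
    merged = Counter()
    for part in partials:
        merged.update(part)
    return dict(merged)


rows = [("east", 120), ("west", 80), ("east", 40), ("north", 60), ("west", 20)]
totals = parallel_group_sum(rows)
```

The merge step works because a sum over the whole table equals the sum of the per-partition sums; operations without that property need a different way of combining partial results.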
Types of parallelization
Software Tools
Summing up
Infrastructure acts as the foundation supporting the data warehouse architecture. Data warehouse infrastructure consists of operational infrastructure and physical infrastructure. Hardware and operating systems make up the computing environment for the DW. Several options exist for the computing platforms needed to implement the various architectural components.
Selecting the server hardware is a key decision. Invariably, the choice is one of the four parallel server architectures. Current database software products are able to perform interquery and intraquery parallelization. Software tools are used in the data warehouse for data modeling, data extraction, data transformation, data loading, data quality assurance, queries and reports, and online analytical processing (OLAP). Tools are also used as middleware, as alert systems, and for data warehouse administration.