
Linux High Availability Cluster Selection

Tim Burke

tburke@redhat.com

6/27/2017

Selection Process / Presentation Outline
Identify target applications - usage model
Identify required cluster feature set
Open source vs proprietary, product vs project
Cost factors
Vendor evaluation
OEM & ISV endorsements

Identify Target Applications
Clustering Categories
High Availability Clusters
Database
Fileservers
Off the shelf applications
Load Balancing Clusters
Dispatching web traffic
High Performance Computing
Large computational problems

High Performance Computing
HPC, HPTC cluster attributes
1. Large number of systems working together to solve a common problem (scalability)
2. Performance, not reliability, is of utmost importance
3. Requires custom parallelized applications
4. Tends to be bleeding edge, early adopters
5. Example deployments: genetics, pharmaceutical, weather, seismic analysis, modeling

Load Balancing Clusters
Front-end dispatching node (or 2 for redundancy)
Pool of inexpensive back-end servers
Redirect transactions so no single system is overloaded
Balancing algorithms: round robin, weighted, load based
Typically used for web server traffic (Apache front end)
Useful for static content
Not applicable for dynamic content
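The three balancing algorithms named above can be sketched as follows. This is a minimal Python illustration, assuming a dispatcher that only has to pick the next back-end server name; the function names, server names, and weights are hypothetical and not taken from any particular load balancer.

```python
import itertools
from typing import Dict, Iterator, List

# Hypothetical helpers: server names passed in are placeholders.


def round_robin(servers: List[str]) -> Iterator[str]:
    """Hand out back-end servers in a fixed rotation."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Rotate through servers, listing each one `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_loaded(active_connections: Dict[str, int]) -> str:
    """Load-based choice: the server currently holding the fewest connections."""
    return min(active_connections, key=lambda name: active_connections[name])
```

For example, `weighted_round_robin({"big": 2, "small": 1})` yields `big, big, small, big, ...`. A production dispatcher would usually interleave weighted picks more evenly and refresh load figures from live connection counts; this sketch keeps only the selection logic.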
High Availability Clusters
The need for high availability (HA)
Overview of high availability features

Reliability, Availability, Serviceability (RAS)
Users & businesses have high expectations
1. Reliability - high degree of protection for corporate data. Information is a crucial business asset.
2. Availability - near-continuous data access
3. Serviceability - procedures to correct problems with minimal business impact

Sources of Downtime
The Standish Group - 2001
[Chart: share of downtime by cause - application bug or error; other; main-system hardware failure; database error; main-server system bug; network; operator error; other server's hardware failure; other server's system bug; environmental conditions; planned outage]

Downtime Costs - The Standish Group
[Chart: cost per minute of downtime (dollars, scale 0 to 13,000) for enterprise resource planning (ERP), supply chain management, e-commerce, Internet banking, customer service center, messaging]

No Single Point of Failure (NSPF)
Hardware Redundancy - increased overall reliability and availability
1. Multiple paths between systems
2. Storage - mirrored, RAID5
3. Multiple power sources
4. Multiple external networks
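To see why redundancy raises availability, here is a small worked calculation in Python. It assumes redundant components fail independently, which is what multiple power sources and network paths try to approximate; a common-mode failure (shared power, shared cabling) breaks that assumption. The function names and the 99% figure are illustrative.

```python
HOURS_PER_YEAR = 365 * 24


def parallel_availability(component: float, copies: int) -> float:
    """Availability when any one of `copies` redundant components is enough.

    Assumes failures are independent (no shared power, cabling, etc.).
    """
    return 1.0 - (1.0 - component) ** copies


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60
```

With a 99% available component, one copy allows roughly 5,256 minutes (about 87.6 hours) of downtime per year, while two independent copies bring that down to roughly 52.6 minutes.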

High Availability Clusters
Redundancy for fault tolerance
Failover - if 1 node shuts down or fails, another node takes over the application load
Facilitates planned maintenance

Failover
Involves selecting a target node & moving resources - failover policies
Example resource types
1. Physical disk ownership
2. Filesystems
3. Applications
4. Databases
5. IP addresses
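A failover policy has two parts: choosing the target node and moving resources in a safe order. The Python sketch below assumes an ordered preference list of nodes and a fixed dependency order (disk, then filesystem, database, application, and finally the IP address clients connect to). That order and all names are illustrative assumptions; real products typically let you configure both.

```python
from typing import Dict, List, Optional

# Assumed dependency order: each resource relies on the ones before it.
# Starting follows this order; a clean stop runs it in reverse.
START_ORDER = ["disk", "filesystem", "database", "application", "ip_address"]


def select_target(preferred: List[str], healthy: Dict[str, bool], failed: str) -> Optional[str]:
    """Ordered failover policy: first healthy node in the preference list."""
    for node in preferred:
        if node != failed and healthy.get(node, False):
            return node
    return None  # no surviving node can host the service


def relocation_steps(source: str, target: str, source_alive: bool) -> List[str]:
    """List the steps needed to move every resource from `source` to `target`."""
    if source_alive:
        # Planned move: shut resources down cleanly, top of the stack first.
        steps = [f"stop {r} on {source}" for r in reversed(START_ORDER)]
    else:
        # Failed or hung node: it cannot be asked to stop, so cut it off first.
        steps = [f"fence {source}"]
    steps += [f"start {r} on {target}" for r in START_ORDER]
    return steps
```

Bringing the IP address up last means clients are only redirected once the data and application behind it are ready.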

Failover Configurations
Active / Passive
1 node runs application(s)
Other node on standby for takeover
Idle node can take over with no performance degradation
Active / Active
All nodes actively running application(s)
Workload moves to survivor on failure
Effectively utilizes capacity (TCO)

Data Integrity Provisions
Crucial for safe failover of data-centric services (filesystem / database)
In failure scenarios (e.g., a hung node), ensure the failed node cannot access storage - I/O barriers, I/O fencing
Lack of I/O fencing can result in
Loss of data (backups?)
System crashes
Common mechanisms
Power switches
SCSI reservations
Watchdog timers
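The rule behind I/O fencing is that the survivor must not touch shared storage until it is certain the failed node no longer can. The sketch below expresses that gate in Python. The fence methods are hypothetical placeholder callables standing in for a power switch or SCSI reservation; no real fencing API is implied.

```python
from typing import Callable, List

# A fence method returns True only if it has confirmed the node can no longer
# issue I/O (e.g. a power switch reported the outlet off). These callables are
# placeholders for whatever power-switch or SCSI-reservation tooling is used.
FenceMethod = Callable[[str], bool]


def fence_node(node: str, methods: List[FenceMethod]) -> bool:
    """Try each configured fence method in turn; stop at the first success."""
    for method in methods:
        try:
            if method(node):
                return True
        except Exception:
            continue  # an unreachable fence device counts as "not fenced"
    return False


def take_over_storage(failed: str, methods: List[FenceMethod], mount: Callable[[], None]) -> bool:
    """Mount the shared storage only after the failed node is fenced."""
    if not fence_node(failed, methods):
        # Without confirmation the "failed" node may only be hung and could
        # resume writing; mounting now risks corrupting the shared data.
        return False
    mount()
    return True
```

Refusing to mount is the safe default: a service that stays down can be restarted, while a filesystem written by two nodes at once may not be recoverable.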
Application Monitoring
All HA clusters monitor node state
Most monitor key cluster resources - network, disk
Many monitor application health
Process existence
Application check scripts
HTTP GET on web server
Record retrieval on database
Filesystem directory listing
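Three of the health checks listed above are straightforward to sketch with the Python standard library. This is a minimal illustration, assuming a POSIX host; the function names and the 5-second timeout are hypothetical, and the database record-retrieval check is left out because it depends on the database driver in use.

```python
import errno
import os
import urllib.request


def process_exists(pid: int) -> bool:
    """Process-existence check (POSIX only): signal 0 probes the PID without affecting it."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True


def http_get_ok(url: str, timeout: float = 5.0) -> bool:
    """Web-server check: an HTTP GET must return a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False


def directory_listable(path: str) -> bool:
    """Filesystem check: listing a directory on the shared mount must succeed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False
```

A cluster monitor would typically run such checks periodically and act only after several consecutive failures, so that one slow response does not trigger a failover.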

Failover Times
Don't get too hung up on this
Remember that data integrity is paramount
Quoted failover times include only cluster overhead, not application recovery:
Application startup time
Filesystem consistency checks
Database recovery - transaction replay
Example
Product literature cites a 5 second failover time
Database recovery can take several minutes (size & activity dependent)
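The gap between the quoted figure and what users experience is simple addition. In the Python sketch below, only the 5-second cluster overhead comes from the example above; the startup, filesystem check, and replay times are made-up values for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Components of a user-visible outage after failover (all in seconds)."""

    cluster_overhead: float    # the figure product literature usually quotes
    app_startup: float = 0.0
    fs_check: float = 0.0      # filesystem consistency check
    db_replay: float = 0.0     # database transaction replay

    def total(self) -> float:
        return self.cluster_overhead + self.app_startup + self.fs_check + self.db_replay


# Hypothetical inputs: only the 5 s cluster overhead is from the slide.
estimate = RecoveryEstimate(cluster_overhead=5, app_startup=20, fs_check=30, db_replay=180)
```

With these hypothetical inputs the outage is 235 seconds, about 47 times the quoted failover time.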

Open Source vs Proprietary
Project vs Product
Open source facilitates self-support & customization
Support is a key determinant
Products are generally well tested
Some products are also open source
If you care enough about high availability & solution stacks, you're likely to go the product route

Heterogeneous HA Products
Proprietary offerings that run on Linux, W2K, UNIX
Unifies user training
May compromise flexibility, adaptability or data integrity (ouch!)
Some are Linux products with GUIs that run on other platforms
Virtually none allow heterogeneous platforms within the same cluster

Cost Factors
Beware of hidden charges
Product base fee
Application-specific charges (Oracle, DB2, NFS, etc.)
Support
Some only come with bundled service offerings
Hardware requirements
Proprietary UNIX offerings typically cost several times more

Vendor Evaluation
Company vision - do their cluster offerings complement or distract? Futures roadmap.
Financial stability
Ability to impact the marketplace
Responsiveness - ability to provide ongoing feature enhancements
Proprietary vs open source
Product integration - fit with distribution, kernel patches, compatibility & support implications
New Linux technology vs large monolithic legacy ports
How long it's been on the market

Open Source Projects
FailSafe - from SGI & SuSE
Optional data integrity provisions (power switch)
Supports 16 nodes
Good set of application kits
Red Hat Cluster Manager
Also offered as a product
Described later in presentation

HA Cluster Product Comparisons
The ground rules
Trying to remain objective
Highlight product strengths
Listed in alphabetical order
Based on web site content as of 10/2002

HP - MC/Serviceguard
Proprietary - Ported from HP-UX
Only supported on HP hardware
Dynamic online addition/removal of members
Worldwide support services
Quorum voting membership
Up to 8 nodes using Fibre Channel storage, 2 nodes using SCSI
Compaq Alpha line targeted at HPC clusters
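Quorum voting is what keeps a partitioned cluster from running the same service on both sides of a split. The Python sketch below shows the generic strict-majority rule; it is not HP's implementation, and the node names and one-vote-per-node weights are assumptions.

```python
from typing import Dict, Iterable


def has_quorum(partition: Iterable[str], votes: Dict[str, int]) -> bool:
    """True if the nodes in `partition` hold a strict majority of all configured votes."""
    total = sum(votes.values())
    present = sum(votes[node] for node in set(partition))
    # A strict majority means at most one partition can ever qualify.
    return 2 * present > total
```

In a 4-node cluster split 2/2, neither half has quorum, which is why clusters commonly add a tie-breaker such as a shared quorum or lock disk.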
Legato - Availability Manager
Proprietary
Heterogeneous (Linux, W2K, Solaris, HP-UX)
Strong data centric services
Well integrated with SAN environments
Replication
Storage management, volume management, backup
Application monitoring
Extensive set of application-specific modules

PolyServe - Application Manager
Proprietary
Application monitoring
Up to 16 nodes
Multiple platforms - Linux, W2K, Solaris
Doesn't require shared storage
Dynamic member addition/removal
Centralized management

PolyServe - Matrix Server
Tailored for Oracle 9i Real Application Clusters
Concurrent read + write access to data on shared storage SAN
Cluster filesystem with lock manager + distributed cache
Allows incremental growth by adding servers + storage
Proprietary

Red Hat - Cluster Manager
Bundled with RHL Advanced Server 2.1
Both open source & product
Data integrity provisions
Power switches (optional)
Watchdog timer software
Application monitoring
Heterogeneous fileserving via NFS + Samba
Web monitoring GUI
Also integrated Piranha load balancing cluster

Steeleye - LifeKeeper
Proprietary - UNIX port
Multi-platform - Linux, W2K
Wide set of application kits (separately purchased)
Established OEM relationships
Data integrity provisions - via SCSI reservations, requiring kernel patches
Application monitoring

IBM
Focusing on HPC
Rackmounted Intel servers
Custom solutions
(older) xCAT software for management, parallel operations, and installation
(newer) Cluster Systems Management (CSM) for Linux
Remote monitoring, resets, BIOS console
Parallel shell
Requires IBM hardware for embedded service processor
High Availability via partnering

Veritas Cluster Server
Recent Linux port
16 nodes, wide range of supported apps
Also runs on Windows, AIX, UNIX, Solaris
Integrates with their storage offerings (volume management, backup, data replication)
Proprietary

Other Vendors
Dell
Strategic partnering for HA software
Penguin Computing
HPC offering via partnership with Scyld Beowulf

Consolidated Solutions
Egenera
BladeFrame hardware, backplane eliminates cabling
Management software, HA, provisioning
Linux NetworX
Turnkey solution, preintegrated hardware + management tools
Custom hardware, dense racks

Summary
Know what category of cluster is right for you
Be knowledgeable about required cluster features
Weigh your cost criteria
Choose a vendor you can trust to safeguard your corporate assets
Be wary of marketing collateral
