
VERITAS Cluster Server for UNIX, Fundamentals

Topic 1: Cluster Terminology

Definition of a Cluster
A cluster is a collection of multiple independent systems working together under a management framework for increased service availability.

[Figure: application, node, storage, cluster interconnect]

Definition of VERITAS Cluster Server and Failover


VCS detects faults and performs automated failover.

[Figure: application, node, failed node, storage, cluster interconnect]

Definition of a Service Group


A service group is a virtual container that enables VCS to manage an application service as a unit.
All components required to provide the service, and the relationships between these components, are defined within the service group.
Service groups have attributes that define their behavior, such as where they can start and run.
A service group is analogous to an MC/ServiceGuard (MC/SG) package.

Service Group Types


Failover:
The service group can be online on only one cluster system at a time. VCS migrates the service group at the administrator's request and in response to faults.

Parallel:
The service group can be online on multiple cluster systems simultaneously. An example is Oracle Real Application Clusters (RAC).

Service Group Dependencies


You can use service group dependencies to specify most application relationships according to these four criteria:
Category: Online or offline
Location: Local, remote, or global
Startup behavior: Parent or child
Failover behavior: Soft, firm, or hard

You can specify combinations of these characteristics to determine how dependencies affect service group behavior, as shown in a series of examples in this lesson.

Online on the Same System


Example criteria:
App1 uses shared memory to communicate with DB1.
Both must be online on the same system to provide the service.
DB1 must come online first.
If either group faults (or the system faults), both must fail over to the same system.

[Figure: App1 and DB1]

Online Anywhere in the Cluster


Example criteria:
App2 communicates with DB2 using TCP/IP.
Both must be online to provide the service.
They do not have to be online on the same system.
DB2 must be running before App2 starts.

[Figure: App2 and DB2]

Online on Different Systems


Example criteria:
The Web server requires DB3 to be online first.
Both must be online to provide the service.
Web and DB3 cannot run on the same system, due to system usage constraints.
If Web faults, DB3 should continue to run.

[Figure: Web and DB3]

Offline on the Same System


Example criteria:
One node is used for a test version of the service.
Test and Prod cannot be online on the same system.
Prod always has priority.
Test should be shut down if Prod faults and needs to fail over to that system.

[Figure: Test and Prod]
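The four dependency examples above can be read as placement rules. The following is a minimal Python sketch of those rules, not VCS code. The function name `parent_may_run_on` and its arguments are hypothetical, the mapping of each example to a category and location pair is an assumption, the soft, firm, and hard failover behavior is ignored, and the remote case assumes the child must not be online on the parent's system.

```python
from typing import Set


def parent_may_run_on(system: str, child_online_on: Set[str],
                      category: str, location: str) -> bool:
    """Return True if the parent group may be online on `system`.

    child_online_on: systems where the child group is currently online.
    Hypothetical helper for illustration only.
    """
    if category == "online":
        if location == "local":      # same system (App1 and DB1)
            return system in child_online_on
        if location == "global":     # anywhere in the cluster (App2 and DB2)
            return bool(child_online_on)
        if location == "remote":     # a different system (Web and DB3)
            return bool(child_online_on) and system not in child_online_on
    if category == "offline" and location == "local":
        # The two groups may not share a system (Test and Prod).
        return system not in child_online_on
    raise ValueError(f"unsupported dependency: {category} {location}")


# DB1 is online on S1, so App1 may start on S1 but not on S2.
print(parent_may_run_on("S1", {"S1"}, "online", "local"))   # True
print(parent_may_run_on("S2", {"S1"}, "online", "local"))   # False
```

The sketch only answers "may the parent run here?"; deciding what happens after a fault is the job of the failover behavior, which is left out.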

Definition of a Resource
Resources are VCS objects that correspond to the hardware or software components of an application service.
Each resource must have a unique name throughout the cluster. Choosing names that reflect the service group name makes it easy to identify all the resources in that group, for example, WebIP in the WebSG group.
Resources are always contained within service groups.
Resource categories include:
Persistent
    None (NIC)
    On-only (NFS)
Nonpersistent
    On-off (Mount)

Resource Dependencies
Resources in a service group have a defined dependency relationship, which determines the order in which the resources are brought online and taken offline.
A parent resource depends on a child resource.
There is no limit to the number of parent and child resources.
Persistent resources, such as NIC, cannot be parent resources.
Dependencies cannot be cyclical.

[Figure: dependency tree showing a parent, a parent/child, and a child resource]
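The ordering rules above amount to a topological sort: children come online before their parents, and the offline order is the reverse. The following is a minimal Python sketch of that idea, not how VCS implements it. It needs Python 3.9 or later for `graphlib`; the function name `online_order` and the resource name WebNIC are hypothetical, and the dependencies shown are an assumed example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def online_order(requires: Dict[str, Set[str]],
                 persistent: Set[str]) -> List[str]:
    """Return resources in the order they would be brought online.

    requires maps each parent resource to the children it depends on.
    """
    for parent, children in requires.items():
        if children and parent in persistent:
            raise ValueError(f"persistent resource {parent} cannot be a parent")
    try:
        # static_order() yields every child before the parents that need it.
        return list(TopologicalSorter(requires).static_order())
    except CycleError as exc:
        raise ValueError("dependencies cannot be cyclical") from exc


# Assumed example: the IP needs the NIC, the mount needs the volume,
# and the volume needs the disk group.
requires = {
    "WebIP": {"WebNIC"},
    "WebMount": {"WebVol"},
    "WebVol": {"WebDG"},
}
order = online_order(requires, persistent={"WebNIC"})
print("online order: ", order)
print("offline order:", list(reversed(order)))
```

Resources with no dependency between them can appear in either relative order; only the child-before-parent constraint is guaranteed.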

Resource Attributes
Resource attributes define an individual resource. The attribute values are used by VCS to manage the resource. Resources can have required and optional attributes, as specified by the resource type definition.
WebMount resource (Solaris):

mount -F vxfs /dev/vx/dsk/WebDG/WebVol /Web

Resource Types
Resources are classified by type.
The resource type specifies the attributes needed to define a resource of that type.
For example, a Mount resource has different properties than an IP resource.

Mount command syntax (Solaris):
mount [-F FSType] [options] block_device mount_point

Agents: How VCS Controls Resources


Each resource type has a corresponding agent process that manages all resources of that type.
Agents have one or more entry points that perform a set of actions on resources.
Each system runs one agent for each active resource type.

[Figure: the IP, Mount, Disk Group, Volume, and NIC agents use the online, offline, monitor, and clean entry points to manage resources such as 10.1.2.3, eri0, WebDG, WebVol, logVol, /web, and /log]
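To make the agent idea concrete, here is a toy Python model in which one agent object manages every resource of a single type through the online, offline, monitor, and clean entry points. It is a sketch under stated assumptions, not the VCS agent framework: the class name `AgentSketch`, the method `add_resource`, the resource name LogMount, and the in-memory state are all hypothetical, and a real agent would act on the operating system instead of a dictionary.

```python
from typing import Dict


class AgentSketch:
    """Toy agent: one instance manages all resources of one type."""

    def __init__(self, resource_type: str) -> None:
        self.resource_type = resource_type
        self._state: Dict[str, str] = {}   # resource name -> ONLINE or OFFLINE

    def add_resource(self, name: str) -> None:
        self._state[name] = "OFFLINE"

    def online(self, name: str) -> None:
        """Entry point: bring the resource online."""
        self._state[name] = "ONLINE"

    def offline(self, name: str) -> None:
        """Entry point: take the resource offline in an orderly way."""
        self._state[name] = "OFFLINE"

    def monitor(self, name: str) -> str:
        """Entry point: report the current state of the resource."""
        return self._state[name]

    def clean(self, name: str) -> None:
        """Entry point: forcibly clean up after a failed operation."""
        self._state[name] = "OFFLINE"


mount_agent = AgentSketch("Mount")
for resource in ("WebMount", "LogMount"):
    mount_agent.add_resource(resource)
mount_agent.online("WebMount")
print(mount_agent.monitor("WebMount"))   # ONLINE
print(mount_agent.monitor("LogMount"))   # OFFLINE
```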

Topic 2: Cluster Communication

Cluster Communication
A cluster interconnect provides a communication channel between cluster nodes.
The cluster interconnect serves to:
Determine which systems are members of the cluster using a heartbeat mechanism.
Maintain a single view of the status of the cluster configuration on all systems in the cluster membership.

Low-Latency Transport (LLT)


LLT:
Is responsible for sending heartbeat messages
Transports cluster communication traffic to every active system
Balances traffic load across multiple network links
Maintains the communication link state
Is a nonroutable protocol
Runs on an Ethernet network

[Figure: LLT on each system]

Group Membership Services/Atomic Broadcast (GAB)


GAB:
Performs two functions:
    Manages cluster membership, referred to as GAB membership
    Sends and receives atomic broadcasts of configuration information
Is a proprietary broadcast protocol
Uses LLT as its transport mechanism

[Figure: GAB above LLT on each system]

The High Availability Daemon (HAD)


The VCS engine, the high availability daemon (HAD):
Runs on each system in the cluster
Maintains configuration and state information for all cluster resources
Manages all agents

The hashadow daemon monitors HAD.

[Figure: HAD and hashadow above Fence, GAB, and LLT]

VCS Architecture
Agents monitor resources on each system and provide status to HAD on the local system.
HAD on each system sends status information to GAB.
GAB broadcasts configuration information to all cluster members.
LLT transports all cluster communications to all cluster nodes.
HAD on each node takes corrective action, such as failover, when necessary.

Comparing VCS Communication Protocols and TCP/IP

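[Figure: protocol stack comparison]
VCS stack: HAD and hashadow (user processes), GAB and LLT (kernel processes), NIC (hardware)
TCP/IP stack: iPlanet (user process), TCP and IP (kernel processes), NIC (hardware)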

Maintaining the Cluster Configuration


HAD maintains a replica of the cluster configuration in memory on each system.
Changes to the configuration are broadcast to HAD on all systems simultaneously by way of GAB using LLT.
The configuration is preserved on disk in the main.cf file.

VCS Configuration Files

A simple text file is used to store the cluster configuration on disk. The file contents are described in detail later in the course.

main.cf

    include "types.cf"

    cluster vcs (
        UserNames = { admin = ElmElgLimHmmKumGlj }
        Administrators = { admin }
        CounterInterval = 5
        )

    system S1 (
        )

    system S2 (
        )

    group WebSG (
        SystemList = { S1 = 0, S2 = 1 }
        )

    Mount WebMount (
        MountPoint = "/web"
        BlockDevice = "/dev/vx/dsk/WebDG/WebVol"
        FSType = vxfs
        FsckOpt = "-y"
        )

Ensuring Data Integrity


For VCS 4.x, VERITAS recommends using I/O fencing to protect data on shared storage that supports SCSI-3 persistent reservations (PR).
For environments that do not have SCSI-3 PR support, VCS supports additional protection mechanisms for membership arbitration:
Redundant communication links
Separate heartbeat infrastructures
Jeopardy cluster membership
Autodisabling service groups

System Failure Example


[Figure: three systems, S1, S2, and S3, with service groups A, B, and C]
S3 faults; C is started on S1 or S2.
Regular membership: S1, S2
No membership: S3

Single LLT Link Remaining


[Figure: three systems, S1, S2, and S3, with service groups A, B, and C; S3 has a single LLT link remaining]
Regular membership: S1, S2, S3
Jeopardy membership: S3
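The membership outcomes in the system failure and single-link examples can be summarized as a rule on the number of working links per system. The following Python sketch encodes that reading; it is an illustration, not GAB's algorithm. The function name `classify_membership` and the link counts are assumptions, and the model treats a system with exactly one working link as being in jeopardy membership and a system with none as having no membership.

```python
from typing import Dict, List


def classify_membership(working_links: Dict[str, int]) -> Dict[str, List[str]]:
    """Classify systems by the number of working LLT links they have.

    working_links maps a system name to its count of working links
    (an assumed, simplified view of the interconnect state).
    """
    return {
        "regular": sorted(s for s, n in working_links.items() if n >= 1),
        "jeopardy": sorted(s for s, n in working_links.items() if n == 1),
        "none": sorted(s for s, n in working_links.items() if n == 0),
    }


# S3 has lost one of its two links.
print(classify_membership({"S1": 2, "S2": 2, "S3": 1}))
# {'regular': ['S1', 'S2', 'S3'], 'jeopardy': ['S3'], 'none': []}
```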

Transition from Jeopardy to Network Partition


[Figure: three systems, S1, S2, and S3, with service groups A, B, and C]
1 Jeopardy membership: S3
2 Mini-cluster with regular membership: S1, S2
  Mini-cluster with regular membership: S3
  No jeopardy membership
3 Service groups are autodisabled: A and B are autodisabled for S3; C is autodisabled for S1 and S2.

Recovering from a Network Partition


[Figure: three systems, S1, S2, and S3, with service groups A, B, and C]
1 Stop HAD on S3. The mini-cluster with S1 and S2 continues to run.
2 Fix the LLT links.
3 Start HAD on S3.
4 A, B, and C are autoenabled by HAD: A and B are autoenabled for S3; C is autoenabled for S1 and S2.

Potential Split Brain Condition


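[Figure: three systems, S1, S2, and S3, with service groups A, B, and C shown on both sides of the failed interconnect]
S1 and S2 think S3 is faulted.
S3 thinks S1 and S2 are faulted.
No jeopardy occurs, so no service groups are autodisabled.
If all systems are in every service group's SystemList, VCS tries to bring the service groups online on a failover target.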

Interconnect Failures with a Low-Priority Public Link


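[Figure: three systems, S1, S2, and S3, with service groups A, B, and C and a low-priority link on the public network]
Regular membership: S1, S2, S3
1 No change in membership
2 Jeopardy membership: S3. The public network is now used for heartbeat and status.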

Simultaneous Interconnect Failure with a Low Priority Link


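[Figure: three systems, S1, S2, and S3, with service groups A, B, and C and a low-priority link on the public network]
Regular membership: S1, S2, S3
Jeopardy membership: S3. The public network is now used for heartbeat and status.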

Interconnect Failure with Service Group Heartbeats


[Figure: three systems, S1, S2, and S3, with service groups A and C and a disk-based service group heartbeat (SGHB)]
1 Network partition
  Regular membership: S1, S2
  Regular membership: S3
2 The SGHB resource faults during online.
  C faults on S1 or S2.
  A faults on S3.

Preexisting Network Partition


[Figure: three systems, S1, S2, and S3, with service groups A, B, and C]
1 S3 faults; C is started on S1 or S2.
  Regular membership: S1, S2
2 The LLT links to S3 are disconnected.
3 S3 reboots; S3 cannot start HAD because GAB on S3 can see only one member.
  No membership: S3

Topic 3: Supported Failover Configurations

Active/Passive

[Figure: before failover, after failover]

Active/Passive N-to-1

[Figure: before failover, after failover]

Active/Passive N + 1

[Figure: before failover, after failover, after repair]

Active/Active

[Figure: before failover, after failover]

N-to-N

[Figure: before failover, after failover]

Automatic Startup Policies


The AutoStartPolicy attribute specifies how a target system is selected:
Order: The first available system according to the order in the AutoStartList is selected (default).
Priority: The system with the lowest priority number in the SystemList is selected.
Load: The system with the greatest available capacity is selected.

Example configuration:
hagrp -modify groupname AutoStartPolicy Load

Detailed examples are provided on the next set of pages.

AutoStartPolicy=Order
The first available system in AutoStartList is selected.


AutoStartPolicy=Priority
The lowest numbered system in SystemList is selected.


AutoStartPolicy=Load
The system with the greatest AvailableCapacity is selected.
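The three AutoStartPolicy values can be compared side by side with a small selection function. This is a minimal Python sketch of the rules as stated in this lesson, not VCS source code: the function name `select_autostart_target`, its parameters, and the sample numbers are hypothetical, and tie-breaking (the first matching candidate wins) is an assumption.

```python
from typing import Dict, List, Optional, Set


def select_autostart_target(policy: str,
                            available: Set[str],
                            auto_start_list: List[str],
                            system_list: Dict[str, int],
                            available_capacity: Dict[str, int]) -> Optional[str]:
    """Pick the startup system for a service group under a given policy."""
    # Only systems in AutoStartList that are currently available qualify.
    candidates = [s for s in auto_start_list if s in available]
    if not candidates:
        return None
    if policy == "Order":      # first available system in AutoStartList
        return candidates[0]
    if policy == "Priority":   # lowest priority number in SystemList
        return min(candidates, key=lambda s: system_list[s])
    if policy == "Load":       # greatest AvailableCapacity
        return max(candidates, key=lambda s: available_capacity[s])
    raise ValueError(f"unknown AutoStartPolicy: {policy}")


# Assumed sample data: S1 is down, S2 and S3 are available.
auto_start_list = ["S1", "S2", "S3"]
system_list = {"S1": 0, "S2": 2, "S3": 1}
capacity = {"S1": 100, "S2": 300, "S3": 200}
for policy in ("Order", "Priority", "Load"):
    target = select_autostart_target(policy, {"S2", "S3"}, auto_start_list,
                                     system_list, capacity)
    print(policy, "->", target)
```

With this sample data the three policies pick S2, S3, and S2 respectively, which shows how the same cluster state can lead to different startup targets.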


Failover Policies
The FailOverPolicy attribute specifies how a target system is selected:
Priority: The system with the lowest priority number in the list is selected (default).
RoundRobin: The system with the fewest active service groups is selected.
Load: The system with the greatest available capacity is selected.

Example configuration:
hagrp -modify groupname FailOverPolicy Load

Detailed examples are provided on the next set of pages.

FailOverPolicy=Priority
The lowest numbered system in SystemList is selected.


FailOverPolicy=RoundRobin
The system with the fewest running service groups is selected.


FailOverPolicy=Load

The system with the greatest AvailableCapacity is selected.
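The FailOverPolicy values follow the same pattern as the startup policies, with RoundRobin in place of Order. The following Python sketch restates the three rules from this lesson; it is an illustration rather than the VCS implementation. The function name `select_failover_target`, its parameters, and the sample numbers are hypothetical, and ties are resolved in favor of the first candidate as an assumption.

```python
from typing import Dict, List, Optional


def select_failover_target(policy: str,
                           candidates: List[str],
                           system_list: Dict[str, int],
                           active_groups: Dict[str, int],
                           available_capacity: Dict[str, int]) -> Optional[str]:
    """Pick the failover system among the remaining candidate systems."""
    if not candidates:
        return None
    if policy == "Priority":     # lowest priority number in SystemList
        return min(candidates, key=lambda s: system_list[s])
    if policy == "RoundRobin":   # fewest running service groups
        return min(candidates, key=lambda s: active_groups[s])
    if policy == "Load":         # greatest AvailableCapacity
        return max(candidates, key=lambda s: available_capacity[s])
    raise ValueError(f"unknown FailOverPolicy: {policy}")


# Assumed sample data: the group faulted on S1; S2 and S3 remain.
system_list = {"S1": 0, "S2": 1, "S3": 2}
active_groups = {"S2": 3, "S3": 1}
capacity = {"S2": 50, "S3": 20}
for policy in ("Priority", "RoundRobin", "Load"):
    target = select_failover_target(policy, ["S2", "S3"], system_list,
                                    active_groups, capacity)
    print(policy, "->", target)
```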


Limits and Prerequisites

Combining Capacity and Limits

Modeling Workload Management


You can use the Simulator to create and test workload management scenarios before deploying the configuration in a running cluster. For example:
1 Copy the real main.cf file into the Simulator directory.
2 Set up the workload management configuration.
3 Test all startup and failover scenarios.
4 Copy the Simulator main.cf file back to the cluster config directory.
5 Restart the cluster using the new configuration.

Topic 4: VCS Operations

VCS Management Tools


VCS Simulator: Create, model, and test configurations. Cannot be used to manage a running cluster configuration.
Java GUI: Graphical user interface. Runs on UNIX and Windows systems.
Web GUI: Graphical interface. Runs on systems with supported Web browsers.
CLI: Command-line interface. Runs on the local system.

Only authorized VCS user accounts have access to VCS administrative interfaces.

VCS Cluster GUI

UI support for Global Cluster Option


Both the Web console and the Java console provide an interface for the Global Cluster Option in VCS 4.0. This includes:
A wizard to add a remote cluster
A wizard to convert local groups to global groups
UI changes to show remote cluster, global group, and heartbeat details
Alerts for cluster fault and no-failover conditions
Monitoring and management of replication

VCS User Account Privileges


Give VCS users the level of authorization needed to administer components of the cluster environment.
Cluster Administrator
Full privileges

Cluster Operator
All cluster, service group, and resource-level operations

Cluster Guest
Read-only access; new users created as Cluster Guest accounts by default

Group Administrator
All service group operations for a specified service group, except deleting service groups

Group Operator
Bring service groups and resources online and take them offline; temporarily freeze or unfreeze service groups

Common Operations
Common service group operations:
Displaying status
Bringing service groups online
Taking service groups offline
Switching service groups
Freezing service groups

A Look at Some of the Commands


Displaying the status of the cluster
# hastatus -summary

Taking a service group offline and online

# hagrp -offline <service_group> -sys <system>
# hagrp -online <service_group> -sys <system>

Switching a service group between nodes

# hagrp -switch <service_group> -to <system>

Freezing and unfreezing service groups

# hagrp -freeze <service_group> -persistent
# hagrp -unfreeze <service_group> -persistent

Taking resources within a service group online and offline

# hares -online <resource> -sys <system>
# hares -offline <resource> -sys <system>
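When these operations are scripted, it helps to build the command lines in one place so the required options are not forgotten. The following Python sketch only assembles argument lists for the hagrp operations listed in this section (each option takes a leading hyphen, as in hagrp -online); it does not run anything. The helper name `hagrp_argv` is hypothetical, and the group and system names are sample values.

```python
import shlex
from typing import List, Optional


def hagrp_argv(action: str, group: str, system: Optional[str] = None,
               persistent: bool = False) -> List[str]:
    """Build the argument list for one hagrp operation (does not execute it)."""
    argv = ["hagrp", f"-{action}", group]
    if action in ("online", "offline"):
        if system is None:
            raise ValueError(f"{action} requires a system")
        argv += ["-sys", system]
    elif action == "switch":
        if system is None:
            raise ValueError("switch requires a target system")
        argv += ["-to", system]
    elif action in ("freeze", "unfreeze"):
        if persistent:
            argv.append("-persistent")
    else:
        raise ValueError(f"unsupported action: {action}")
    return argv


# Sample sequence around a maintenance window on S1.
steps = [
    hagrp_argv("switch", "WebSG", "S2"),
    hagrp_argv("freeze", "WebSG", persistent=True),
    hagrp_argv("unfreeze", "WebSG", persistent=True),
]
for argv in steps:
    print(shlex.join(argv))
```

Returning a list instead of a string keeps the result safe to pass to `subprocess.run` on a cluster node without shell quoting problems; `shlex.join` (Python 3.8 or later) is used here only for display.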

Operating System Upgrade Example


[Figure: Web requests reach the Web server; the service group is frozen during the operating system upgrade]

Comparison of VCS and MC/SG

VCS:
Leads the market
HP-UX, Windows, AIX, and Solaris support
Agents for major applications
Automated updates
32-node support
GAB protocol

MC/SG:
HP-UX and Linux support
Script templates for major applications
Manual updates
16-node support
TCP/IP for heartbeat

Key Points for VCS


VCS is peer-to-peer, while MC/SG is parent/child
Easier and faster installation
Easier to test and evaluate cluster integrity
Easier to patch and update
Heterogeneous environments
Intuitive management console
