OpenStack High Availability

Jakub Pavlik

About me
Jakub Pavlik
Cloud Platform Engineer
3 years in Cloud
2 years in OpenStack

High Availability vs. Disaster Recovery


High Availability = fault detection & correction procedures
to maximize availability of critical services and applications,
often in an automated fashion.
Disaster Recovery = process of preparing for recovery or
continuation of technology infrastructure critical to an
organization after a natural or human-induced disaster.

High Availability ≠ Disaster Recovery!

Four types of HA in an OpenStack Cloud


Physical infrastructure: Physical nodes, Physical network, Physical storage, Hypervisor, Host OS, ...
OpenStack Control services: Compute Controller, Network Controller, Database, Message Queue, Storage, ...
OpenStack Compute (VMs): Virtual Machine, Virtual Network, Virtual Storage, VM Mobility
Applications: Service Resiliency, QoS Cost, Transparency, Data Integrity, ...

Physical Infrastructure

tcp cloud VPC Hardware

Switch 1, Passthru 1, Passthru 2
Switch 2, Passthru 1, Passthru 2

168 cores at 3.46 GHz, 336 threads, aggregation: 1344 vCPU
168 cores at 2.67 GHz, 336 threads, aggregation: 1344 vCPU

2688 GB RAM, 28 x 10GE ports
1792 GB RAM, 28 x 10GE ports

SAN 1 (Controller 1), SAN 2 (Controller 2)
SAN 1 (Controller 1), SAN 2 (Controller 2)
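The core, thread and vCPU figures on this slide are consistent with 2 hardware threads per core and a 4:1 vCPU-to-thread ratio. Neither ratio is stated on the slide; both are inferred from the numbers, and the function names below are illustrative only. A minimal sketch of the arithmetic:

```python
def hardware_threads(cores: int, threads_per_core: int = 2) -> int:
    """Threads exposed by the CPUs (2 per core is an assumption)."""
    return cores * threads_per_core


def aggregated_vcpus(threads: int, overcommit_ratio: int = 4) -> int:
    """vCPUs offered to instances (the 4:1 ratio is an assumption)."""
    return threads * overcommit_ratio


threads = hardware_threads(168)
print(threads, aggregated_vcpus(threads))  # prints "336 1344"
```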

OpenStack Control services

OpenStack modules TCP VPC

OpenStack High Availability Concepts


Stateless services
There is no dependency between requests
For example APIs: Nova, Keystone, Glance, Cinder, etc.
Stateful services
An action typically comprises multiple requests
For example: MySQL, RabbitMQ, etc.
Active/Passive
Redundant instances of stateless services are load balanced
For stateful services a replacement resource can be brought online
Active/Active
Redundant instances of stateless services are load balanced
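The two ideas on this slide, load balancing redundant stateless instances and bringing a replacement online for a stateful service, can be sketched in a few lines. This is an illustration only: the class, the function, the backend names and the `is_healthy` callback are all hypothetical, and in a real deployment this job is done by HAProxy and Pacemaker rather than by application code.

```python
from typing import Callable, List


class RoundRobinPool:
    """Rotate requests across the healthy instances of a stateless service."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._next = 0  # index of the backend to try first

    def pick(self) -> str:
        # Scan at most one full cycle so a fully failed pool raises
        # instead of looping forever.
        for offset in range(len(self.backends)):
            index = (self._next + offset) % len(self.backends)
            candidate = self.backends[index]
            if self.is_healthy(candidate):
                self._next = (index + 1) % len(self.backends)
                return candidate
        raise RuntimeError("no healthy backend available")


def active_passive(primary: str, standby: str,
                   is_healthy: Callable[[str], bool]) -> str:
    """Use the primary while it is healthy, otherwise the standby."""
    if is_healthy(primary):
        return primary
    if is_healthy(standby):
        return standby
    raise RuntimeError("neither primary nor standby is healthy")
```

With `api-2` marked as down, a pool of `api-1`, `api-2`, `api-3` alternates between `api-1` and `api-3`, while `active_passive` only switches to the standby once the primary fails its check.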

Corosync, Pacemaker and HAProxy


Corosync
Totem single-ring ordering and membership protocol
Provides UDP and InfiniBand based messaging, quorum, and cluster membership to Pacemaker
Pacemaker
High availability and load balancing
stack for the Linux platform.
Interacts with applications through
Resource Agents (RA)
HAProxy
Load Balancing and Proxying for HTTP
and TCP Applications
Works over multiple connections
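Quorum, which Corosync supplies to Pacemaker, is at its simplest a majority rule: a partition may keep running resources only if it holds more than half of the expected votes. The sketch below shows that rule alone, assuming one vote per node. The function names are illustrative, and special cases such as Corosync's two-node mode are deliberately left out.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def has_quorum(expected_votes: int, votes_present: int) -> bool:
    """True if the visible partition holds a majority of the votes."""
    return votes_present >= quorum_threshold(expected_votes)


for nodes in (2, 3, 5):
    print(nodes, "nodes -> quorum at", quorum_threshold(nodes), "votes")
```

Under this rule a three-node cluster survives the loss of one node, while a plain two-node cluster loses quorum as soon as either node fails. This is one reason control planes are often built with at least three controllers.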

MySQL Galera
Synchronous multi-master cluster technology
for MySQL/InnoDB

MySQL patched for wsrep (Write Set REPlication)
Active/active multi-master topology
Read and write to any cluster node
True parallel replication, at row level
No slave lag or integrity issues
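Because every Galera node accepts reads and writes, a load balancer in front of the cluster typically needs a per-node check before sending traffic to it. The sketch below evaluates the `wsrep_*` status variables a node reports (for example via `SHOW GLOBAL STATUS LIKE 'wsrep_%'`). How the values are fetched is left out, and the function name and the exact set of checks are assumptions, not a prescribed health check.

```python
from typing import Mapping

# Values a node is expected to report when it is part of the primary
# component and fully synced. The choice of checks is an assumption.
REQUIRED_STATUS = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
}


def node_accepts_traffic(status: Mapping[str, str]) -> bool:
    """Return True only if every required status variable matches."""
    return all(
        status.get(name) == expected
        for name, expected in REQUIRED_STATUS.items()
    )
```

A node that is acting as a state-transfer donor reports a different `wsrep_local_state_comment` and would be taken out of rotation by this check until it is synced again.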

Sample OpenStack HA architecture


Stateful
Cinder Volume
Neutron L3, DHCP agents
Ceilometer central agent
RabbitMQ
Stateless
Neutron Server
OpenStack APIs
Apache web server
Nova Scheduler
Cinder Scheduler

Neutron agents (Active)
Neutron agents (Hot Standby)

VMs Compute nodes

VMs HA two layers


Storage
Shared storage filesystem: file disks (qcow2, vmdk, vhd)
Block storage
Network
Vanilla Neutron L3 agent (Open vSwitch, Linux Bridge)
Vendor plugins - SDN controller

No vSphere Style HA with KVM

Non-Shared/Shared Storage filesystem


Shared Storage
Live migration: only RAM is transferred
Hypervisor Evacuation: the instance is booted from the same disk and its data is preserved
Ceph, Gluster, NFS, Samba, GFS
Non-Shared Storage
Block Live Migration: disk and RAM are transferred
Hypervisor Evacuation: the instance is booted from a new disk, but keeps its configuration, e.g. id, name, uuid
Standard filesystem: EXT4, etc.
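The difference between the two storage layouts can be summarised as a small decision table. The sketch below encodes only what this slide states. The type and function names are made up for illustration and do not correspond to any Nova API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationPlan:
    mode: str                        # migration type used for a running VM
    copies_disk: bool                # whether the disk is transferred too
    evacuation_preserves_data: bool  # whether disk data survives evacuation


def plan_for(shared_storage: bool) -> MigrationPlan:
    """Map the storage layout to the behaviour described on the slide."""
    if shared_storage:
        # Disk already visible on the target host: move RAM only.
        return MigrationPlan("live migration", False, True)
    # Local disks: the disk must travel with the RAM, and an evacuated
    # instance boots from a new disk with the same id, name and uuid.
    return MigrationPlan("block live migration", True, False)


print(plan_for(shared_storage=True))
print(plan_for(shared_storage=False))
```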

Block Storage - Cinder

Instance boots from volume


iSCSI/FC direct mapping to instance
Enables Live Migration
Cinder Backends
LVM Driver
Default Linux iSCSI server
Vendor software plugins
Gluster, Ceph, VMware VMDK driver
Vendor storage plugins
EMC VNX, IBM Storwize, SolidFire, etc.

Networking - Vanilla Neutron L3 agent


Problems
Routing on Linux server (max. bandwidth approximately 3-4 Gbit/s)
Limited distribution across multiple network nodes
East-West and North-South communication goes through the network node
High Availability
Pacemaker & Corosync
Keepalived VRRP
DVR + VRRP should be in the Juno release
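With Keepalived, the active router is chosen by VRRP: among the routers that are still alive, the one with the highest priority becomes master, and a tie is broken by the higher IP address. The sketch below models only that selection rule. Real VRRP also relies on periodic advertisements, timers and preemption settings, and the router names and addresses here are invented.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional


class Router(NamedTuple):
    name: str
    priority: int
    address: str
    alive: bool = True


def elect_master(routers: Iterable[Router]) -> Optional[Router]:
    """Pick the live router with the highest priority, then highest IP."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (r.priority, ipaddress.ip_address(r.address)),
    )


nodes = [
    Router("network-node-1", 150, "10.0.0.2"),
    Router("network-node-2", 100, "10.0.0.3"),
]
print(elect_master(nodes).name)  # prints "network-node-1"
```

If `network-node-1` is marked as not alive, the same call returns `network-node-2`, which is the failover this slide relies on for the L3 agent.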

Networking - Vendor SDN Controller plugins


Examples
Juniper OpenContrail, VMware NSX, SDN
PLUMgrid
Advantages over Neutron L3 agent
North-South communication on network devices (iBGP, MPLSoverGRE)
East-West communication directly between compute nodes
Higher bandwidth (9.7 Gbit/s per 10 Gbit/s port)
High Availability
iBGP peering into two routers
Native HA implemented inside of network devices

OpenStack HA - TCP VPC

[Architecture diagram: TCP Virtual Private Cloud. Components shown: VIP, Bond Interface, Pacemaker/Corosync, HAProxy, OpenStack Controller, MySQL (Galera), RabbitMQ, Contrail Config with Analytics & WebUI, Contrail Control, Contrail Database (Cassandra, ZooKeeper)]

HA methods - vendors
Vendor | Cluster/Replication Technique | Characteristics
Rackspace | Keepalived, HAProxy, VRRP, DRBD | Automatic - Chef
Red Hat | Pacemaker, Corosync, Galera | Manual installation/Foreman
Cisco | Keepalived, HAProxy, Galera | Manual installation, at least 3 controllers
tcp cloud | Pacemaker, Corosync, HAProxy, Galera, Contrail | Automatic SaltStack deployment

Thank you for your attention!