
Percona XtraDB Cluster 5.6
Field Guide Issue No. 2

By Jay Janssen and Jervin Real

Copyright 2006-2014 Percona LLC


This is the second in a series of short Percona eBooks containing useful tips,
examples, and best practices for enterprise users of Percona XtraDB Cluster. If
there is a topic you would like us to include in our next issue, please let us know at
info@percona.com or give us a call at 1-888-316-9775.

Table of Contents
Chapter 1: Finding a good IST donor
Chapter 2: keepalived with reader and writer VIPs
Chapter 3: New wsrep_provider_options
Chapter 4: Useful MySQL 5.6 features you get for free


Chapter 1: Finding a good IST donor


By Jay Janssen

Gcache and IST


The Gcache is a memory-based cache of recent Galera transactions that is local to each node in a
cluster. If a node leaves and rejoins the cluster, it can use the gcache from another node that
stayed in the cluster (i.e., its donor node) to fetch the transactions it missed (IST) as opposed to
doing a full state snapshot transfer (SST). However, there are a few nuances that are not obvious
to the beginner:
The Gcache is lost when a node restarts
The Gcache is a fixed size and implemented as an LRU. Once it is full, older transactions roll
off.
Donor selection is made regardless of the gcache state.
If the given donor for a restarting node doesn't have all the transactions needed, a full SST
(read: full backup) is done instead.
Until recent developments, there was no way to tell what, precisely, was in the Gcache.
So, with (somewhat) arbitrary donor selection, it was hard to be certain that a node restart would
not trigger an SST. For example:
A node crashed over night or was otherwise down for some length of time. How do you
know if the gcache on any node is big enough to contain all the transactions necessary for
IST?
If you brought down two nodes in your cluster simultaneously, the second one you restart might
select the first one as its donor and be forced to SST.

Along comes Percona XtraDB Cluster 5.6


Astute readers of the Percona XtraDB Cluster 5.6.15 release notes will have noticed this little tidbit:
New wsrep_local_cached_downto status variable has been introduced. This variable
shows the lowest sequence number in gcache. This information can be helpful with
determining IST and/or SST.

Until this release there was no visibility into any node's Gcache and what was likely to happen
when restarting a node. You could make some assumptions, but now it is a bit easier to:
1. Tell if a given node would be a suitable donor
2. And hence select a donor manually using wsrep_sst_donor instead of leaving it to chance.

What it looks like


Suppose I have a 3 node cluster where load is hitting node1. I execute the following in sequence:
1. Shut down node2
2. Shut down node3
3. Restart node2

At step 3, node1 is the only viable donor for node2. Because our restart was quick, we can have
some reasonable assurance that node2 will IST correctly (and it does).

However, before we restart node3, let's check the oldest transaction in the gcache on nodes 1 and 2:
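A quick way to do that is to compare the new status variable on each node; the node with the lower
value has the deeper gcache:

    SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto';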

So we can see that node1 has a much more complete gcache than node2 does (i.e., a much
smaller seqno). Node2's gcache was wiped when it restarted, so it only has transactions from after
its restart.

To check node3's GTID, we can either check the grastate.dat, or (if it has crashed and the
grastate is zeroed) use wsrep_recover:
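A sketch of both approaches, assuming the default /var/lib/mysql datadir:

    # read the saved state file
    cat /var/lib/mysql/grastate.dat

    # or, if the grastate was zeroed, recover the position from InnoDB;
    # the recovered uuid:seqno is written to the error log
    mysqld_safe --wsrep-recover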

So, armed with this information, we can tell what would happen to node3, depending on which donor
was selected:

So, we can instruct node3 to use node1 as its donor on restart with wsrep_sst_donor:
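For example (assuming node1's wsrep_node_name is simply node1):

    # RPM-based systems: pass the donor at startup
    service mysql start --wsrep_sst_donor=node1

    # or set it in my.cnf under [mysqld]:
    #   wsrep_sst_donor = node1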

Note that passing mysqld options on the command line is only supported in RPM packages;
Debian requires you to put that setting in your my.cnf. We can see from node3's log that it does
properly IST:

Sometime in the future, this may be handled automatically on donor selection, but for now it is very
useful that we can at least see the status of the gcache.

Chapter 2: keepalived with reader and writer VIPs


By Jervin Real

We had a request recently in which the customer had two VIPs (Virtual IP addresses) for a cluster
of 3 nodes: one for a reader and one for a writer. They wanted to keep it simple, with low latency,
and without requiring an external node resource the way HAProxy would.
keepalived is a simple load balancer with HA capabilities, which means it can proxy TCP services
behind it and, at the same time, keep itself highly available using VRRP as a failover mechanism.
This chapter is about taking advantage of the VRRP capabilities built into keepalived to intelligently
manage your PXC VIPs.
Yves Trudeau has written about a very interesting and somewhat similar solution using ClusterIP
and Pacemaker to load balance VIPs, but the two have different use cases. Both solutions avoid the
latency of an external proxy or load balancer; however, unlike ClusterIP, connections to a given VIP
with keepalived go to a single node, which means a little less work for each node deciding whether it
should respond to the request. ClusterIP is a good fit if you want to spread writes across all nodes in a
calculated distribution, while with the keepalived approach each VIP is assigned to at most a single
node. Depending on your workload, each has advantages and disadvantages.
The OS I used was CentOS 6.4 with keepalived 1.2.7, which is available in the yum repositories.
However, it's difficult to troubleshoot failover behavior with VRRP_Instance weights without seeing
them from keepalived directly, so I used a custom build with a patch for a vrrp-status option that
allows me to monitor something like this:

So first, let's compile keepalived from source from the Github branch where the status patch is
available.
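Building from a source checkout follows the usual autotools flow (the prefix here is an assumption):

    # from the keepalived source checkout; run autoreconf -i first if ./configure
    # is not present in the checkout
    ./configure --prefix=/usr/local
    make && make install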

Next, install the custom tracker script below. Because compiling keepalived above installs it under
/usr/local, I put this script in /usr/local/bin as well. One might note that this script is largely redundant,
and that's true, but keepalived does not validate its configuration, especially track_scripts, so I
prefer to keep it in a separate bash script that I can easily debug if it misbehaves. Of course, when all
is working well, you can always merge it into the keepalived.conf file.
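As a rough sketch of what such a tracker script can look like (this version simply verifies that the
local PXC node is Synced; the path, credentials, and exact checks are assumptions):

    #!/bin/bash
    # /usr/local/bin/pxc_track.sh - exit 0 if the local node is usable, 1 otherwise
    STATE=$(mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state'" 2>/dev/null | awk '{print $2}')
    # wsrep_local_state 4 == Synced
    [ "$STATE" == "4" ] && exit 0
    exit 1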

And here is my /etc/keepalived.conf:
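The full configuration is longer than what is sketched here; a minimal outline of the writer instance
follows (interface names, router IDs, and addresses are assumptions, and the reader instance and the
notify_* clauses follow the same pattern):

    vrrp_script track_pxc {
        script "/usr/local/bin/pxc_track.sh"
        interval 5
        # a positive weight raises the instance priority while the script succeeds
        weight 50
    }

    vrrp_instance writer_vip {
        # start as BACKUP and let runtime voting decide who owns the VIP
        state BACKUP
        interface eth1
        virtual_router_id 60
        priority 101
        track_script {
            track_pxc
        }
        virtual_ipaddress {
            192.168.70.100/24 dev eth1
        }
    }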

There are a number of things you can change here, such as removing or modifying the notify_*
clauses to fit your needs or sending SMTP notifications during VIP failovers. I also prefer the initial
state of the VRRP_Instances to be BACKUP instead of MASTER, and to let the voting at runtime
dictate where the VIPs should go.
The configuration ensures that the reader and writer will not share a single node if more than one is
available in the cluster. Even though the writer VIP prefers pxc01 in my example, this does not
really matter much and only makes a difference when the reader VIP is not in the picture; with the
help of the nopreempt_* track_scripts there is no automatic failback.
Now, to see it in action: after starting the cluster and keepalived in the order pxc01, pxc02, pxc03, I
have these statuses and weights:

The writer is on pxc01 and the reader on pxc02. Even though the reader VIP score matches between
pxc02 and pxc03, it remains on pxc02 because of our nopreempt_* script. Let's see what
happens if I stop MySQL on pxc02:

The reader VIP moved to pxc03 and the weights changed: the pxc02 reader dropped by 100 and on
pxc03 it gained 50 (again, we set this higher for nopreempt). Now let's stop MySQL on pxc03:

Our reader is back on pxc02 and the writer remains intact. When both VIPs end up on a single node
(i.e., the last node standing) and a second node comes up, the reader moves, not the writer; this is to
avoid the risk of breaking any connections that may be writing to the node currently owning
the VIP.


Chapter 3: New wsrep_provider_options

By Jay Janssen
Now that Percona XtraDB Cluster 5.6 is out, I wanted to talk about some of the new features in
Galera 3 and Percona XtraDB Cluster 5.6. On the surface, Galera 3 doesn't reveal a lot of new
features yet, but there has been a lot of refactoring of the system in preparation for great new
features in the future.

Galera vs. MySQL options


wsrep_provider_options is a semicolon-separated list of key => value configurations that set
low-level Galera library configuration. These tweak the actual cluster communication and
replication in the group communication system. By contrast, other Percona XtraDB Cluster global
variables (like wsrep%) are set like other mysqld options and generally have more to do with
MySQL/Galera integration. This chapter covers the Galera options; MySQL-level changes will
have to wait for another time.
Here are the differences in the wsrep_provider_options between 5.5 and 5.6:

gmcast.segment=0
This is a new setting in 3.x and allows us to distinguish between nodes in different WAN segments.
For example, all nodes in a single datacenter would be configured with the same segment number,
but each datacenter would have its own segment.
Segments are currently used in two main ways:
1. Replication traffic between segments is minimized. Writesets originating in one segment
should be relayed through only one node in every other segment. From those local relays
replication is propagated to the rest of the nodes in each segment respectively.
2. Segments are used in donor selection: donors in the same segment are preferred,
but not required.
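Segments are assigned per node through wsrep_provider_options; a sketch of the my.cnf lines (the
segment numbers are arbitrary, as long as every node in a given datacenter uses the same one):

    # nodes in datacenter A
    wsrep_provider_options = "gmcast.segment=0"

    # nodes in datacenter B
    wsrep_provider_options = "gmcast.segment=1"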

replicator -> repl


The older replicator tag is now renamed to repl and the causal_read_timeout and
commit_order settings have moved there. No news here really.

repl.key_format = FLAT8
Every writeset in Galera has associated keys. These keys are effectively a list of primary, unique,
and foreign keys associated with all rows modified in the writeset. In Galera 2 these keys were
replicated as literal values, but in Galera 3 they are hashed into either 8- or 16-byte values (FLAT8 vs.
FLAT16). This should generally make the key sizes smaller, especially with large CHAR keys.
Because the keys are now hashed, there can be collisions where two distinct literal key values
result in the same 8-byte hashed value. This means practically that the places in Galera that rely
on keys may falsely believe that there is a match between two writesets when there really is not.
This should be quite rare. This false positive could affect:
Local certification failures (deadlocks on commit) that are unnecessary.
Parallel apply: things could be done in a stricter order (i.e., less parallelization) than
necessary.
Neither case affects data consistency. The tradeoff is more efficiency in keys and key operations,
generally making writesets smaller and certification faster.

repl.proto_max
Limits the Galera protocol version that can be used in the cluster. Codership's documentation
states it is for debugging only.

socket.checksum = 2
This modifies the previous network packet checksum algorithm (CRC32) to support CRC32-C,
which is hardware accelerated on supported gear. Packet checksums can also now be completely
disabled (=0).
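Since wsrep_provider_options is a single semicolon-separated string, several of these settings can be
combined on one my.cnf line; a sketch, with values chosen purely as an illustration:

    wsrep_provider_options = "gmcast.segment=1; socket.checksum=2; repl.key_format=FLAT16"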

Chapter 4: Useful MySQL 5.6 features you get for free


By Jay Janssen
I get a lot of questions about Percona XtraDB Cluster 5.6 (PXC 5.6), specifically about whether
such and such MySQL 5.6 Community Edition feature is in Percona XtraDB Cluster 5.6. The short
answer is: yes, all features in community MySQL 5.6 are in Percona Server 5.6 and, in turn, are in
PXC 5.6. Whether or not a given new feature is useful in PXC 5.6 really depends on how useful it is in
general with Galera.
I thought it would be useful to highlight a few features and try to show them working:

Innodb Fulltext Indexes


Yes, FTS works in Innodb in MySQL 5.6, so why wouldn't it work in Percona XtraDB Cluster 5.6?
To test this I used the Sakila database, which contains a single table with a FULLTEXT index. In the
sakila-schema.sql file, it is still designated a MyISAM table:

I edited that file to change MyISAM to Innodb, loaded the schema and data into my 3 node cluster:
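A sketch of that edit and load, using the standard sakila file names (connection options omitted):

    # switch the FULLTEXT table from MyISAM to InnoDB, then load schema and data
    sed -i 's/MyISAM/InnoDB/g' sakila-schema.sql
    mysql -u root < sakila-schema.sql
    mysql -u root < sakila-data.sql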

and it works seamlessly:

Sure enough, I can run this query on any node and it works fine:
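For example, a full-text query against sakila's film_text table (the search term here is arbitrary):

    SELECT film_id, title
      FROM sakila.film_text
     WHERE MATCH (title, description) AGAINST ('drama');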

There might be a few caveats and differences from how FTS works in Innodb vs MyISAM, but it is
there.

Minimal replication images


Galera relies heavily on RBR events, but until 5.6 those were entire row copies, even if you only
changed a single column in the table. In 5.6 you can change this to send only the updated data
using the variable binlog_row_image=minimal.
Using a simple sysbench update test for 1 minute, I can determine the baseline size of the
replicated data:
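The measurement itself can be as simple as differencing a Galera counter before and after the
one-minute run (the sysbench invocation is omitted; any update-only workload works):

    -- on the node receiving the sysbench writes, before and after the run
    SHOW GLOBAL STATUS LIKE 'wsrep_replicated_bytes';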

This results in 62.3 MB of data replicated in this test.

If I set binlog_row_image=minimal on all nodes and do a rolling restart, I can see how this changes:
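The change is a single line in my.cnf on each node (the variable is also dynamic, so SET GLOBAL can
be used for a quick test before committing to the rolling restart):

    [mysqld]
    binlog_row_image = minimal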

This yields a mere 13.4 MB, roughly 80% smaller, quite a savings! This benefit, of course, fully
depends on the type of workload you are running.

Durable Memcache Cluster


It turns out this feature does not work properly with Galera; see below for an explanation.
5.6 introduces a Memcached interface for Innodb. This means any standard memcache client
can talk to our PXC nodes with the memcache protocol, and the data would ideally be both durable
and replicated across the cluster.

To set this up, we need to simply load the innodb_memcache schema from the example and
restart the daemon to get a listening memcached port:
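A sketch of that setup on one node (the schema file path is the standard MySQL 5.6 location and may
vary by distribution; loading the plugin at runtime is shown as an alternative to configuring
plugin-load in my.cnf and restarting):

    # load the InnoDB memcached configuration schema and the demo_test sample table
    mysql -u root < /usr/share/mysql/innodb_memcached_config.sql

    # enable the memcached plugin (listens on port 11211 by default)
    mysql -u root -e "INSTALL PLUGIN daemon_memcached SONAME 'libmemcached.so'"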

This all appears to work and I can fetch the sample AA row from all the nodes with the memcached interface:
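For instance, with a plain telnet session against any node (node2 here is just a placeholder hostname;
AA is part of the demo_test sample data):

    telnet node2 11211
    get AA
    quit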

However, if I try to update a row, it does not seem to replicate (even if I set innodb_api_enable_binlog):

So unfortunately the memcached plugin must use some backdoor to Innodb that Galera is unaware
of. I've filed a bug on the issue, but it's not clear if there will be an easy solution or if a whole lot of
code will be necessary to make this work properly.
In the short-term, however, you can at least read data from all nodes with the memcached plugin
as long as data is only written using the standard SQL interface.

Async replication GTID Integration


Async GTIDs were introduced in 5.6 in order to make CHANGE MASTER easier. You have always
been able to use async replication from any cluster node, but now with this new GTID support, it is
much easier to fail over to another node in the cluster as a new master.
We take one node out of our cluster to be a slave and enable GTID binary logging on the other
two by adding these settings:
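These are the standard 5.6 GTID requirements; a my.cnf sketch (server_id must be unique per node,
and the values below are illustrative):

    [mysqld]
    server_id                = 2
    log_bin                  = mysql-bin
    log_slave_updates
    gtid_mode                = ON
    enforce_gtid_consistency = 1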

If I generate some writes on the cluster, I can see GTIDs are working:
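The check is simply a matter of comparing the executed GTID sets:

    -- run on each of the two binlogging nodes; the Executed_Gtid_Set should match
    -- even though the binary log file and position may differ
    SHOW MASTER STATUS\G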

Notice that we're at GTID 1505 on both nodes, even though the binary log position happens to be
different.
I set up my slave to replicate from node1 (.70.2):
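With GTIDs, the slave setup just needs auto-positioning; a sketch (the host is inferred from the .70.2
note above, and the replication user and password are assumptions):

    CHANGE MASTER TO
      MASTER_HOST = '192.168.70.2',
      MASTER_USER = 'repl',
      MASTER_PASSWORD = 'replpass',
      MASTER_AUTO_POSITION = 1;
    START SLAVE;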

And it's all caught up. If I put some load on the cluster, I can easily change to node2 as my master
without needing to stop writes:
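The switch does not require working out a log file and position; a sketch (node2's address is an
assumption):

    STOP SLAVE;
    CHANGE MASTER TO
      MASTER_HOST = '192.168.70.3',
      MASTER_AUTO_POSITION = 1;
    START SLAVE;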

So this seems to work pretty well.

Conclusion
MySQL 5.6 introduces a lot of new interesting features that are even more compelling in the
PXC/Galera world. If you want to experiment for yourself, I pushed the Vagrant environment I used
to Github at: https://github.com/jayjanssen/pxc_56_features


About the authors


Jay Janssen: Percona principal consultant

Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields
including High Availability architectures, MySQL training, tool building, global
server load balancing, multi-datacenter environments, operationalization, and
monitoring. He holds a B.S. of Computer Science from Rochester Institute of
Technology.

Jervin Real: Percona support engineer

When you come to Percona for consulting and support, chances are he'll be
greeting you first. His primary role is to make sure customer issues are handled
efficiently and professionally. Jervin joined Percona in May 2010.


About Percona
Percona has made MySQL faster and more reliable for over 2,000 consulting and
support customers worldwide since 2006. Percona provides enterprise-grade MySQL
support, Consulting, Training, Remote DBA, and Server Development services to
companies such as Cisco Systems, Alcatel-Lucent, Groupon, and the BBC. Percona's
founders authored the definitive book High Performance MySQL from O'Reilly Press
and the widely read MySQL Performance Blog. Percona also develops software for
MySQL users, including Percona Server, Percona XtraBackup, Percona XtraDB
Cluster, and Percona Toolkit. The popular Percona Live conferences draw attendees
and acclaimed speakers from around the world. For more information, visit
www.percona.com.

Copyright 2006-2014 Percona LLC
