Field Guide Issue No. 2
This is the second in a series of short Percona eBooks containing useful tips,
examples, and best practices for enterprise users of Percona XtraDB Cluster. If
there is a topic you would like us to include in our next issue, please let us know at
info@percona.com or give us a call at 1-888-316-9775.
Table of Contents
Chapter 1: Finding a good IST donor
Chapter 2: keepalived with reader and writer VIPs
Chapter 3: New wsrep_provider_options
Chapter 4: Useful MySQL 5.6 features you get for free
Until this release there was no visibility into any node's gcache or what was likely to happen
when restarting a node. You could make some assumptions, but now it is a bit easier to:
1. Tell if a given node would be a suitable donor
2. And hence select a donor manually using wsrep_sst_donor instead of leaving it to chance.
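Galera 3 exposes the lowest sequence number still held in a node's gcache through the wsrep_local_cached_downto status variable. A quick check, run on each candidate donor, might look like this sketch:

```sql
-- The lowest seqno still present in this node's gcache. A joiner can
-- receive IST from this node only if the joiner's last committed
-- seqno + 1 is at or above this value; otherwise a full SST is needed.
SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto';
```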
At step 3, node1 is the only viable donor for node2. Because our restart was quick, we can have
some reasonable assurance that node2 will IST correctly (and it does).
However, before we restart node3, let's check the oldest transaction in the gcache on nodes 1 and 2:
So we can see that node1 has a much more complete gcache than node2 (i.e., a much
smaller seqno). Node2's gcache was wiped when it restarted, so it only contains transactions from after
its restart.
To check node3's GTID, we can either check its grastate.dat or (if it has crashed and the
grastate is zeroed) use wsrep_recover:
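As a sketch (the datadir and error-log paths are assumptions; adjust for your installation), the two checks might look like:

```shell
# grastate.dat records the node's last known GTID; after a crash the
# seqno field is typically zeroed out
cat /var/lib/mysql/grastate.dat

# wsrep_recover replays InnoDB crash recovery and writes the recovered
# position to the error log
mysqld_safe --wsrep-recover
grep 'Recovered position' /var/lib/mysql/error.log
```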
So, armed with this information, we can tell what would happen to node3, depending on which donor
was selected:
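The rule behind that comparison can be sketched as a tiny script. This is not part of PXC itself, just an illustration of the decision: a donor can serve IST when its gcache still reaches back to the transaction right after the joiner's last committed seqno (the figures below are hypothetical).

```shell
#!/bin/bash
# Decide IST vs. SST for a given donor/joiner pair.
can_ist() {
  local donor_cached_downto=$1 joiner_seqno=$2
  # The joiner needs every transaction from joiner_seqno + 1 onward,
  # so the donor's gcache must reach at least that far back.
  if [ "$donor_cached_downto" -le $((joiner_seqno + 1)) ]; then
    echo "IST"
  else
    echo "SST"
  fi
}

# Hypothetical figures: node1's gcache reaches back to seqno 100,
# node2's only to seqno 5000, and node3 stopped at seqno 4000.
can_ist 100 4000    # -> IST (node1 as donor)
can_ist 5000 4000   # -> SST (node2 as donor)
```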
So, we can instruct node3 to use node1 as its donor on restart with wsrep_sst_donor:
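A my.cnf sketch (the node name comes from this example):

```ini
[mysqld]
wsrep_sst_donor=node1
```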
Note that passing mysqld options on the command line is only supported in RPM packages;
Debian requires you to put the setting in your my.cnf. We can see from node3's log that it does
properly IST:
Sometime in the future this may be handled automatically during donor selection, but for now it is very
useful that we can at least see the status of the gcache.
So first, let's compile keepalived from source; the GitHub branch linked here is where the status patch is
available.
Next, install the custom tracker script below. Because compiling keepalived above installs it under
/usr/local/bin, I put this script there as well. You might note that this script is somewhat redundant,
and that's true, but beware that keepalived does not validate its configuration, especially track_scripts, so I
prefer to keep the check in a separate bash script that I can easily debug if it misbehaves. Of course, when all is
working well, you can always merge it into the keepalived.conf file.
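A minimal sketch of such a tracker script (the Synced check and the script structure are assumptions about what you want to track; keepalived runs the script periodically and applies its configured weight when it succeeds):

```shell
#!/bin/bash
# Hypothetical keepalived track script: succeed only when the local PXC
# node reports wsrep_local_state_comment = Synced, so keepalived adds
# the script's weight to this node's VRRP priority.
state_ok() {
  [ "$1" = "Synced" ]
}

# -BN strips headers/borders; awk keeps only the value column
state=$(mysql -BN -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'" \
          2>/dev/null | awk '{print $2}')
if state_ok "$state"; then
  exit_code=0
else
  exit_code=1
fi
# a real track script would end with: exit $exit_code
```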
There are a number of things you can change here, such as removing or modifying the notify_* clauses to fit
your needs, or sending SMTP notifications during VIP failovers. I also prefer the initial state of the
VRRP instances to be BACKUP instead of MASTER, letting the voting at runtime dictate where
the VIPs should go.
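A sketch of the relevant keepalived.conf pieces (the interface, VIP address, router ID, and script path are all assumptions for illustration):

```
vrrp_script chk_pxc {
    script "/usr/local/bin/chk_pxc.sh"  # hypothetical path to the tracker script
    interval 2
    weight 50
}

vrrp_instance reader_vip {
    state BACKUP          # start as BACKUP and let runtime voting decide
    nopreempt             # only honored when the initial state is BACKUP
    interface eth1        # assumption
    virtual_router_id 51  # assumption
    priority 101
    virtual_ipaddress {
        192.168.70.100    # assumption
    }
    track_script {
        chk_pxc
    }
}
```

Note that keepalived only honors nopreempt on instances whose initial state is BACKUP, which is another reason to prefer BACKUP over MASTER here.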
The configuration ensures that the reader and writer VIPs will not share a single node if more than one is
available in the cluster. Even though the writer VIP prefers pxc01 in my example, this does not
really matter much and only makes a difference when the reader VIP is not in the picture; there is
also no automatic failback, thanks to the nopreempt_* track_scripts.
Now, to see it in action: after starting the cluster and keepalived in order (pxc01, pxc02, pxc03), I
have these statuses and weights:
The writer is on pxc01 and the reader on pxc02. Even though the reader VIP score matches between
pxc02 and pxc03, it remains on pxc02 because of our nopreempt_* script. Let's see what
happens if I stop MySQL on pxc02:
The reader VIP moved to pxc03 and the weights changed: the reader on pxc02 dropped by 100, and on
pxc03 it gained 50 (again, we set this higher for nopreempt). Now let's stop MySQL on pxc03:
Our reader is back on pxc02 and the writer remains intact. When both VIPs end up on a single node
(i.e., the last node standing) and a second node comes up, the reader moves, not the writer; this is to
prevent any risk of breaking connections that may be writing to the node currently owning
the VIP.
New wsrep_provider_options
By Jay Janssen
Now that Percona XtraDB Cluster 5.6 is out, I wanted to talk about some of the new features in
Galera 3 and Percona XtraDB Cluster 5.6. On the surface, Galera 3 doesn't reveal a lot of new
features yet, but there has been a lot of refactoring of the system in preparation for great new
features in the future.
gmcast.segment=0
This is a new setting in 3.x and allows us to distinguish between nodes in different WAN segments.
For example, all nodes in a single datacenter would be configured with the same segment number,
but each datacenter would have its own segment.
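A my.cnf sketch (the segment numbers are arbitrary labels; they only need to match within a datacenter):

```ini
# on every node in datacenter A
wsrep_provider_options="gmcast.segment=1"

# on every node in datacenter B
wsrep_provider_options="gmcast.segment=2"
```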
Segments are currently used in two main ways:
1. Replication traffic between segments is minimized. Writesets originating in one segment
should be relayed through only one node in every other segment. From those local relays
replication is propagated to the rest of the nodes in each segment respectively.
2. Segments are used in donor selection: donors in the same segment are preferred,
but not required.
repl.key_format = FLAT8
Every writeset in Galera has associated keys. These keys are effectively a list of primary, unique,
and foreign keys associated with all rows modified in the writeset. In Galera 2 these keys were
replicated as literal values, but in Galera 3 they are hashed into either 8- or 16-byte values (FLAT8 vs.
FLAT16). This should generally make the key sizes smaller, especially with large CHAR keys.
Because the keys are now hashed, there can be collisions where two distinct literal key values
result in the same 8-byte hashed value. This means practically that the places in Galera that rely
on keys may falsely believe that there is a match between two writesets when there really is not.
This should be quite rare. This false positive could affect:
1. Local certification failures (deadlocks on commit) that are unnecessary.
2. Parallel apply: operations could be ordered more strictly (i.e., with less parallelization) than
necessary.
Neither case affects data consistency. The tradeoff is more efficiency in keys and key operations
generally making writesets smaller and certification faster.
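If the collision risk matters for a given workload, the 16-byte variant can be chosen instead; a my.cnf sketch:

```ini
# 16-byte hashed keys: lower collision probability, slightly larger
# writesets than the FLAT8 default
wsrep_provider_options="repl.key_format=FLAT16"
```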
repl.proto_max
Limits the Galera protocol version that can be used in the cluster. Codership's documentation
states it is for debugging only.
socket.checksum = 2
This changes the network packet checksum algorithm from the previous CRC32 to CRC32-C,
which is hardware-accelerated on supported hardware. Packet checksums can also now be completely
disabled (=0).
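A my.cnf sketch for selecting the checksum mode:

```ini
# 0 = disabled, 1 = plain CRC32, 2 = CRC32-C (default; hardware
# accelerated where supported)
wsrep_provider_options="socket.checksum=0"
```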
I edited that file to change MyISAM to InnoDB, then loaded the schema and data into my 3-node cluster:
Sure enough, I can run this query on any node and it works fine:
There might be a few caveats and differences in how FTS works in InnoDB vs. MyISAM, but it is
there.
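As a sketch of what such a query looks like (the table and column names are assumptions, modeled on the sakila film_text table, which ships with a MyISAM FULLTEXT index):

```sql
-- As of MySQL 5.6, FULLTEXT indexes also work on InnoDB tables
ALTER TABLE film_text ENGINE=InnoDB;

SELECT film_id, title
  FROM film_text
 WHERE MATCH(title, description) AGAINST ('drama');
```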
If I set binlog_row_image=minimal on all nodes and do a rolling restart, I can see how this changes:
This yields a mere 13.4MB; that's 80% smaller, quite a savings! This benefit, of course, fully
depends on the type of workload you are running.
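The setting itself is a single line of my.cnf, applied across the cluster with a rolling restart as described above:

```ini
# log only the columns needed to identify and apply each row change,
# instead of full before/after row images
binlog_row_image=minimal
```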
To set this up, we need to simply load the innodb_memcache schema from the example and
restart the daemon to get a listening memcached port:
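A sketch of those two steps from the mysql client (the script path varies by distribution):

```sql
-- creates the innodb_memcache schema and the demo_test container table
SOURCE /usr/share/mysql/innodb_memcached_config.sql;

-- starts the memcached daemon inside mysqld, listening on port 11211
INSTALL PLUGIN daemon_memcached SONAME 'libmemcached.so';
```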
This all appears to work and I can fetch the sample AA row from all the nodes with the memcached interface:
However, if I try to update a row, it does not seem to replicate (even if I set innodb_api_enable_binlog):
So unfortunately the memcached plugin must use some backdoor into InnoDB that Galera is unaware
of. I've filed a bug on the issue, but it's not clear if there will be an easy solution or if a whole lot of
code will be necessary to make this work properly.
In the short-term, however, you can at least read data from all nodes with the memcached plugin
as long as data is only written using the standard SQL interface.
If I generate some writes on the cluster, I can see GTIDs are working:
Notice that we're at GTID 1505 on both nodes, even though the binary log positions happen to be
different.
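A sketch of how to compare the two nodes (the values reported depend on what your cluster has executed):

```sql
-- the executed GTID set should match across nodes even when the
-- binlog file name and position differ
SHOW MASTER STATUS;
SELECT @@global.gtid_executed;
```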
I set up my slave to replicate from node1 (.70.2):
And it's all caught up. If I put some load on the cluster, I can easily change to node2 as my master
without needing to stop writes:
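A sketch of the switch on the slave (the node2 address is an assumption; MASTER_AUTO_POSITION lets the slave use GTIDs to find the correct spot in the new master's binlogs on its own):

```sql
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='192.168.70.3',  -- hypothetical node2 address
  MASTER_AUTO_POSITION=1;
START SLAVE;
```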
Conclusion
MySQL 5.6 introduces a lot of new interesting features that are even more compelling in the
PXC/Galera world. If you want to experiment for yourself, I pushed the Vagrant environment I used
to Github at: https://github.com/jayjanssen/pxc_56_features
Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields
including High Availability architectures, MySQL training, tool building, global
server load balancing, multi-datacenter environments, operationalization, and
monitoring. He holds a B.S. in Computer Science from Rochester Institute of
Technology.
When you come to Percona for consulting and support, chances are he'll be
greeting you first. His primary role is to make sure customer issues are handled
efficiently and professionally. Jervin joined Percona in May 2010.
About Percona
Percona has made MySQL faster and more reliable for over 2,000 consulting and
support customers worldwide since 2006. Percona provides enterprise-grade MySQL
support, Consulting, Training, Remote DBA, and Server Development services to
companies such as Cisco Systems, Alcatel-Lucent, Groupon, and the BBC. Percona's
founders authored the definitive book High Performance MySQL from O'Reilly Press
and the widely read MySQL Performance Blog. Percona also develops software for
MySQL users, including Percona Server, Percona XtraBackup, Percona XtraDB
Cluster, and Percona Toolkit. The popular Percona Live conferences draw attendees
and acclaimed speakers from around the world. For more information, visit
www.percona.com.
Copyright © 2006-2014 Percona LLC