MHA MySQL Quick Start Guide

Stella981
• 阅读 375

MHA MySQL Quick Start Guide

02 Sep 2016

By Miguel Angel Nieto Insight for DBAs, MySQL MHA, MHA QuickStart Guide, MySQL 6 Comments

MHA (Master High Availability Manager and tools for MySQL) is one of the most important pieces of our managed services. When properly set up, it can check replication health, move writer and reader virtual IPs, perform failovers, and have its output constantly monitored by Nagios. Is it easy to deploy and follows the KISS (Keep It Simple, Stupid) philosophy that I love so much.

This blog post is a quick start guide to try it out and play with it in your own testing environment. I assume that you already know how to install software, deal with SSH keys and setup replication in MySQL. The post just covers MHA configuration.

Testing environment

Taken from /etc/hosts

Shell

192.168.1.116 mysql-server1 192.168.1.117 mysql-server2 192.168.1.118 mysql-server3 192.168.1.119 mha-manager

1

2

3

4

192.168.1.116 mysql - server1

192.168.1.117 mysql - server2

192.168.1.118 mysql - server3

192.168.1.119 mha - manager

mysql-server1: Our master MySQL server with 5.6
mysql-server2: Slave server
mysql-server3: Slave server
mha-manager: The server monitors the replication and from where we manage MHA. The installation is also required to meet some Perl dependencies.

We just introduced some new concepts, the MHA Node and MHA Manager:

MHA Node

It is installed and runs on each MySQL server. This is the piece of software that it is invoked by the manager every time we want to do something, like for example a failover or a check.

MHA Manager

As explained before, this is our operations center. The manager monitors the services, replication, and includes several administrative command lines.

Pre-requisites

  • Replication must already be running. MHA manages replication and monitors it, but it is not a tool to deploy it. So MySQL and replication need to be running already.
  • All hosts should be able to connect to each other using public SSH keys.
  • All nodes need to be able to connect to each other’s MySQL servers.
  • All nodes should have the same replication user and password.
  • In the case of multi-master setups, only one writable node is allowed. All others need to be configured with read_only.
  • MySQL version has to be 5.0 or later.
  • Candidates for master failover should have binary log enabled. The replication user must exist there too.
  • Binary log filtering variables should be the same on all servers (replicate-wild%, binlog-do-db…).
  • Disable automatic relay-log purge and do it regularly from a cron task. You can use an MHA-included script called “purge_relay_logs”.

While that is a large list of requisites, I think that they are pretty standard and logical.

MHA installation

As explained before, the MHA Node needs to be installed on all the nodes. You can download it from this Google Drive link.

This post shows you how to install it using the source code, but there are RPM packages available. Deb too, but only for older versions. Use the installation method you prefer. This is how to compile it:

Shell

tar -xzf mha4mysql-node-0.57.tar.gz perl Makefile.PL make make install

1

2

3

4

tar - xzf mha4mysql - node - 0.57.tar.gz

perl Makefile .PL

make

make install

The commands included in the node package are save_binary_logs, filter_mysqlbinlog, purge_relay_logs, apply_diff_relay_logs. Mostly tools that the manager needs to call in order to perform a failover, while trying to minimize or avoid any data loss.

On the manager server, you need to install MHA Node plus MHA Manager. This is due to MHA Manager dependance on a Perl library that comes with MHA Node. The installation process is just the same.

Configuration

We only need one configuration file on the Manager node. The example below is a good starting point:

Shell

# cat /etc/app1.cnf [server default] # mysql user and password user=root password=supersecure ssh_user=root # working directory on the manager manager_workdir=/var/log/masterha/app1 # working directory on MySQL servers remote_workdir=/var/log/masterha/app1 [server1] hostname=mysql-server1 candidate_master=1 [server2] hostname=mysql-server2 candidate_master=1 [server3] hostname=mysql-server3 no_master=1

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

# cat /etc/app1.cnf

[ server default ]

# mysql user and password

user = root

password = supersecure

ssh_user = root

# working directory on the manager

manager_workdir = / var / log / masterha / app1

# working directory on MySQL servers

remote_workdir = / var / log / masterha / app1

[ server1 ]

hostname = mysql - server1

candidate_master = 1

[ server2 ]

hostname = mysql - server2

candidate_master = 1

[ server3 ]

hostname = mysql - server3

no_master = 1

So pretty straightforward. It specifies that there are three servers, two that can be master and one that can’t be promoted to master.

Let’s check if we meet some of the pre-requisites. We are going to test if replication is working, can be monitored, and also if SSH connectivity works.

Shell

# masterha_check_ssh --conf=/etc/app1.cnf [...] [info] All SSH connection tests passed successfully.

1

2

3

# masterha_check_ssh --conf=/etc/app1.cnf

[ . . . ]

[ info ] All SSH connection tests passed successfully .

It works. Now let’s check MySQL:

Shell

# masterha_check_repl --conf=/etc/app1.cnf [...] MySQL Replication Health is OK.

1

2

3

# masterha_check_repl --conf=/etc/app1.cnf

[ . . . ]

MySQL Replication Health is OK .

Start the manager and operations

Everything is setup, we meet the pre-requisites. We can start our manager:

Shell

# masterha_manager --remove_dead_master_conf --conf=/etc/app1.cnf [...] [info] Starting ping health check on mysql-server1(192.168.1.116:3306).. [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

1

2

3

4

# masterha_manager --remove_dead_master_conf --conf=/etc/app1.cnf

[ . . . ]

[ info ] Starting ping health check on mysql - server1 ( 192.168.1.116 : 3306 ) . .

[ info ] Ping ( SELECT ) succeeded , waiting until MySQL doesn' t respond . .

The manager found our master and it is now actively monitoring it using a SELECT command. –remove_dead_master_conf tells the manager that if the master goes down, it must edit the config file and remove the master’s configuration from it after a successful failover. This avoids the “there is a dead slave” error when you restart the manager. All servers listed in the conf should be part of the replication and in good health, or the manager will refuse to work.

Automatic and manual failover

Good, everything is running as expected. What happens if the MySQL master dies!?!

Shell

[...] [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin Creating /var/log/masterha/app1 if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /var/log/mysql, up to mysql-bin.000002 [info] HealthCheck: SSH to mha-server1 is reachable. [...]

1

2

3

4

5

6

7

8

9

[ . . . ]

[ warning ] Got error on MySQL select ping : 2006 ( MySQL server has gone away )

[ info ] Executing SSH check script : save_binary_logs -- command = test -- start_pos = 4 -- binlog_dir = / var / lib / mysql , / var / log / mysql -- output_file = / var / log / masterha / app1 / save_binary_logs_test -- manager_version = 0.57 -- binlog_prefix = mysql - bin

Creating / var / log / masterha / app1 if not exists . . ok .

Checking output directory is accessible or not . .

ok .

Binlog found at / var / log / mysql , up to mysql - bin . 000002

[ info ] HealthCheck : SSH to mha - server1 is reachable .

[ . . . ]

First, it tries to connect by SSH to read the binary log and save it. MHA can apply the missing binary log events to the remaining slaves so they are up to date with all the before-failover info. Nice!

Theses different phases follow:

Shell

* Phase 1: Configuration Check Phase.. * Phase 2: Dead Master Shutdown Phase.. * Phase 3: Master Recovery Phase.. * Phase 3.1: Getting Latest Slaves Phase.. * Phase 3.2: Saving Dead Master's Binlog Phase.. * Phase 3.3: Determining New Master Phase.. [info] Finding the latest slave that has all relay logs for recovering other slaves.. [info] All slaves received relay logs to the same position. No need to resync each other. [info] Starting master failover.. [info] From: mysql-server1(192.168.1.116:3306) (current master) +--mysql-server2(192.168.1.117:3306) +--mysql-server3(192.168.1.118:3306) To: mysql-server2(192.168.1.117:3306) (new master) +--mysql-server3(192.168.1.118:3306) * Phase 3.3: New Master Diff Log Generation Phase.. * Phase 3.4: Master Log Apply Phase.. * Phase 4: Slaves Recovery Phase.. * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. * Phase 4.2: Starting Parallel Slave Log Apply Phase.. * Phase 5: New master cleanup phase..

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

* Phase 1 : Configuration Check Phase . .

* Phase 2 : Dead Master Shutdown Phase . .

* Phase 3 : Master Recovery Phase . .

* Phase 3.1 : Getting Latest Slaves Phase . .

* Phase 3.2 : Saving Dead Master' s Binlog Phase . .

* Phase 3.3 : Determining New Master Phase . .

[ info ] Finding the latest slave that has all relay logs for recovering other slaves . .

[ info ] All slaves received relay logs to the same position . No need to resync each other .

[ info ] Starting master failover . .

[ info ]

From :

mysql - server1 ( 192.168.1.116 : 3306 ) ( current master )

  • -- mysql - server2 ( 192.168.1.117 : 3306 )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

To :

mysql - server2 ( 192.168.1.117 : 3306 ) ( new master )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

* Phase 3.3 : New Master Diff Log Generation Phase . .

* Phase 3.4 : Master Log Apply Phase . .

* Phase 4 : Slaves Recovery Phase . .

* Phase 4.1 : Starting Parallel Slave Diff Log Generation Phase . .

* Phase 4.2 : Starting Parallel Slave Log Apply Phase . .

* Phase 5 : New master cleanup phase . .

The phases are pretty self-explanatory. MHA tries to get all the data possible from the master’s binary log and slave’s relay log (the one that is more advanced) to avoid losing any data or promote a slave that it was far behind the master. So it tries to promote a slave with the most current data as possible. We see that server2 has been promoted to master, because in our configuration we specified that server3 shouldn’t be promoted.

After the failover, the manager service stops itself. If we check the config file, the failed server is not there anymore. Now the recovery is up to you. You need to get the old master back in the replication chain, then add it again to the config file and start the manager.

It is also possible to perform a manual failover (if, for example, you need to do some maintenance on the master server). To do that you need to:

  • Stop masterha_manager.
  • Run masterha_master_switch –master_state=alive –conf=/etc/app1.cnf. The line says that you want to switch the master, but the actual master is still alive, so no need to mark it as dead or remove it from the conf file.

And that’s it. Here is part of the output. It shows the tool making the decision on the new topology and asking the user for confirmation:

Shell

[info] From: mysql-server1(192.168.1.116:3306) (current master) +--mysql-server2(192.168.1.117:3306) +--mysql-server3(192.168.1.118:3306) To: mysql-server2(192.168.1.117:3306) (new master) +--mysql-server3(192.168.1.118:3306) Starting master switch from mha-server1(192.168.1.116:3306) to mha-server2(192.168.1.117:3306)? (yes/NO): yes [...] [info] Switching master to mha-server2(192.168.1.117:3306) completed successfully.

1

2

3

4

5

6

7

8

9

10

11

12

13

[ info ]

From :

mysql - server1 ( 192.168.1.116 : 3306 ) ( current master )

  • -- mysql - server2 ( 192.168.1.117 : 3306 )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

To :

mysql - server2 ( 192.168.1.117 : 3306 ) ( new master )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

Starting master switch from mha - server1 ( 192.168.1.116 : 3306 ) to mha - server2 ( 192.168.1.117 : 3306 ) ? ( yes / NO ) : yes

[ . . . ]

[ info ] Switching master to mha - server2 ( 192.168.1.117 : 3306 ) completed successfully .

You can also employ some extra parameters that are really useful in some cases:

–orig_master_is_new_slave: if you want to make the old master a slave of the new one.

–running_updates_limit: if the current master executes write queries that take more than this parameter’s setting, or if any of the MySQL slaves behind master take more than this parameter, the master switch aborts. By default, it’s 1 (1 second). All these checks are for safety reasons.

–interactive=0: if you want to skip all the confirmation requests and questions masterha_master_switch could ask.

Check this link in case you use GTID and want to avoid problems with errant transactions during the failover:

https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/

Custom scripts

Since this is a quick guide to start playing around with MHA, I won’t cover advanced topics in detail. But I will mention a few:

    • Custom scripts. MHA can move IPs around, shutdown a server and send you a report in case something happens. It needs a custom script, however. MHA comes with some example scripts, but you would need to write one that fits your environment.The directives are master_ip_failover_script, shutdown_script, report_script. With them configured, MHA will send you an email or a message to your mobile device in the case of a failover, shutdown the server and move IPs between servers. Pretty nice!

Hope you found this quickstart guide useful for your own tests. Remember, one of the most important things: don’t overdo automation!  😉 These tools are good for checking health and performing the first initial failover. But you must still investigate what happened, why, fix it and work to avoid it from happening again. In high availability (HA) environments, automate everything and cause it to stop being HA.

Have fun!

Consul, ProxySQL and MySQL HASeptember 16, 2016In "MySQL"

Tips from the trenches for over-extended MySQL DBAsDecember 2, 2014In "Insight for DBAs"

ProxySQL and MHA IntegrationSeptember 13, 2016In "Insight for DBAs"

Get the Inside Scoop from Percona

Join 33,000+ fellow open-source enthusiasts! Our newsletter provides monthly updates on Percona open source software releases, technical resources, and valuable MySQL, MariaDB, PostgreSQL and MongoDB-related posts from the blog. Get information on Percona Live, our technical webinars, and upcoming events and meetups where you can talk with our experts.

Enter your email address: *

By submitting my information I agree that Percona may use my personal data in send communication to me about Percona services. I understand that I can unsubscribe from the communication at any time in accordance with the Percona Privacy Policy.

Author

Miguel Angel Nieto

Miguel joined Percona in October 2011. He has worked as a System Administrator for a Free Software consultant and in the supporting area of the biggest hosting company in Spain. His current focus is improving MySQL and helping the community of Free Software to grow.


Share this post

Facebook Twitter LinkedIn Email

Comments (6)

Simon J Mudd Reply

Hi Miguel Angel,

A nice write-up on MHA.

I looked at MHA some time ago and found it to work quite well. Setting up the required ssh and MySQL access was more of a nuisance especially if the number of servers grows, but the main thing that put a brake on MHA was the fact it could not fix broken intermediate masters. That is if the topology is only a master with directly connected slaves everything is ok. If you have a master connected to an intermediate master which has a slave MHA can not handle this, or that’s my recollection. Perhaps it has changed now. If you have servers spread over multiple datacentres then it is much more likely that intermediate masters may exist, and that’s also true if you migrate from one version of MySQL to another (major one) and can not take downtime. In that situation you’re likely to have a new intermediate master of the higher version running under the older master and this new master will have slaves of the newer version.

This lead me to look at Orchestrator which is able to resolve the issues mentioned and also only requires MySQL access to the servers to do its work. Percona have had a couple of blogs on Orchestrator so that’s covered elsewhere but I think this “limitation” I saw in MHA if it is still present should be mentioned.

September 3, 2016 at 5:21 am

    • Doug Wiggin Reply

      Simon, Thanks for the post. We are going to give Orchestrator a run in one of our environments seeking to replace MMM Monitor.

      September 5, 2016 at 4:37 pm

  • Miguel Angel Nieto Reply

    Thanks for the comment Simon.

    Multi-tier environments are not really supported:

    https://code.google.com/p/mysql-master-ha/wiki/Requirements#For_managing_three_or_more_tier_replication_environment..

    But master-master environments are. So, as usual in free software, you have to use the tool that helps you better 🙂

    September 6, 2016 at 4:45 am

  • Vinay Reply

    Hi Miguel Angel,

    Please help me to understand the limitation of MHA .

    February 14, 2017 at 8:27 am

  • Dylan Reply

    I configured it the way you want it, but I got an error when checking the replication:
    https://pastebin.ubuntu.com/p/8X3cDtFZ24/
    Where is it wrong? TAT

    January 16, 2019 at 9:19 am

  • Indubhushan Jha Reply

    Hello,
    Can we install MHA manager on all nodes? I know it is recommended to install it on SLAVE host only. But what is the side effect if we install it on all node (including Master node). This can help us to initiate the switchover from any node.

    January 30, 2019 at 4:52 am

Leave a Reply


How Can We Help?

Percona's experts can maximize your application performance with our open source database support, managed services or consulting.

Subscribe

Want to get weekly updates listing the latest blog posts? Subscribe now and we'll send you an update every Friday at 1pm ET.

Polls

How has the traffic and load on your database or application been trending over the last few weeks?

  • Significantly decreased
  • Decreased
  • No significant change
  • Increased
  • Significantly Increased

View Results

Categories

Upcoming Webinars

All Webinars »

点赞
收藏
评论区
推荐文章
浅梦一笑 浅梦一笑
4个月前
初学 Python 需要安装哪些软件?超级实用,小白必看!
编程这个东西是真的奇妙。对于懂得的人来说,会觉得这个工具是多么的好用、有趣,而对于小白来说,就如同大山一样。其实这个都可以理解,大家都是这样过来的。那么接下来就说一下python相关的东西吧,并说一下我对编程的理解。本人也是小白一名,如有不对的地方,还请各位大神指出01名词解释:如果在编程方面接触的比较少,那么对于软件这一块,有几个名词一定要了解,比如开发环
Jacquelyn38 Jacquelyn38
1年前
2020年前端实用代码段,为你的工作保驾护航
有空的时候,自己总结了几个代码段,在开发中也经常使用,谢谢。1、使用解构获取json数据let jsonData  id: 1,status: "OK",data: 'a', 'b';let  id, status, data: number   jsonData;console.log(id, status, number )
刚刚好 刚刚好
4个月前
css问题
1、在IOS中图片不显示(给图片加了圆角或者img没有父级)<div<imgsrc""/</divdiv{width:20px;height:20px;borderradius:20px;overflow:h
blmius blmius
1年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
晴空闲云 晴空闲云
4个月前
css中box-sizing解放盒子实际宽高计算
我们知道传统的盒子模型,如果增加内边距padding和边框border,那么会撑大整个盒子,造成盒子的宽度不好计算,在实务中特别不方便。boxsizing可以设置盒模型的方式,可以很好的设置固定宽高的盒模型。盒子宽高计算假如我们设置如下盒子:宽度和高度均为200px,那么这会这个盒子实际的宽高就都是200px。但是当我们设置这个盒子的边框和内间距的时候,那
艾木酱 艾木酱
3个月前
快速入门|使用MemFire Cloud构建React Native应用程序
MemFireCloud是一款提供云数据库,用户可以创建云数据库,并对数据库进行管理,还可以对数据库进行备份操作。它还提供后端即服务,用户可以在1分钟内新建一个应用,使用自动生成的API和SDK,访问云数据库、对象存储、用户认证与授权等功能,可专
Wesley13 Wesley13
1年前
MySQL查询按照指定规则排序
1.按照指定(单个)字段排序selectfromtable_nameorderiddesc;2.按照指定(多个)字段排序selectfromtable_nameorderiddesc,statusdesc;3.按照指定字段和规则排序selec
Stella981 Stella981
1年前
Angular material mat
IconIconNamematiconcode_add\_comment_addcommenticon<maticonadd\_comment</maticon_attach\_file_attachfileicon<maticonattach\_file</maticon_attach\
Wesley13 Wesley13
1年前
MySQL部分从库上面因为大量的临时表tmp_table造成慢查询
背景描述Time:20190124T00:08:14.70572408:00User@Host:@Id:Schema:sentrymetaLast_errno:0Killed:0Query_time:0.315758Lock_
helloworld_28799839 helloworld_28799839
4个月前
常用知识整理
Javascript判断对象是否为空jsObject.keys(myObject).length0经常使用的三元运算我们经常遇到处理表格列状态字段如status的时候可以用到vue
helloworld_34035044 helloworld_34035044
6个月前
皕杰报表之UUID
​在我们用皕杰报表工具设计填报报表时,如何在新增行里自动增加id呢?能新增整数排序id吗?目前可以在新增行里自动增加id,但只能用uuid函数增加UUID编码,不能新增整数排序id。uuid函数说明:获取一个UUID,可以在填报表中用来创建数据ID语法:uuid()或uuid(sep)参数说明:sep布尔值,生成的uuid中是否包含分隔符'',缺省为