MySQL “Got an error reading communication packet”

Wesley13
• 阅读 491

导读

作者:Muhammad Irfan

原文:

https://www.percona.com/blog/2016/05/16/mysql-got-an-error-reading-communication-packet-errors/

翻译:徐晨亮

本文涉及链接已在文末展示,可复制到浏览器打开


In this blog post, we’ll discuss the possible reasons for MySQL “Got an error reading communication packet” errors and how to address them.

在这篇文章中,我们将会讨论MySQL"Got an error reading communication packet"错误的原因以及如何定位它们。

In Percona’s managed services, we often receive customer questions on communication failure errors. So let’s discuss possible reasons for this error and how to remedy it.

在Percona的托管服务中,我们经常收到关于通信失败错误的客户咨询。因此让我们一起来讨论下该错误可能的原因以及如何来规避。

MySQL Communication Errors

First of all, whenever a communication error occurs, it increments the status counter for either Aborted_clients or Aborted_connects, which describe the number of connections that were aborted because the client died without closing the connection properly and the number of failed attempts to connect to MySQL server (respectively). The possible reasons for both errors are numerous (see the Aborted_clients increments or Aborted_connects increments sections in the MySQL manual).

首先,不管什么时候发生通信错误,状态计数器Aborted clients或者Aborted connects会增加,表示由于客户端未正确关闭连接而断开的次数,以及连接到MySQL服务器失败的尝试次数。这两个错误的可能原因有很多(参见MySQL 手册“参考链接一”中的Aborted_clients incrementsaborted_connections increments部分)

In the case of log_warnings, MySQL also writes this information to the error log (shown below):

在log_warnings>1的情况下,MySQL同样将该信息写入到error log(如下所示):

[Warning] Aborted connection 305628 to db: 'db' user: 'dbuser' host: 'hostname' (Got an error reading communication packets)[Warning] Aborted connection 305627 to db: 'db' user: 'dbuser' host: 'hostname' (Got an error reading communication packets)

In this case, MySQL increments the status counter for Aborted_clients, which could mean:

  • The client connected successfully but terminated improperly (and may relate to not closing the connection properly)

  • The client slept for longer than the defined wait_timeout or interactive_timeout seconds (which ends up causing the connection to sleep for wait_timeout seconds and then the connection gets forcibly closed by the MySQL server)

  • The client terminated abnormally or exceeded the max_allowed_packet for queries

The above is not an all-inclusive list. Now, let’s identify what is causing this problem and how to remedy it.

以下情况下, MySQL会增加Aborted_clients状态变量的计数器,也就意味着:

  • 客户端已经成功连接,但是异常终止了(可能与未正确关闭连接有关)

  • 客户端sleep时间超过了变量wait_timeout或 interactive_timeout定义的秒数(最终导致连接休眠的时间超过系统变量wait_timeout的值,然后被MySQL强行关闭)

  • 客户端异常中断或查询超出了 max_allowed_packet值

以上是一个非全部包含的原因列表。现在让我们找出造成该问题的原因并且如何规避。

Fixing MySQL Communication Errors

To be honest, aborted connection errors are not easy to diagnose. But in my experience, it’s related to network/firewall issues most of the time. We usually investigate those issues with the help of Percona toolkit scripts, i.e. pt-summary / pt-mysql-summary / pt-stalk. The outputs from those scripts can be very helpful.

说实话,aborted connections错误并不容易诊断。但是在我的经验中,大多数时间是网络或者防火墙的原因。我们通常使用Percona toolkit脚本,例如 pt-summary / pt-mysql-summary / pt-stalk来调查这些问题。这些脚本的输出非常有用。

Some of the reasons for aborted connection errors can be:

  • A high rate of connections sleeping inside MySQL for hundred of seconds is one of the symptoms that applications aren’t closing connections after doing work, and instead relying on the wait_timeout to close them. I strongly recommend changing the application logic to properly close connections at the end of an operation.

  • Check to make sure the value of max_allowed_packet is high enough, and that your clients are not receiving a “packet too large” message. This situation aborts the connection without properly closing it.

  • Another possibility is TIME_WAIT. I’ve noticed many TIME_WAIT notifications from the netstat, so I would recommend confirming the connections are well managed to close on the application side.

  • Make sure the transactions are committed (begin and commit) properly so that once the application is “done” with the connection it is left in a clean state.

  • You should ensure that client applications do not abort connections. For example, if PHP has option max_execution_time set to 5 seconds, increasing connect_timeout would not help because PHP will kill the script. Other programming languages and environments can have similar safety options.

  • Another cause for delay in connections is DNS problems. Check if you have skip-name-resolve enabled and if hosts are authenticated against their IP address instead of their hostname.

  • One way to find out where your application is misbehaving is to add some logging to your code that will save the application actions along with the MySQL connection ID. With that, you can correlate it to the connection number from the error lines. Enable the Audit log plugin, which logs connections and query activity, and check the Percona Audit Log Plugin as soon as you hit a connection abort error. You can check for the audit log to identify which query is the culprit. If you can’t use the Audit plugin for some reason, you can consider using the MySQL general log – however, this can be risky on a loaded server. You should enable the general log for at least a few minutes. While it puts a heavy burden on the server, errors happen fairly often so you should be able to collect the data before the log grows too large. I recommend enabling the general log with an -f tail, then disable the general log when you see the next warning in the log. Once you find the query from the aborted connection, identify which piece of your application issues that query and co-relate the queries with portions of your application.

  • Try increasing the net_read_timeout and net_write_timeout values for MySQL and see if that reduces the number of errors. net_read_timeout is rarely the problem unless you have an extremely poor network. Try tweaking those values, however, because in most cases a query is generated and sent as a single packet to the server, and applications can’t switch to doing something else while leaving the server with a partially received query. There is a very detailed blog post on this topic from our CEO, Peter Zaitsev.

以下是造成aborted connection错误的可能原因:

  • 在MySQL内部,MySQL内部处于休眠了几百秒的状态的连接中很大比例是应用程序在完成工作后没有关闭连接造成的,而是依靠wait_tiemout系统变量来关闭连接。我强烈建议修改应用程序逻辑,在操作结束后正确关闭连接

  • 检查以确保max_allowed_packet足够大,你的客户端不会收到"packet too large"的消息。这种情况下的连接断开属于由于没有正确关闭连接

  • 另外一种可能是TIME_WAIT。我曾经多次从netstat注意到TIME_WAIT提示,所以我建议在应用端确认正确关闭连接

  • 确保事务提交(begin和commit)都正确提交以保证一旦应用程序完成以后留下的连接是处于干净的状态

  • 你应该确保客户端程序不会断开连接。例如,如果PHP设置了max_execution_time为5秒,增加connect_timeout并不会起到作用,因为PHP会kill脚本。其他程序语言和环境也有类似的安全选项

  • 连接延迟的另外一个原因是DNS问题。检查参数skip-name-resolve是否打开,以及是否根据主机的IP地址而不是主机名对主机进行身份验证

  • 发现你的应用程序故障的一种办法是添加一些日志到你的代码中来保存包含连接ID的应用程序行为。有了它,你能够将连接数字与错误行数对应起来了。打开审计日志插件,日志记录了连接和查询操作,一旦触发到了连接断开的错误,你都应该检查Percona审计日志。你可以通过检查审计日志找出哪个查询是根本原因。如果由于某些原因你不能使用审计日志,你可以考虑使用MySQL的常规日志-然而对于高负载的服务器来说这样是有风险的。再不济,你可以打开常规日志几分钟。打开常规日志会给服务器增加巨大负担,并且经常会发生错误,因此你应该在日志增长太大之前就收集完数据。我建议打开常规日志并使用`tail -f`,然后当你在日志中看到下一个警告时关闭。一旦从断开的连接中找到查询,请确定查询的应用程序问题的哪一部分,并将查询与应用程序的某些部分关联起来。

  • 尝试增加MySQL的net_read_timeoutnet_write_timeout的参数值然后观察是否减少错误数。net_read_timeout一般很少出问题,除非你的网络真的很糟糕。但是,尝试调整这些值,因为在大多数情况下,生成一个查询并将其作为一个包发送到服务器,而应用程序不能在将部分接收到的查询留给服务器的同时去做其他事情

Aborted connections happen because a connection was not closed properly. The server can’t cause aborted connections unless there is a networking problem between the server and the client (like the server is half duplex, and the client is full duplex) – but that is the network causing the problem, not the server. In any case, such problems should show up as errors on the networking interface. To be extra sure, check the ifconfig -a output on the MySQL server to check if there are errors.

发生连接断开的原因是因为连接没有正确关闭。服务器并不会造成连接断开,除非服务器和客户端之间有网络问题(例如服务器是单工而客户端是双工的)-但是这是网络造成的问题,而不是服务器。在任何情况下,这些问题都应该在网络接口上显示为错误。另外,请检查MySQL服务器上的ifconfig -a输出是否有错误。

Another way to troubleshoot this problem is via tcpdump. You can refer to this blog post on how to track down the source of aborted connections. Look for potential network issues, timeouts and resource issues with MySQL.

另外一种定位该问题的方法是通过tcpdump。你可以参考这篇文章how to track down the source of aborted connections(参考链接二)找到MySQL的潜在网络、超时和资源问题。

I found this blog post useful in explaining how to use tcpdump on busy hosts. It provides help for tracking down the TCP exchange sequence that led to the aborted connection, which can help you figure out why the connection broke.

我发现了这篇(参考链接三)关于解释如何在一台繁忙的机器上使用tcpdump的文章非常有用。它提供了跟踪导致断开连接的TCP交换序列的帮助,能够帮你找出连接中断的原因。

For network issues, use a ping to calculate the round trip time (RTT) between a machine where mysqld is located and the machine from where the application makes requests. Send a large file (1GB or more) to and from client and server machines, watch the process using tcpdump, then check if an error occurred during transfer. Repeat this test a few times. I also found this from my colleague Marco Tusa useful: Effective way to check network connection.

对于网络问题,使用ping来计算从发起请求的应用服务器到mysqld服务器间的往返时间(RTT)。从客户端发送一个大文件(1GB或者更大)到服务端,使用tcpdump观察进程,并检查传输期间是否有错误发生。重复测试数次。我还发现我同事Marco Tusa的文章也非常有用Effective way to check network connection(参考链接四)

One other idea I can think of is to capture the netstat -s output along with a timestamp after every N seconds (e.g., 10 seconds so you can relate netstat -s output of BEFORE and AFTER an aborted connection error from the MySQL error log). With the aborted connection error timestamp, you can co-relate it with the netstat sample captured as per a timestamp of netstat, and watch which error counters increased under the TcpExt section of netstat -s.

我能想到的另外一种思路是每隔N秒抓取netstat -s加上时间戳的输出(例如隔10秒,你可以将netstat -s的输出与MySQL的错误日志中的连接断开错误前后联系起来)。通过断开连接错误时间戳,你可以将它与捕捉到的带时间戳的netstat示例关联起来,并观察在netstat -s的TcpExt部分中哪些错误计数器增加了。

Along with that, you should also check the network infrastructure sitting between the client and the server for proxies, load balancers, and firewalls that could be causing a problem.

与此同时,你还应该检查客户机和服务器之间的网络基础设施,以查找可能导致问题的代理、负载均衡和防火墙。

Conclusion:

In addition to diagnosing communication failure errors, you also need to take into account faulty ethernets, hubs, switches, cables, and so forth which can cause this issue as well. You must replace the hardware itself to properly diagnose these issues.

结论:

除了诊断通信故障错误之外,还需要考虑网卡、hub、交换机、网线等因为这些都有可能导致故障。必须更换硬件才能正确诊断这些问题。

文中涉及到的参考链接:

参考链接一:http://1t.click/Ymw

参考链接二: http://1t.click/Ymy

参考链接三: http://1t.click/Ym2

参考链接四: http://1t.click/Ym4

END


MySQL “Got an error reading communication packet”

MySQL “Got an error reading communication packet”

MySQL “Got an error reading communication packet”

扫码加入MySQL技术Q群

(群号:****793818397)

MySQL “Got an error reading communication packet”

本文分享自微信公众号 - 老叶茶馆(iMySQL_WX)。
如有侵权,请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。

点赞
收藏
评论区
推荐文章
blmius blmius
2年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
Jacquelyn38 Jacquelyn38
3年前
2020年前端实用代码段,为你的工作保驾护航
有空的时候,自己总结了几个代码段,在开发中也经常使用,谢谢。1、使用解构获取json数据let jsonData  id: 1,status: "OK",data: 'a', 'b';let  id, status, data: number   jsonData;console.log(id, status, number )
皕杰报表之UUID
​在我们用皕杰报表工具设计填报报表时,如何在新增行里自动增加id呢?能新增整数排序id吗?目前可以在新增行里自动增加id,但只能用uuid函数增加UUID编码,不能新增整数排序id。uuid函数说明:获取一个UUID,可以在填报表中用来创建数据ID语法:uuid()或uuid(sep)参数说明:sep布尔值,生成的uuid中是否包含分隔符'',缺省为
Wesley13 Wesley13
2年前
mysql设置时区
mysql设置时区mysql\_query("SETtime\_zone'8:00'")ordie('时区设置失败,请联系管理员!');中国在东8区所以加8方法二:selectcount(user\_id)asdevice,CONVERT\_TZ(FROM\_UNIXTIME(reg\_time),'08:00','0
Wesley13 Wesley13
2年前
00:Java简单了解
浅谈Java之概述Java是SUN(StanfordUniversityNetwork),斯坦福大学网络公司)1995年推出的一门高级编程语言。Java是一种面向Internet的编程语言。随着Java技术在web方面的不断成熟,已经成为Web应用程序的首选开发语言。Java是简单易学,完全面向对象,安全可靠,与平台无关的编程语言。
Stella981 Stella981
2年前
Android蓝牙连接汽车OBD设备
//设备连接public class BluetoothConnect implements Runnable {    private static final UUID CONNECT_UUID  UUID.fromString("0000110100001000800000805F9B34FB");
Stella981 Stella981
2年前
Django中Admin中的一些参数配置
设置在列表中显示的字段,id为django模型默认的主键list_display('id','name','sex','profession','email','qq','phone','status','create_time')设置在列表可编辑字段list_editable
Wesley13 Wesley13
2年前
MySQL部分从库上面因为大量的临时表tmp_table造成慢查询
背景描述Time:20190124T00:08:14.70572408:00User@Host:@Id:Schema:sentrymetaLast_errno:0Killed:0Query_time:0.315758Lock_
为什么mysql不推荐使用雪花ID作为主键
作者:毛辰飞背景在mysql中设计表的时候,mysql官方推荐不要使用uuid或者不连续不重复的雪花id(long形且唯一),而是推荐连续自增的主键id,官方的推荐是auto_increment,那么为什么不建议采用uuid,使用uuid究
Python进阶者 Python进阶者
5个月前
Excel中这日期老是出来00:00:00,怎么用Pandas把这个去除
大家好,我是皮皮。一、前言前几天在Python白银交流群【上海新年人】问了一个Pandas数据筛选的问题。问题如下:这日期老是出来00:00:00,怎么把这个去除。二、实现过程后来【论草莓如何成为冻干莓】给了一个思路和代码如下:pd.toexcel之前把这