看到一篇关于Note日志的文章,挺不错的,转载记录一下。其中在第三种错误日志中增加了一些知识点。
在使用MySQL 5.7的时候,发现了不少在MySQL 5.6上不曾见过的日志,级别为Note,作者梳理了一下,最常见的Note日志以下三种,下面我们来逐个解释。
第一种:Aborted connection
[Note] Aborted connection 59 to db: 'unconnected' user: 'mha' host: '197.xx.xx.xx' (Got an error reading communication packets)
[Note] Aborted connection 61 to db: 'unconnected' user: 'mha' host: '197.xx.xx.xx(Got an error reading communication packets)
[Note] Aborted connection 61 to db: 'unconnected' user: 'mha' host: '197.xx.xx.xx(Got an error reading communication packets)
[Note] Aborted connection 61 to db: 'unconnected' user: 'mha' host: '197.xx.xx.xx(Got timeout reading communication packets)
1
2
3
4
[Note]Abortedconnection59todb:'unconnected'user:'mha'host:'197.xx.xx.xx'(Gotanerrorreadingcommunicationpackets)
[Note]Abortedconnection61todb:'unconnected'user:'mha'host:'197.xx.xx.xx(Gotanerrorreadingcommunicationpackets)
[Note]Abortedconnection61todb:'unconnected'user:'mha'host:'197.xx.xx.xx(Gotanerrorreadingcommunicationpackets)
[Note]Abortedconnection61todb:'unconnected'user:'mha'host:'197.xx.xx.xx(Gottimeoutreadingcommunicationpackets)
上面两个是连接中断的错误信息,原因不一样,Got an error reading communication packets 的原因是因为网络等原因导致。 Got timeout reading communication packets 这个错误的原因是会话的idle时间达到了数据库指定的timeout时间。
第二种:Slave多线程同步信息
[Note]Multi-threaded slave statistics for channel '': seconds elapsed = 121; eventsassigned = 100374529; worker queues filled over overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1451875661700 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000
[Note]Multi-threaded slave statistics for channel '': seconds elapsed = 122; eventsassigned = 100500481; worker queues filled over overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1452001865500 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000
1
2
[Note]Multi-threadedslavestatisticsforchannel'':secondselapsed=121;eventsassigned=100374529;workerqueuesfilledoveroverrunlevel=0;waiteddueaWorkerqueuefull=0;waitedduethetotalsize=0;waitedatclockconflicts=1451875661700waited(count)whenWorkersoccupied=3211993waitedwhenWorkersoccupied=445032386000
[Note]Multi-threadedslavestatisticsforchannel'':secondselapsed=122;eventsassigned=100500481;workerqueuesfilledoveroverrunlevel=0;waiteddueaWorkerqueuefull=0;waitedduethetotalsize=0;waitedatclockconflicts=1452001865500waited(count)whenWorkersoccupied=3211993waitedwhenWorkersoccupied=445032386000
我们通过源代码,找到下面一段,该段实现了上述日志的输出。
if ((my_now - rli->mts_last_online_stat)>=
mts_online_stat_period)
{
sql_print_information("Multi-threadedslave statistics%s: "
"seconds elapsed = %lu; "
"events assigned = %llu; "
"worker queues filled over overrun level = %lu;"
"waited due a Worker queue full = %lu; "
"waited due the total size = %lu; "
"waited at clock conflicts = %llu "
"waited(count) when Workers occupied = %lu "
"waited when Workers occupied = %llu",
rli->get_for_channel_str(),
static_cast
(my_now - rli->mts_last_online_stat),
rli->mts_events_assigned,
rli->mts_wq_overrun_cnt,
rli->mts_wq_overfill_cnt,
rli->wq_size_waits_cnt,
rli->mts_total_wait_overlap,
rli->mts_wq_no_underrun_cnt,
rli->mts_total_wait_worker_avail);
rli->mts_last_online_stat=my_now;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
if((my_now-rli->mts_last_online_stat)>=
mts_online_stat_period)
{
sql_print_information("Multi-threadedslave statistics%s: "
"seconds elapsed = %lu; "
"events assigned = %llu; "
"worker queues filled over overrun level = %lu;"
"waited due a Worker queue full = %lu; "
"waited due the total size = %lu; "
"waited at clock conflicts = %llu "
"waited(count) when Workers occupied = %lu "
"waited when Workers occupied = %llu",
rli->get_for_channel_str(),
static_cast
(my_now-rli->mts_last_online_stat),
rli->mts_events_assigned,
rli->mts_wq_overrun_cnt,
rli->mts_wq_overfill_cnt,
rli->wq_size_waits_cnt,
rli->mts_total_wait_overlap,
rli->mts_wq_no_underrun_cnt,
rli->mts_total_wait_worker_avail);
rli->mts_last_online_stat=my_now;
通过这一句(my_now – rli->mts_last_online_stat), 以及最后一句rli->mts_last_online_stat=my_now; 可以得知, seconds elapsed 就是上一次统计跟这一次统计的时间间隔。而mts_online_stat_period =120秒,硬代码,因此就是几乎每隔120秒,就有上述日志的输出。 通过进一步查看原代码,初步了解上述日志信息的含义,如下:
events assigned:总共有多少个event被分配执行,计的是总数。
worker queues filled over overrun level:多线程同步中,worker 的私有队列长度超长的次数,计的是总数。
waited due a Worker queue full :因为worker的队列超长而产生等待的次数,计的是总数。
waited due the total size :超过最大size的次数,这个由参数slave_pending_jobs_size_max 指定。
waited at clock conflicts :因为逻辑时间产生冲突的等待时间,单位是纳秒。
waited (count) when Workers occupied :因为workder被占用而出现等待的次数。(总计值)。
waited when Workersoccupied :因为workder被占用而出现等待的总时间,总计值,单位是纳秒。
第三种:page_cleaner线程的输出日志
[Note] InnoDB:page_cleaner: 1000ms intended loop took 4750ms. The settings might not beoptimal. (flushed=1519 and evicted=0, during the time.)
1
[Note]InnoDB:page_cleaner:1000msintendedlooptook4750ms.Thesettingsmightnotbeoptimal.(flushed=1519andevicted=0,duringthetime.)
查找源代码,发现是上面的日志由下面一段代码输出:
if (curr_time >next_loop_time + 3000) {
if (warn_count == 0) {
ib::info() << "page_cleaner: 1000ms"
" intended loop took "
<<1000 + curr_time
- next_loop_time
<
" be optimal. (flushed="
<
<
<
<
if (warn_interval >300) {
warn_interval= 600;
}else {
warn_interval*= 2;
}
warn_count= warn_interval;
} else {
--warn_count;
}
} else {
/* reset counter */
warn_interval= 1;
warn_count= 0;
}
next_loop_time= curr_time + 1000;
n_flushed_last= n_evicted = 0;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
if(curr_time>next_loop_time+3000){
if(warn_count==0){
ib::info()<
" intended loop took "
<<1000+curr_time
-next_loop_time
<
" be optimal. (flushed="
<
<
<
<
if(warn_interval>300){
warn_interval=600;
}else{
warn_interval*=2;
}
warn_count=warn_interval;
}else{
--warn_count;
}
}else{
/*resetcounter*/
warn_interval=1;
warn_count=0;
}
next_loop_time=curr_time+1000;
n_flushed_last=n_evicted=0;
}
通过分析源代码, 输出上述日志的条件是curr_time> next_loop_time + 3000 ,即比原定的循环时间next_loop_time多3000毫秒,而next_loop_time的标准时间是1000毫秒,即1秒钟做一次刷新页的操作。也就是说本次刷新时间大于4s(3000 + 1000)才会有这个告警。其loop took 4750ms ,即是这次刷新循环的实际经历时间。
这里说明你的某些参数配置没有达到最优,也可以理解当前IO能力不足,应该可以选择性优化参数。
– innodb_io_capacity 适当降低
– innodb_io_capacity_max 适当降低
– innodb_max_dirty_pages_pct 适当降低
– innodb_max_dirty_pages_pct_lwm 如果设置了可以适当降低
降低的目的在于减少每次刷新脏页的量,增加刷新的次数,从而避免page_clean线程爆发的刷新脏页,从而堵塞IO通道。如果调整后还是不行,则考虑IO确实扛不住了。以上参数和每次刷新的块数紧密相关,大概原理就是MySQL会通过参数计算IO能力,然后结合以往每秒刷新的页的数量和现有脏页及空闲页的比例来进行脏数据刷新,具体可以看InnoDB Buffer Pool部分。
后面还有一个(flushed=1519 and evicted=0,during the time.)这样的日志,即对应n_flushed_last与n_evicted 变量,而这两个变量又由n_flushed_list与n_flushed_lru赋值。
./storage/innobase/buf/buf0flu.cc:3322: n_flushed_last +=n_flushed_list;
./storage/innobase/buf/buf0flu.cc:3321: n_evicted += n_flushed_lru;
1
2
./storage/innobase/buf/buf0flu.cc:3322:n_flushed_last+=n_flushed_list;
./storage/innobase/buf/buf0flu.cc:3321:n_evicted+=n_flushed_lru;
而n_flushed_list与n_flushed_lru赋值函数为pc_wait_finished,如下,我们来看看该函数的注释。
pc_wait_finished(&n_flushed_lru,&n_flushed_list)
/**
Wait until all flush requests are finished.
@param n_flushed_lru numberof pages flushed from the end of the LRU list.
@param n_flushed_list numberof pages flushed from the end of the
flush_list.
@return trueif all flush_list flushing batch were success. */
static
bool
pc_wait_finished(
ulint* n_flushed_lru,
ulint* n_flushed_list)
{
........
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
Waituntilallflushrequestsarefinished.
@paramn_flushed_lrunumberofpagesflushedfromtheendoftheLRUlist.
@paramn_flushed_listnumberofpagesflushedfromtheendofthe
flush_list.
@returntrueifallflush_listflushingbatchweresuccess.*/
static
bool
pc_wait_finished(
ulint*n_flushed_lru,
ulint*n_flushed_list)
{
........
}
通过源代码的注释,我们获知如下信息:
n_flushed_lru:这个值表示从lru列表尾部刷新的页数,也就是日志中如evicted=0指标的所表示的值,如果该值不为零,则存在innodb buffer不够的嫌疑。
n_flushed_list:这个是从刷新列表中刷新的页数,也就是脏页数,也就是日志中flushed=1519的值。
其实在MySQL 5.6 page_cleaner也同会有执行时间大于1s的,由于其并没有日志输出,所以我们没有看见。而MySQL 5.7增加了一个新的日志参数,重新定义了日志输出级别,默认会输出错误、警告和Note信息,而page_cleaner的输出就属于note信息。如果不想看见,可以调整log_error_verbosity参数值为2,就跟MySQL 5.6一样。
如果您觉得本站对你有帮助,那么可以支付宝扫码捐助以帮助本站更好地发展,在此谢过。