HBase logs report "Slow ReadProcessor read fields"

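In one environment, the HBase logs reported Slow ReadProcessor read fields. According to the explanations I found, this warning is mainly caused by HDFS and has little to do with HBase itself: HBase is only the client that writes data to HDFS for persistence. To narrow down which part is at fault, analyze the DataNode logs with the following command: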

grep -Eo "Slow.*(took|cost)" /path/to/current/datanode/log | sort | uniq -c

Typical output:

     23 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow BlockReceiver write data to disk cost
     30 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow BlockReceiver write packet to mirror took
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow flushOrSync took
     63 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow BlockReceiver write data to disk cost
    273 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow BlockReceiver write packet to mirror took
     97 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow flushOrSync took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow manageWriterOsCache took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow PacketResponder send ack to upstream took
     11 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow BlockReceiver write data to disk cost
    232 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow BlockReceiver write packet to mirror took
     87 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow flushOrSync took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow PacketResponder send ack to upstream took
    936 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow BlockReceiver write data to disk cost
 401432 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow BlockReceiver write packet to mirror took
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow flushOrSync took
      4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow manageWriterOsCache took
      8 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow PacketResponder send ack to upstream took
    128 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow BlockReceiver write data to disk cost
  46404 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow BlockReceiver write packet to mirror took
     68 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow flushOrSync took
      4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow manageWriterOsCache took
      9 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow PacketResponder send ack to upstream took
     70 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow BlockReceiver write data to disk cost
    143 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow BlockReceiver write packet to mirror took
     28 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow flushOrSync took
     12 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow manageWriterOsCache took
      4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow PacketResponder send ack to upstream took
     92 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow BlockReceiver write data to disk cost
    187 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow BlockReceiver write packet to mirror took
    181 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow flushOrSync took
     15 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow manageWriterOsCache took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow PacketResponder send ack to upstream took
     24 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow BlockReceiver write data to disk cost
     61 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow BlockReceiver write packet to mirror took
    102 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow flushOrSync took
     14 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow BlockReceiver write data to disk cost
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow BlockReceiver write packet to mirror took
     74 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow flushOrSync took
     19 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow BlockReceiver write data to disk cost
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow BlockReceiver write packet to mirror took
     65 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow flushOrSync took
      2 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow PacketResponder send ack to upstream took
     10 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow BlockReceiver write data to disk cost
    177 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow BlockReceiver write packet to mirror took
      2 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow flushOrSync took
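The output above is split per rotated log file, which makes the categories hard to compare at a glance. A minimal sketch that merges all files into one tally per category is shown below; the function name slow_summary is hypothetical, and it assumes a grep that supports -o and -h (GNU and BSD grep both do).

```shell
# Sum "Slow ..." messages per category across all given DataNode log files.
# Usage: slow_summary FILE...
slow_summary() {
  # -h drops the file name prefix so rotated files fall into the same bucket;
  # the final sort puts the most frequent category first.
  grep -Eoh "Slow.*(took|cost)" "$@" | sort | uniq -c | sort -rn
}
```

For example, `slow_summary /path/to/current/datanode/log*` would print one line per category instead of one line per category per file.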

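Slow BlockReceiver write data to disk cost: indicates a delay in writing the block to the OS cache or to disk
Slow BlockReceiver write packet to mirror took: indicates a delay in writing the block over the network
Slow manageWriterOsCache took: indicates a delay in writing the block to the OS cache or to disk
Slow flushOrSync took: indicates a delay in writing the block to the OS cache or to disk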
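Using the four meanings listed above, the `uniq -c` output can be rolled up into a disk-side total and a network-side total. The sketch below is one way to do that; slow_rollup is a hypothetical name, and any category the list does not explain (such as PacketResponder send ack to upstream) is counted as "other" rather than guessed at.

```shell
# Roll `uniq -c` output (count first, message somewhere on the line)
# up into disk-side and network-side totals.
slow_rollup() {
  awk '
    /write packet to mirror/                             { net  += $1; next }
    /write data to disk|manageWriterOsCache|flushOrSync/ { disk += $1; next }
                                                         { other += $1 }
    END { printf "disk=%d network=%d other=%d\n", disk, net, other }
  '
}
```

It reads from standard input, so it can be appended to the pipeline as `... | sort | uniq -c | slow_rollup`.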

Some analysis approaches

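If a single node logs one or more categories of "Slow" messages orders of magnitude more often than the other hosts do, check the underlying hardware on that node for problems.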
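The cross-node comparison can be scripted once a count per host has been collected. The sketch below is only an illustration: flag_outliers is a hypothetical name, the input format ("host count" per line) is an assumption, and "order of magnitude" is read as at least 10 times the median count.

```shell
# stdin: "<host> <count>" per line.
# Prints hosts whose count is at least 10x the median count.
flag_outliers() {
  sort -k2,2n | awk '
    { host[NR] = $1; cnt[NR] = $2 }
    END {
      if (NR == 0) exit
      # Median of the sorted counts (lower middle when NR is even).
      med = cnt[int((NR + 1) / 2)]
      for (i = 1; i <= NR; i++)
        if (cnt[i] >= 10 * med && cnt[i] > med) print host[i], cnt[i]
    }
  '
}
```

How the per-host counts are gathered (for example by running the grep command on each DataNode over ssh) depends on the environment and is left out here.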

If the most frequent Slow message is Slow BlockReceiver write packet to mirror took, investigate possible network problems using the output of the following commands:

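  1. ifconfig -a (check the problem host periodically for growing errors and dropped counts; these usually point to a faulty NIC, cable, or upstream network)
  2. netstat -s (compared with a healthy node, look for a large number of retransmitted packets or other abnormally high metrics)
  3. netstat -s | grep -i retrans (run across the whole cluster; look for higher-than-normal counts on one or more nodes)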
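The periodic check of errors and dropped counters can also be done without parsing ifconfig output. On Linux the same counters are exposed as plain files under /sys/class/net/<interface>/statistics/; the sketch below prints them one line per interface. The function name nic_counters and the optional root argument are hypothetical conveniences.

```shell
# Print error/drop counters for every interface, one line per interface.
# The optional argument overrides the sysfs root (default /sys/class/net).
# The body runs in a subshell so its variables do not leak into the caller.
nic_counters() (
  root="${1:-/sys/class/net}"
  for dir in "$root"/*/; do
    # Skip entries that are not network interfaces.
    [ -r "${dir}statistics/rx_errors" ] || continue
    printf '%s rx_errors=%s tx_errors=%s rx_dropped=%s tx_dropped=%s\n' \
      "$(basename "$dir")" \
      "$(cat "${dir}statistics/rx_errors")" \
      "$(cat "${dir}statistics/tx_errors")" \
      "$(cat "${dir}statistics/rx_dropped")" \
      "$(cat "${dir}statistics/tx_dropped")"
  done
)
```

Saving the output twice, a few minutes apart, and comparing the two files with diff shows which counters are still growing.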

If some other message is the most frequent, check for disk problems with the following commands:

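  1. iostat (high iowait percentage, above 15%)
  2. iostat -x and sar -d (high await or %util on a specific partition)
  3. dmesg (disk errors)
  4. Run a health check on the disks with smartctl: stop all Hadoop processes on the affected node, then run sudo smartctl -H /dev/ and check each disk used by HDFS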
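iostat reports the iowait percentage directly. When it needs to be scripted, a similar figure can be approximated from two samples of the aggregate cpu line in /proc/stat, as in the sketch below. The function name iowait_pct is hypothetical, and the total is taken as the sum of the user through steal fields, which is an assumption that may differ slightly from what iostat prints.

```shell
# Approximate iowait percentage between two samples of the aggregate "cpu"
# line from /proc/stat (fields: user nice system idle iowait irq softirq steal ...).
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { io[NR] = $6; for (i = 2; i <= 9; i++) total[NR] += $i }
    END {
      dt = total[2] - total[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (io[2] - io[1]) / dt
    }
  '
}

# Example: sample five seconds apart and compare with the 15% figure.
# a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
# iowait_pct "$a" "$b"
```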