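Recently I have been studying the Impala documentation, whose discussion of partitioning and performance tuning for Parquet tables mentions the xceivers setting. I have therefore collected and translated the English material on this parameter below: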
Introduction
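The dfs.datanode.max.xcievers parameter (the misspelling "xcievers" is part of the actual property name) has a direct impact on clients. It defines the number of server-side threads or, more specifically, the sockets used for data connections. If it is set too low, the cluster cannot make full use of its resources as it grows. The following sections explain how the client and the server interact, and how to size this parameter.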
The problem:
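If this parameter is set too low, too few resources are available to HBase, and connections between client and server may fail with IOExceptions, such as these errors in the RegionServer log: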
20xx-xx-xx 19:55:52,451 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
20xx-xx-xx 19:55:52,451 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5467014108758633036_595771
20xx-xx-xx 19:55:58,455 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
20xx-xx-xx 19:55:58,455 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_-5467014108758633036_595771 bad datanode[0]
20xx-xx-xx 19:55:58,482 FATAL org.apache.hadoop.hbase.regionserver.Flusher: Replay of hlog required. Forcing server shutdown
The corresponding DataNode log shows messages like this:
ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.10.10.53:50010,storageID=DS-1570581820-10.10.10.53-50010-1224117842339,infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
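The value cannot simply be set very high either, for two reasons:
1. Each thread needs its own stack, which takes memory; the default is 1 MB per thread. In other words, a setting of 4096 may need up to 4 GB of memory just for these stacks. That cuts into the memory available for memstores, the block cache and the rest of the JVM, and in the worst case ends in an OutOfMemoryError. So the value must not be set too high.
2. Too many threads also overload the CPU: handling that much concurrent work causes many context switches, which consume resources needed for the actual work. The number of threads should therefore be kept reasonable.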
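The memory cost in point 1 is simple multiplication. The sketch below (class and method names are hypothetical) computes the worst case in which every xceiver slot is occupied and each thread reserves a full stack. The 1 MB per-thread figure is the default quoted above; the real value depends on the JVM and its -Xss setting, so treat it as an assumption.

```java
// Rough worst-case estimate of the stack memory reserved by xceiver threads.
// Assumption: every xceiver slot is in use and each thread reserves a full
// stack of stackBytesPerThread (1 MiB here, the default quoted in the text).
final class XceiverStackEstimate {
    static final long MIB = 1L << 20;
    static final long GIB = 1L << 30;

    static long estimateBytes(int maxXceivers, long stackBytesPerThread) {
        if (maxXceivers < 0 || stackBytesPerThread < 0) {
            throw new IllegalArgumentException("values must be non-negative");
        }
        return (long) maxXceivers * stackBytesPerThread;
    }

    public static void main(String[] args) {
        for (int limit : new int[] {256, 4096}) {
            long bytes = estimateBytes(limit, MIB);
            System.out.printf("%d xceivers -> %.2f GiB of thread stacks%n",
                    limit, (double) bytes / GIB);
        }
    }
}
```

For 256 threads this comes to 0.25 GiB; for 4096 it is 4 GiB, before any heap is counted.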
HDFS file system details
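On the client side, the HDFS library provides an abstraction called Path, which represents a file in a file system supported by Hadoop through the FileSystem class. FileSystem has several concrete implementations; one of them, DistributedFileSystem, represents HDFS and wraps the DFSClient class, which handles all interaction with the remote servers (the NameNode and the DataNodes). When a client such as HBase opens a file, it does so by calling, for example, the open() or create() method of the FileSystem class: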
public FSDataInputStream open(Path f) throws IOException
public FSDataOutputStream create(Path f) throws IOException
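The returned stream instance needs a server-side socket and thread to read and write blocks of data. Internally, the DFSOutputStream and DFSInputStream classes handle all interaction with the NameNode to find the locations of the block replicas, and handle the data transfer for each block with each DataNode.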
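On the server side, the DataNode wraps the actual server in a class called DataXceiverServer. It reads the configuration value above and throws the exception shown earlier when the limit is exceeded. When the DataNode starts, it creates a thread group and starts the DataXceiverServer instance in it like so: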
this.threadGroup = new ThreadGroup("dataXceiverServer");
this.dataXceiverServer = new Daemon(threadGroup,
    new DataXceiverServer(ss, conf, this));
this.threadGroup.setDaemon(true); // auto destroy when empty
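Note that the DataXceiverServer thread itself already occupies one slot in this thread group. The DataNode also has a method that returns the number of currently active threads in the group: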
/** Number of concurrent xceivers per node. */
int getXceiverCount() {
  return threadGroup == null ? 0 : threadGroup.activeCount();
}
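The counting mechanism is plain JDK behavior, so it can be tried outside Hadoop. The sketch below is a toy model, not DataNode code: the class name and the placeholder handler threads are hypothetical, and only the group name is borrowed from the snippet above. Every handler is started inside one ThreadGroup, and the count is read with ThreadGroup.activeCount(), which the JDK documents as an estimate of the live threads in the group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Toy model of a DataNode-style counter: each connection handler runs in one
// ThreadGroup, and the "xceiver count" is the number of live threads in it.
final class ThreadGroupCounter {
    private final ThreadGroup group = new ThreadGroup("dataXceiverServer");
    private final List<Thread> handlers = new ArrayList<>();
    private final CountDownLatch release = new CountDownLatch(1);

    /** Starts a placeholder handler that stays alive until releaseAll() is called. */
    void startHandler(String name) {
        Thread t = new Thread(group, () -> {
            try {
                release.await(); // stands in for a connection that is still in use
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        t.start();
        handlers.add(t);
    }

    /** Same idea as getXceiverCount() above: live threads in the group. */
    int getXceiverCount() {
        return group.activeCount();
    }

    /** Lets every handler finish and waits until each one has terminated. */
    void releaseAll() {
        release.countDown();
        for (Thread t : handlers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        ThreadGroupCounter server = new ThreadGroupCounter();
        for (int i = 0; i < 3; i++) {
            server.startHandler("DataXceiver-" + i);
        }
        System.out.println("active: " + server.getXceiverCount());
        server.releaseAll();
        System.out.println("after release: " + server.getXceiverCount());
    }
}
```

A thread counts from the moment it is started until it terminates, which is why finished transfers free their slot without any explicit bookkeeping.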
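When a client starts reading or writing a block, a connection is made, and during the handshake a thread is created and registered in the thread group above, so every active read or write operation is tracked on the server side. If the number of threads in the group exceeds the configured maximum, an exception is thrown and recorded in the DataNode's log: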
if (curXceiverCount > dataXceiverServer.maxXceiverCount) {
  throw new IOException("xceiverCount " + curXceiverCount
      + " exceeds the limit of concurrent xcievers "
      + dataXceiverServer.maxXceiverCount);
}
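The check is a plain comparison against the configured maximum. Below is a hypothetical, self-contained version (XceiverLimit and check() are illustrative names, not Hadoop classes) that builds exactly the message seen in the DataNode log excerpt at the start of this article: a count of 258 against a limit of 256 is rejected, while 256 itself still passes because the comparison is strictly greater-than.

```java
import java.io.IOException;

// Hypothetical stand-in for the DataNode's limit check; names are illustrative.
final class XceiverLimit {
    private final int maxXceiverCount;

    XceiverLimit(int maxXceiverCount) {
        this.maxXceiverCount = maxXceiverCount;
    }

    /** Throws when the current thread count is above the configured limit. */
    void check(int curXceiverCount) throws IOException {
        if (curXceiverCount > maxXceiverCount) {
            throw new IOException("xceiverCount " + curXceiverCount
                    + " exceeds the limit of concurrent xcievers "
                    + maxXceiverCount);
        }
    }
}
```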
Client-side behavior
To see how client reads and writes map to server-side threads, we add debug messages to the DataXceiver class:
LOG.debug("Number of active connections is: " + datanode.getXceiverCount());
…
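LOG.debug(datanode.dnRegistration + ":Number of active connections is: " + datanode.getXceiverCount());
The figure below shows the RegionServer's status:
The most important value is storefiles=22: HBase has 22 store files to handle, plus the write-ahead log, so we might expect at least 22 active connections. Let's start HBase and check the DataNode and RegionServer logs:
Command line: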
$ bin/start-hbase.sh
…
DataNode Log:
2012-03-05 13:01:35,309 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 1
2012-03-05 13:01:35,315 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 2
12/03/05 13:01:35 INFO regionserver.MemStoreFlusher: globalMemStoreLimit=396.7m, globalMemStoreLimitLowMark=347.1m, maxHeap=991.7m
12/03/05 13:01:39 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60030
2012-03-05 13:01:40,003 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 1
12/03/05 13:01:40 INFO regionserver.HRegionServer: Received request to open region: -ROOT-,,0.70236052
2012-03-05 13:01:40,882 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:40,884 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4
2012-03-05 13:01:40,888 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
…
12/03/05 13:01:40 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=63083
2012-03-05 13:01:40,982 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:40,983 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4
…
12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
2012-03-05 13:01:41,026 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:41,027 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4
…
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next sequenceid=63082
2012-03-05 13:01:41,109 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 3
2012-03-05 13:01:41,114 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 4
2012-03-05 13:01:41,117 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 5
12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open 16 region(s)
12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open region: usertable,,1330944810191.62a312d67981c86c42b6bc02e6ec7e3f.
12/03/05 13:01:41 INFO regionserver.HRegionServer: Received request to open region: usertable,user1120311784,1330944810191.90d287473fe223f0ddc137020efda25d.
…
2012-03-05 13:01:41,246 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,248 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7
…
2012-03-05 13:01:41,257 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 10
2012-03-05 13:01:41,257 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 9
…
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user1120311784,1330944810191.90d287473fe223f0ddc137020efda25d.; next sequenceid=62917
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,,1330944810191.62a312d67981c86c42b6bc02e6ec7e3f.; next sequenceid=62916
…
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user1361265841,1330944811370.80663fcf291e3ce00080599964f406ba.; next sequenceid=62919
2012-03-05 13:01:41,474 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,491 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7
2012-03-05 13:01:41,495 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 8
2012-03-05 13:01:41,508 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7
…
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user1964968041,1330944848231.dd89596e9129e1caa7e07f8a491c9734.; next sequenceid=62920
2012-03-05 13:01:41,618 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,621 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 7
…
2012-03-05 13:01:41,829 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 7
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user515290649,1330944849739.d23924dc9e9d5891f332c337977af83d.; next sequenceid=62926
2012-03-05 13:01:41,832 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 13:01:41,838 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 7
12/03/05 13:01:41 INFO regionserver.HRegion: Onlined usertable,user757669512,1330944850808.cd0d6f16d8ae9cf0c9277f5d6c6c6b9f.; next sequenceid=62929
…
2012-03-05 14:01:39,711 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 4
2012-03-05 22:48:41,945 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4
12/03/05 22:48:41 INFO regionserver.HRegion: Onlined usertable,user757669512,1330944850808.cd0d6f16d8ae9cf0c9277f5d6c6c6b9f.; next sequenceid=62929
2012-03-05 22:48:41,963 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 4
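As the logs show, the number of active connections never reaches 22; it peaks at 10. Why? To understand this, we need to look at how HDFS files map to server-side DataXceiver instances.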
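Finding the peak in a long DEBUG log by eye is tedious. The helper below is a sketch (the class name is hypothetical) that scans log lines for the "Number of active connections is: N" message added above and returns the largest N, or an empty result if no such line is present.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Scans DataNode DEBUG lines for the "Number of active connections is: N"
// message added to DataXceiver above and reports the largest N seen.
final class ActiveConnectionPeak {
    private static final Pattern COUNT =
            Pattern.compile("Number of active connections is: (\\d+)");

    static OptionalInt peak(List<String> logLines) {
        OptionalInt max = OptionalInt.empty();
        for (String line : logLines) {
            Matcher m = COUNT.matcher(line);
            if (m.find()) {
                int n = Integer.parseInt(m.group(1));
                if (!max.isPresent() || n > max.getAsInt()) {
                    max = OptionalInt.of(n);
                }
            }
        }
        return max;
    }
}
```

Lines from other loggers (such as the interleaved RegionServer INFO messages) simply do not match and are skipped. Applied to the startup excerpt above, the highest value shown is 10.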
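Digging deeper into Hadoop
DFSInputStream and DFSOutputStream follow the usual Java stream concepts: they wrap the client-server communication in standard Java interfaces, while internally routing the traffic to a selected DataNode, namely one that holds a copy of the current block. They are free to open and close these connections as needed. As a client reads an HDFS file, the client library switches transparently from block to block, and therefore from DataNode to DataNode, opening and closing connections along the way.
DFSInputStream holds a DFSClient.BlockReader instance that opens the connection to the DataNode. read() calls blockSeekTo(), which opens a connection if none is open for the current block; once a block has been read completely, the connection is closed. DFSOutputStream has a similar helper class that tracks the connection to the server, which is initiated by the nextBlockOutputStream() method.
Reading and writing a block both require a server-side thread that holds on to the socket. Depending on what the client is doing, you will see the number of connections fluctuate around the number of HDFS files currently being accessed.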
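The difference between the number of files and the number of connections comes down to connection lifetimes: a connection opened to read one block and closed right afterwards does not overlap with the next one. The following toy model (hypothetical class, made-up timings, not HDFS code) treats each connection as an [open, close) interval and finds the peak number of simultaneously open connections with a sweep over sorted open and close events.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each connection is a half-open interval [open, close) in milliseconds.
// The peak concurrency is found by sweeping over sorted open (+1) / close (-1) events.
final class ConnectionOverlap {
    static int peakConcurrent(long[][] intervals) {
        List<long[]> events = new ArrayList<>();
        for (long[] iv : intervals) {
            events.add(new long[] {iv[0], +1});
            events.add(new long[] {iv[1], -1});
        }
        // Sort by time; at equal timestamps process closes (-1) before opens (+1),
        // so a connection closed at t does not overlap one opened at t.
        events.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(a[1], b[1]));
        int open = 0;
        int peak = 0;
        for (long[] e : events) {
            open += (int) e[1];
            peak = Math.max(peak, open);
        }
        return peak;
    }

    public static void main(String[] args) {
        // Hypothetical timings: 22 files, each read for 5 ms one after another,
        // plus one long-lived write-ahead log writer.
        long[][] intervals = new long[23][];
        for (int i = 0; i < 22; i++) {
            intervals[i] = new long[] {i * 5L, i * 5L + 5};
        }
        intervals[22] = new long[] {0, 1_000};
        System.out.println("files: 22, peak open connections: " + peakConcurrent(intervals));
    }
}
```

With 22 back-to-back short reads plus one long-lived writer, the peak is 2, far below the 22 files involved; only when reads genuinely overlap does the count approach the file count.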
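Back to the HBase puzzle: you do not see 22 connections during startup because, while the regions are being opened, the only data needed is each HFile's info block. That block is read to obtain vital details about the file, and then the connection is closed again, so the server-side resources are released in quick succession. The remaining four connections are harder to account for. You can use JStack to dump all threads of the DataNode; in this example it shows the following entries: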
"DataXceiver for client /127.0.0.1:64281 [sending block blk_5532741233443227208_4201]" daemon prio=5 tid=7fb96481d000 nid=0x1178b4000 runnable [1178b3000]
java.lang.Thread.State: RUNNABLE
…
"DataXceiver for client /127.0.0.1:64172 [receiving block blk_-2005512129579433420_4199 client=DFSClient_hb_rs_10.0.0.29,60020,1330984111693_1330984118810]" daemon prio=5 tid=7fb966109000 nid=0x1169cb000 runnable [1169ca000]
java.lang.Thread.State: RUNNABLE
…
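These are the only DataXceiver entries, so the count reported for the thread group is a little misleading. The DataXceiverServer daemon thread itself already accounts for one entry; together with the two active connections above, that makes three active threads. The log reports four because the count is logged by an active thread that is about to finish; shortly afterwards the count drops by one to three, which matches the three threads we counted.
Internal helper classes, such as the PacketResponder, occupy another thread in the group while they run. The JStack output shows this:
"PacketResponder 0 for Block blk_-2005512129579433420_4199" daemon prio=5 tid=7fb96384d000 nid=0x116ace000 in Object.wait() [116acd000]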
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:779)
- locked (a org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:870)
at java.lang.Thread.run(Thread.java:680)
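This thread is currently in the TIMED_WAITING state, which is why the counts in the log do not include such threads. If the client sends data and the thread becomes active, the active thread count goes up immediately. Note also that this thread does not need a separate connection or socket between client and server: the PacketResponder is just a server-side thread that receives block data and streams it to the next DataNode in the write pipeline.
The Hadoop fsck command can report which files are currently open for writing: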
$ hadoop fsck /hbase -openforwrite
FSCK started by larsgeorge from /10.0.0.29 for path /hbase at Mon Mar 05 22:59:47 CET 2012
....../hbase/.logs/10.0.0.29,60020,1330984111693/10.0.0.29%3A60020.1330984118842 0 bytes, 1 block(s), OPENFORWRITE: ..................................................Status: HEALTHY
Total size: 2088783626 B
Total dirs: 54
Total files: 45
…
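This does not map directly to an occupied server-side thread, because threads are allocated per block ID, but it does tell you that one block is open for writing. fsck has additional options to print the actual files and the block IDs they consist of: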
$ hadoop fsck /hbase -files -blocks
FSCK started by larsgeorge from /10.0.0.29 for path /hbase at Tue Mar 06 10:39:50 CET 2012
…
/hbase/.META./1028785192/.tmp
/hbase/.META./1028785192/info
/hbase/.META./1028785192/info/4027596949915293355 36517 bytes, 1 block(s): OK
0. blk_5532741233443227208_4201 len=36517 repl=1
…
Status: HEALTHY
Total size: 2088788703 B
Total dirs: 54
Total files: 45 (Files currently being written: 1)
Total blocks (validated): 64 (avg. block size 32637323 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 64 (100.0 %)
…
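The output shows two things:
First, one file was open for writing when the command ran, matching the -openforwrite output above.
Second, the block list matches the thread names. For example, block blk_5532741233443227208_4201 is being sent from the server to the client, and the fsck output shows that it belongs to the HBase .META. table. Combining JStack and fsck can serve as a substitute for lsof to see which files are open.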
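JStack also reports a DataXceiver thread, with an accompanying PacketResponder, for block blk_-2005512129579433420_4199, but that block is missing from the fsck block list. This is because the block is still being written and is not yet complete; fsck only reports completed blocks.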
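Cross-checking the two outputs by hand is error-prone. The sketch below (hypothetical class, plain regular expressions, assuming the thread-name and fsck formats shown above) extracts block IDs from a JStack dump and reports the ones missing from an fsck block listing, which, following the reasoning above, are candidates for blocks still being written.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Cross-checks block IDs between a jstack dump and fsck output (formats as shown above).
final class BlockCrossCheck {
    // Matches IDs such as blk_5532741233443227208_4201 or blk_-2005512129579433420_4199;
    // the numeric part may be negative.
    private static final Pattern BLOCK_ID = Pattern.compile("blk_-?\\d+_\\d+");

    /** All distinct block IDs in the text, in order of first appearance. */
    static Set<String> blockIds(String text) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = BLOCK_ID.matcher(text);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    /** Block IDs that appear in the jstack dump but not in the fsck listing. */
    static Set<String> missingFromFsck(String jstackDump, String fsckOutput) {
        Set<String> result = blockIds(jstackDump);
        result.removeAll(blockIds(fsckOutput));
        return result;
    }
}
```

Because the set removes duplicates, a block that shows up in both a DataXceiver and a PacketResponder thread name is reported only once.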
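Back to HBase
Opening all the regions does not require many server-side resources. If you scan an entire HBase table, however, HBase is forced to read every block of every HFile: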
HBase Shell:
hbase(main):003:0> scan 'usertable'
…
1000000 row(s) in 1460.3120 seconds
DataNode Log:
2012-03-05 14:42:20,580 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 6
2012-03-05 14:43:23,293 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 7
2012-03-05 14:43:23,299 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 8
…
2012-03-05 14:49:24,332 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 11
2012-03-05 14:49:24,332 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 10
2012-03-05 14:49:59,987 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 11
2012-03-05 14:51:12,603 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 12
2012-03-05 14:51:12,605 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 11
2012-03-05 14:51:46,473 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 12
…
2012-03-05 14:56:59,420 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 15
2012-03-05 14:57:31,722 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 16
2012-03-05 14:58:24,909 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 17
2012-03-05 14:58:24,910 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 16
…
2012-03-05 15:04:17,688 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 21
2012-03-05 15:04:17,689 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 22
2012-03-05 15:04:54,545 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 21
2012-03-05 15:05:55,901 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1423642448-10.0.0.64-50010-1321352233772, infoPort=50075, ipcPort=50020):Number of active connections is: 22
2012-03-05 15:05:55,901 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active connections is: 21
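Now the number of active connections does reach 22.
What does it all mean?
So how many xceivers do you need? If you only use HBase, you can simply monitor the storefiles metric shown above and add a few percent for intermediate files and the write-ahead log files.
The example above ran on a single node. On a cluster, divide the total number of storefiles by the number of DataNodes. For example, with 1,000 storefiles across 10 DataNodes (about 100 per node), the default of 256 xceiver threads is fine.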
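The per-node estimate can be written down directly. In this sketch (hypothetical names), walFiles and headroomPercent stand in for the write-ahead log files and the "few percent" mentioned above; the function divides the file count by the number of DataNodes, rounds up, and adds the headroom using integer arithmetic to avoid floating-point rounding surprises.

```java
// Back-of-the-envelope check: open files per DataNode plus headroom,
// compared with the default xceiver limit quoted in the text.
final class StorefileEstimate {
    static final int DEFAULT_MAX_XCIEVERS = 256; // default quoted in the text

    /** Integer ceiling of a / b for non-negative a and positive b. */
    static int ceilDiv(long a, long b) {
        return (int) ((a + b - 1) / b);
    }

    /** Expected open files per DataNode, with headroomPercent added on top. */
    static int perNode(int storefiles, int walFiles, int dataNodes, int headroomPercent) {
        if (dataNodes <= 0 || storefiles < 0 || walFiles < 0 || headroomPercent < 0) {
            throw new IllegalArgumentException("invalid input");
        }
        int base = ceilDiv(storefiles + (long) walFiles, dataNodes);
        return ceilDiv((long) base * (100 + headroomPercent), 100);
    }

    public static void main(String[] args) {
        int estimate = perNode(1000, 0, 10, 10);
        System.out.println(estimate + " per node; default limit " + DEFAULT_MAX_XCIEVERS
                + (estimate <= DEFAULT_MAX_XCIEVERS ? " is enough" : " is too low"));
    }
}
```

For the example of 1,000 storefiles on 10 DataNodes with 10% headroom it yields 110 per node, well under the default of 256.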
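The worst case is the number of all active readers and writers, that is, those currently sending or receiving data. Since this is hard to determine in advance, you may want to build in a decent reserve. In addition, every write needs an extra, though shorter-lived, thread for the PacketResponder, which also has to be accounted for. A reasonable, if simplistic, formula is therefore:
xceivers = (number of active writers × 2 + number of active readers) / number of DataNodes
The result is the dfs.datanode.max.xcievers value to set on each node.
For a pure HBase setup, we can estimate the terms of this formula as follows:
Because some of the parameters are hard to estimate, we replace them with their maximum values, which gives the following formula: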
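The formula translates into a short function. In this sketch (hypothetical names), writers count twice to cover the PacketResponder thread each write needs; rounding up is an added assumption, since a node cannot run a fraction of a thread. The reserve suggested above is not included and would be added on top.

```java
// The simplified formula from the text:
//   xceivers per DataNode = (activeWriters * 2 + activeReaders) / dataNodes
// Writers count twice because each write also occupies a PacketResponder thread.
final class XceiverFormula {
    static int perDataNode(int activeWriters, int activeReaders, int dataNodes) {
        if (dataNodes <= 0 || activeWriters < 0 || activeReaders < 0) {
            throw new IllegalArgumentException("invalid input");
        }
        long threads = 2L * activeWriters + activeReaders;
        // Round up: a node cannot host a fraction of a thread (an added assumption).
        return (int) ((threads + dataNodes - 1) / dataNodes);
    }
}
```

For illustration, 100 active writers and 500 active readers spread over 10 DataNodes give (200 + 500) / 10 = 70 per node, before any reserve.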