在HIVE执行MR的时候,报如下错误
java.io.IOException: Call to server/10.64.49.21:9001 failed on local exception: java.io.IOException: Too many open files
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
at org.apache.hadoop.ipc.Client.call(Client.java:1033)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1011)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1023)
at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:285)
at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:685)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:458)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:629)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:617)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.IOUtil.initPipe(Native Method)
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:343)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
文件数打开过多,造成这种情况的原因一般来说是由于应用程序对资源使用不当造成,比如没有及时关闭Socket或数据库连接等。但也可能应用确实需要打开比较多的文件句柄,而系统本身的设置限制了这一数量。
1.ulimit -a 查看linux系统打开文件的最大句柄数,如下
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 16308
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16308
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
open files (-n) 1024 默认为1024
2.ps ef|grep hadoop 查看hadoop进程,记录每个进程ID
3.查看每个进程的文件操作情况 lsof -p ID |wc -l
修改默认的文件句柄数
修改/etc/security/limits.conf文件
在文件末尾加入下内容
* soft nofile 65536
* hard nofile 65536
重启shell生效。
在Debian下,重启root用户的shell后看不到文件句柄数的改变,其实在其他用户下此设置已经生效。所以要对root用户生效还需加上以下两句。
root soft nofile 65536
root hard nofile 65536
否则在root用户下看不到参数生效。在centos系统下就不必加这两句。
另外修改系统环境变量的时候一般为了不重启系统,自然的就会想source下这个配置文件。此时在Debian下source会提示如下错误
-bash: iptab.sh: command not found
-bash: iptab.sh: command not found
而且在不同目录下红色部分都不一样,而在centos下可以正常执行该命令。
至于出现该问题的原因应该与内核相关,不是很理解,暂时没找到原因。有知道原因的请拍砖,谢谢。