Hadoop 2.x differs substantially from 1.x: both storage and computation have become more general-purpose. Hadoop 2.x introduces the YARN framework for managing cluster resources, which can serve any computation that needs to run against HDFS-based storage. MapReduce is now just one pluggable compute framework among others, and you can develop or choose whichever framework fits your needs. At present, MapReduce support is still the most solid, since the MapReduce framework is relatively mature; other YARN-based frameworks are still under development.
The core of YARN is resource management and scheduling. Resource allocation is finer-grained and more flexible than in Hadoop 1.x, and its prospects look good. That flexibility, however, brings a larger set of configuration options, which makes YARN somewhat harder to use. In my opinion, YARN is also still maturing: rough edges remain, problems surface frequently, material about it is relatively scarce, and the official documentation is not always up to date. If I had to build a production system for massive data processing, YARN might not yet meet the requirements; for purely MapReduce-based workloads, the more mature Hadoop 1.x line is still the safer choice for production.
Below we set up and configure a cluster on four machines running 64-bit CentOS 6.4: one master node and three slave nodes.
Host planning
Modify the /etc/hosts file on every machine and add the following address mappings:
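The mappings themselves are not preserved in the original; a sketch is shown below. The master's address 10.95.3.48 appears later in the job logs, while the mapping of the slave addresses to hostnames is an assumption; substitute the IPs of your own machines.

10.95.3.48  m1
10.95.3.59  s1
10.95.3.66  s2
10.95.3.50  s3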
Configure the corresponding hostname on each machine by editing /etc/sysconfig/network; for example, on node s1 the file would be configured as:
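The file contents are not preserved in the original; a minimal sketch for CentOS 6 looks like this:

NETWORKING=yes
HOSTNAME=s1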
m1 is the cluster's master node; s1, s2 and s3 are the slave nodes.
As for host resources, four virtual machines were created with VMware, configured as follows:
- the master node has 1 core
- the master node has 1 GB of RAM
- each slave node has 1 core
- each slave node has 2 GB of RAM
Directory planning
The Hadoop program directory is /home/shirdrn/cloud/programs/hadoop-2.2.0, and the related data directories (logs, storage and so on) are placed under /home/shirdrn/cloud/storage/hadoop-2.2.0. Keeping programs and data separate makes it easier to synchronize configuration across nodes.
The directories are prepared and laid out as follows (example creation commands follow the list):
- On every node, create the program directory /home/shirdrn/cloud/programs/hadoop-2.2.0 to hold the Hadoop program files
- On every node, create the data directory /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs to hold the cluster's data
- On the master node m1, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name to hold the file system metadata
- On every slave node, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data to hold the actual data blocks
- The log directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/logs
- The temporary directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/tmp
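The original does not show the commands; a sketch of creating this layout is below (run the name/data lines only on the nodes they apply to):

mkdir -p /home/shirdrn/cloud/programs/hadoop-2.2.0
mkdir -p /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name   # on m1 only
mkdir -p /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data   # on slave nodes only
mkdir -p /home/shirdrn/cloud/storage/hadoop-2.2.0/logs
mkdir -p /home/shirdrn/cloud/storage/hadoop-2.2.0/tmp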
All directories referenced in the configuration below follow this layout.
Environment variables
First, using Sun's JDK, edit the ~/.bashrc file and add:

export JAVA_HOME=/usr/java/jdk1.6.0_45/
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/*.jar:$JAVA_HOME/jre/lib/*.jar

Then configure the Hadoop installation directory and the related environment variables:

export HADOOP_HOME=/home/shirdrn/cloud/programs/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=/home/shirdrn/cloud/storage/hadoop-2.2.0/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
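After editing, reload the profile on each node so the variables take effect in the current shell:

source ~/.bashrc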
Passwordless SSH login
On each node, run the following command:
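The command itself is not shown in the original; generating an RSA key pair with ssh-keygen is the usual step here:

ssh-keygen -t rsa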
and simply press Enter through all the prompts to accept the defaults.
On the master node m1, run:
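The original omits the command; a typical way to authorize m1's own key and test the login is:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh m1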
to make sure you can log in to m1 itself without being asked for a password.
Append m1's public key to the ~/.ssh/authorized_keys file on s1, s2 and s3. Also check the permissions of ~/.ssh/authorized_keys: it must not be writable by the group; if it is, run:

chmod g-w ~/.ssh/authorized_keys

At this point, the following logins from m1 should succeed without a password prompt:
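The commands are not preserved in the original; the obvious check is to SSH from m1 to each slave:

ssh s1
ssh s2
ssh s3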
Hadoop configuration files
The configuration files live in /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop; edit each of them as described below.
- Configure core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <description>The name of the default file system. A URI whose scheme
    and authority determine the FileSystem implementation. The uri's
    scheme determines the config property (fs.SCHEME.impl) naming the
    FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>dfs.replication</name>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
- Configure hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
  </property>
  <property>
    <name>dfs.permissions</name>
  </property>
</configuration>
- Configure yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>m1:8031</value>
    <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>m1:8030</value>
    <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>m1:8032</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nodemanager/local</value>
    <description>the local directories used by the nodemanager</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:8034</value>
    <description>the nodemanagers bind to this port</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <description></description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <description>Defines total available resources on the NodeManager to be made available to running containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>${hadoop.tmp.dir}/nodemanager/remote</value>
    <description>directory on hdfs where the application logs are moved to</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${hadoop.tmp.dir}/nodemanager/logs</value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_HOME,$HADOOP_HOME/share/hadoop/common/*,
      $HADOOP_HOME/share/hadoop/common/lib/*,
      $HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,
      $HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*,
      $HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
    <description>Classpath for typical applications.</description>
  </property>
  <!-- The following is my own configuration (adjust as needed):
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
      $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
      $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
      $HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
    </value>
  </property>
  -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
  </property>
</configuration>
- Configure mapred-site.xml

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <description>Larger resource limit for maps. default 1024M</description>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <description>Larger resource limit for reduces.</description>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>m1:10020</value>
    <description>MapReduce JobHistory Server host:port, default port is 10020.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>m1:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
  </property>
</configuration>
- Configure the hadoop-env.sh, yarn-env.sh and mapred-env.sh scripts
In each of these scripts, simply set the JAVA_HOME variable, as follows:

export JAVA_HOME=/usr/java/jdk1.6.0_45/
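The start scripts used below also read the etc/hadoop/slaves file to find the worker nodes. Its contents are not shown in the original; with the layout above it would simply list the three slave hostnames, one per line:

s1
s2
s3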
Distributing the program files
On the master node m1, copy the configured program directory to each slave node:

scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s1:/home/shirdrn/cloud/programs/
scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s2:/home/shirdrn/cloud/programs/
scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s3:/home/shirdrn/cloud/programs/
Starting the HDFS cluster
With the configuration above in place, the HDFS cluster can be started.
To avoid problems during startup, first disable the firewall on every node manually:

sudo service iptables stop

Or disable it permanently:

sudo chkconfig iptables off
sudo chkconfig ip6tables off

On the master node m1, first format the file system (in Hadoop 2.x, hdfs namenode -format is the non-deprecated equivalent):

hadoop namenode -format
Then start the HDFS cluster:
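The original does not show the command; with $HADOOP_HOME/sbin on the PATH as configured above, the standard script is:

start-dfs.sh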
Check the startup logs to confirm whether the HDFS cluster came up successfully:

tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-namenode-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-secondarynamenode-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s2.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s3.log
Or check the corresponding processes:
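The command is omitted in the original; jps is the usual check. On m1 you would expect NameNode and SecondaryNameNode, and on each slave a DataNode:

jps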
You can also check the HDFS cluster status through the web console at the following address:
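The URL is not preserved in the original; the NameNode web UI in Hadoop 2.2.0 listens on port 50070 by default:

http://m1:50070/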
Starting the YARN cluster
On the master node m1, run:
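The command is not shown in the original; the standard script is:

start-yarn.sh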
Check the startup logs to confirm whether the YARN cluster came up successfully:

tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-resourcemanager-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s2.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s3.log
Or check the corresponding processes:
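Again the command is omitted; with jps you would expect a ResourceManager process on m1 and a NodeManager on each slave:

jps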
In addition, the ResourceManager runs on the master node m1, and its status can be viewed in the web console:
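The address (which also appears later in the verification section) is:

http://m1:8088/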
The NodeManagers run on the slave nodes, and each node's resource status can be viewed in its own web console, for example for node s1:
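The URL is not preserved in the original; assuming the default NodeManager web UI port (8042), it would be:

http://s1:8042/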
Managing the JobHistory Server
Start the JobHistory Server so that information about the cluster's completed jobs can be viewed in the web console; run:

mr-jobhistory-daemon.sh start historyserver

It uses port 19888 by default.
Visit http://m1:19888/ to view job execution history.
To stop the JobHistory Server, run:

mr-jobhistory-daemon.sh stop historyserver
Verifying the cluster
We use the WordCount example that ships with Hadoop for verification.
First create a few data directories in HDFS:

hadoop fs -mkdir -p /data/wordcount
hadoop fs -mkdir -p /output/

The /data/wordcount directory holds the input files for the bundled WordCount example; the job writes its results to /output/wordcount.
Upload the local files to HDFS:

hadoop fs -put /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop/*.xml /data/wordcount/

You can check the uploaded files with:

hadoop fs -ls /data/wordcount

and you should see the files now stored in HDFS.
Next, run the WordCount example:

hadoop jar /home/shirdrn/cloud/programs/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount

The console shows the job's progress:
[shirdrn@m1 hadoop-2.2.0]$ hadoop jar /home/shirdrn/cloud/programs/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount
13/12/25 22:38:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/25 22:38:03 INFO client.RMProxy: Connecting to ResourceManager at m1/10.95.3.48:8032
13/12/25 22:38:04 INFO input.FileInputFormat: Total input paths to process : 7
13/12/25 22:38:04 INFO mapreduce.JobSubmitter: number of splits:7
13/12/25 22:38:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/25 22:38:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1388039619930_0002
13/12/25 22:38:05 INFO impl.YarnClientImpl: Submitted application application_1388039619930_0002 to ResourceManager at m1/10.95.3.48:8032
13/12/25 22:38:05 INFO mapreduce.Job: Running job: job_1388039619930_0002
13/12/25 22:38:14 INFO mapreduce.Job: Job job_1388039619930_0002 running in uber mode : false
13/12/25 22:38:14 INFO mapreduce.Job:  map 0% reduce 0%
13/12/25 22:38:22 INFO mapreduce.Job:  map 14% reduce 0%
13/12/25 22:38:42 INFO mapreduce.Job:  map 29% reduce 5%
13/12/25 22:38:43 INFO mapreduce.Job:  map 43% reduce 5%
13/12/25 22:38:45 INFO mapreduce.Job:  map 43% reduce 14%
13/12/25 22:38:54 INFO mapreduce.Job:  map 57% reduce 14%
13/12/25 22:38:55 INFO mapreduce.Job:  map 71% reduce 19%
13/12/25 22:38:56 INFO mapreduce.Job:  map 100% reduce 19%
13/12/25 22:38:57 INFO mapreduce.Job:  map 100% reduce 100%
13/12/25 22:38:58 INFO mapreduce.Job: Job job_1388039619930_0002 completed successfully
13/12/25 22:38:58 INFO mapreduce.Job: Counters: 44
        FILE: Number of bytes read=15339
        FILE: Number of bytes written=667303
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=21904
        HDFS: Number of bytes written=9717
        HDFS: Number of read operations=24
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        Launched reduce tasks=1
        Data-local map tasks=9
        Total time spent by all maps in occupied slots (ms)=457338
        Total time spent by all reduces in occupied slots (ms)=65832
        Map output records=1923
        Map output bytes=26222
        Map output materialized bytes=15375
        Combine input records=1923
        Combine output records=770
        Reduce input groups=511
        Reduce shuffle bytes=15375
        Reduce input records=770
        Reduce output records=511
        GC time elapsed (ms)=3951
        CPU time spent (ms)=22610
        Physical memory (bytes) snapshot=1598832640
        Virtual memory (bytes) snapshot=6564274176
        Total committed heap usage (bytes)=971993088
    File Input Format Counters
    File Output Format Counters
View the result with:

hadoop fs -cat /output/wordcount/part-r-00000 | head

Sample output:

[shirdrn@m1 hadoop-2.2.0]$ hadoop fs -cat /output/wordcount/part-r-00000 | head
13/12/25 22:58:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$HADOOP_HOME/share/hadoop/common/lib/*,    1
$HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,    1
$HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>    1
$HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*,    1
cat: Unable to write to output stream.
Log in to the web console at http://m1:8088/ to see the job records.
This shows that our HDFS can store data and that the YARN cluster can run MapReduce jobs.
Issues and summary
Hadoop 2.2.0 ships with default values for many YARN parameters; on machines with limited resources you need to override these defaults so that your jobs can actually run.
The NodeManager and ResourceManager are configured in yarn-site.xml, while the parameters that apply when running MapReduce jobs are configured in mapred-site.xml.
The relevant parameters and their default values are listed below:
| Parameter | Default | Component | Configuration file | Meaning |
|---|---|---|---|---|
| yarn.nodemanager.resource.memory-mb | 8192 | NodeManager | yarn-site.xml | Total physical memory (MB) on the slave host that can be made available to containers |
| yarn.nodemanager.resource.cpu-vcores | 8 | NodeManager | yarn-site.xml | Total number of virtual CPU cores on the host that can be made available to containers |
| yarn.nodemanager.vmem-pmem-ratio | 2.1 | NodeManager | yarn-site.xml | Maximum amount of virtual memory that may be used per MB of physical memory |
| yarn.scheduler.minimum-allocation-mb | 1024 | ResourceManager | yarn-site.xml | Minimum memory (MB) allocated per container request |
| yarn.scheduler.maximum-allocation-mb | 8192 | ResourceManager | yarn-site.xml | Maximum memory (MB) allocated per container request |
| yarn.scheduler.minimum-allocation-vcores | 1 | ResourceManager | yarn-site.xml | Minimum number of virtual CPU cores allocated per container request |
| yarn.scheduler.maximum-allocation-vcores | 8 | ResourceManager | yarn-site.xml | Maximum number of virtual CPU cores allocated per container request |
| mapreduce.framework.name | local | MapReduce | mapred-site.xml | One of local, classic or yarn; if not yarn, the YARN cluster is not used for resource allocation |
| mapreduce.map.memory.mb | 1024 | MapReduce | mapred-site.xml | Memory (MB) that each map task of a MapReduce job may request |
| mapreduce.map.cpu.vcores | 1 | MapReduce | mapred-site.xml | Number of virtual CPU cores that each map task of a MapReduce job may request |
| mapreduce.reduce.memory.mb | 1024 | MapReduce | mapred-site.xml | Memory (MB) that each reduce task of a MapReduce job may request |
| mapreduce.reduce.cpu.vcores | 1 | MapReduce | mapred-site.xml | Number of virtual CPU cores that each reduce task of a MapReduce job may request |
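For example, on the small VMs used here (1 core and 2 GB of RAM per slave), one might lower the defaults along the following lines; the values are illustrative assumptions, not taken from the original configuration:

<!-- yarn-site.xml: shrink the NodeManager resources and container limits -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>1536</value>
</property>

<!-- mapred-site.xml: keep map/reduce containers small enough to fit -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>512</value>
</property>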
- Exception: java.io.IOException: Bad connect ack with firstBadLink as 10.95.3.66:50010
The detailed exception is shown below:
[shirdrn@m1 hadoop-2.2.0]$ hadoop fs -put /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop/*.xml /data/wordcount/
13/12/25 21:29:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 10.95.3.66:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1166)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
13/12/25 21:29:46 INFO hdfs.DFSClient: Abandoning BP-1906424073-10.95.3.48-1388035628061:blk_1073741825_1001
13/12/25 21:29:46 INFO hdfs.DFSClient: Excluding datanode 10.95.3.66:50010
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 10.95.3.59:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1166)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
13/12/25 21:29:46 INFO hdfs.DFSClient: Abandoning BP-1906424073-10.95.3.48-1388035628061:blk_1073741826_1002
13/12/25 21:29:46 INFO hdfs.DFSClient: Excluding datanode 10.95.3.59:50010
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
13/12/25 21:29:46 INFO hdfs.DFSClient: Abandoning BP-1906424073-10.95.3.48-1388035628061:blk_1073741828_1004
13/12/25 21:29:46 INFO hdfs.DFSClient: Excluding datanode 10.95.3.59:50010
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
This is mainly caused by the firewall not being disabled on some nodes of the Hadoop cluster, so those nodes cannot be reached from the rest of the cluster.