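Basic configuration notes
The full cluster looked correctly configured earlier, but later experiments (Pig, for example) kept producing assorted errors. Everything worked after switching to pseudo-distributed mode, so I suspected the fully distributed configuration. Today I redid it from scratch.
I dropped the material collected from various sites (the previous configuration files were really a "blend" of them) and went back to the official documentation at http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html to revise the configuration. About 70% of the settings mentioned there are now configured; the rest I judged unnecessary for now. After all, one can hardly copy everything in http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml into the configuration.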
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdpNameNode:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hdpuser/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hdpuser/dfs/data</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
</configuration>
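The byte values in core-site.xml and hdfs-site.xml are easier to check once converted to binary units. A minimal shell sketch follows; the helper name to_binary_unit is made up for illustration and is not a Hadoop tool.

```shell
# Hypothetical helper: print a byte count in the largest whole binary unit.
to_binary_unit() {
  bytes=$1
  unit=B
  for next in KiB MiB GiB; do
    # Stop as soon as the value is no longer a whole multiple of 1024.
    if [ "$bytes" -lt 1024 ] || [ $((bytes % 1024)) -ne 0 ]; then
      break
    fi
    bytes=$((bytes / 1024))
    unit=$next
  done
  echo "$bytes $unit"
}

to_binary_unit 131072      # io.file.buffer.size -> 128 KiB
to_binary_unit 268435456   # dfs.blocksize       -> 256 MiB
```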
mapred-site.xml
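If the properties marked in red in the original (apparently the two mapreduce.jobhistory.* entries) are not configured and the jobhistory service is not started, corresponding warnings appear. I did not capture a screenshot, but the error is roughly a client that keeps retrying a connection that never succeeds.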
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdpNameNode:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdpNameNode:19888</value>
</property>
</configuration>
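When chasing connection errors like the one described above, it helps to read back the address that is actually configured. The sketch below is one way to do that in plain shell; get_prop is a hypothetical helper, and it assumes each name and value sits on a single line with the name before the value, as in the files in this post.

```shell
# Hypothetical helper: print the <value> of one property in a Hadoop *-site.xml.
# Usage: get_prop FILE PROPERTY_NAME
get_prop() {
  awk -v want="$2" '
    # Remember the most recent property name.
    /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
    # On a value line, print it if it belongs to the wanted name.
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$1"
}

# Example (commented out because it needs a real config directory):
# addr=$(get_prop "$HADOOP_CONF_DIR/mapred-site.xml" mapreduce.jobhistory.address)
# host=${addr%%:*}; port=${addr##*:}
```

The host and port obtained this way can then be compared with what the jobhistory process is really listening on.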
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.admin.acl</name>
<value>*</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>false</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>hdpNameNode:18040</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdpNameNode:18030</value>
</property>
<property>
<description>The address of the resource tracker interface.</description>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdpNameNode:8025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hdpNameNode:8026</value>
</property>
<property>
<description>The address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdpNameNode:18088</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
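Hadoop property names are case-sensitive, and a name that matches no known key is not reported as an error, so a mistyped entry leaves the default value in effect. A small lint can catch the most common slips. This is only a sketch: lint_site_names is a hypothetical helper, and the two checks (names starting with an uppercase letter, names defined twice) are heuristics chosen for illustration.

```shell
# Hypothetical lint for a Hadoop *-site.xml file.
# Usage: lint_site_names FILE
lint_site_names() {
  # Extract the text between <name> and </name> on each line that has one.
  sed -n 's/.*<name>\(.*\)<\/name>.*/\1/p' "$1" |
  awk '
    # Standard keys such as yarn.* or fs.* start with a lowercase letter.
    /^[[:upper:]]/ { print "suspicious case: " $0 }
    # seen[...]++ is non-zero from the second occurrence onwards.
    seen[$0]++     { print "duplicate: " $0 }
  '
}

# Example (commented out because it needs a real config directory):
# lint_site_names "$HADOOP_CONF_DIR/yarn-site.xml"
```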
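Testing and verification
1. Set the environment variable export HADOOP_CONF_DIR=$HADOOP_HOME/conf. Until now I had used the default $HADOOP_HOME/etc/hadoop; with several environments around, this variable selects which set of configuration files is used.
2. Format HDFS with hadoop namenode -format (Hadoop 2.x reports this form as deprecated in favor of hdfs namenode -format) and create the /user/hdpuser directory. The directory can only be created once HDFS is running, i.e. after step 3.
3. start-all.sh
4. Start jobhistory: mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
5. Copy the file to HDFS:
hadoop fs -copyFromLocal /home/hdpuser/pig-0.12.0/tutorial/data/excite-small.log /user/hdpuser/excite-small.log
6. pig
7. grunt> log = LOAD '/user/hdpuser/excite-small.log' AS (user:chararray, time:long, query:chararray);
8. lmt = LIMIT log 4;
9. DUMP lmt;
10. The result is as expected: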
2013-12-08 14:45:59,790 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-12-08 14:45:59,793 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-12-08 14:45:59,793 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2013-12-08 14:45:59,801 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-12-08 14:45:59,801 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2A9EABFB35F5B954,970916105432,+md foods +proteins)
(BED75271605EBD0C,970916001949,yahoo chat)
(BED75271605EBD0C,970916001954,yahoo chat)
(BED75271605EBD0C,970916003523,yahoo chat)
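Since step 1 points HADOOP_CONF_DIR at a non-default directory, it is worth confirming that the directory really holds the files edited in this post before starting any daemon. A minimal sketch: check_conf_dir is a hypothetical helper, and the list of four files is an assumption based only on the files shown above (a real directory needs others as well, such as hadoop-env.sh).

```shell
# Hypothetical pre-flight check for a Hadoop configuration directory.
# Usage: check_conf_dir DIR   (returns 0 if all four site files exist)
check_conf_dir() {
  dir=$1
  missing=0
  for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Example (commented out because it needs a real installation):
# check_conf_dir "$HADOOP_HOME/conf" && export HADOOP_CONF_DIR=$HADOOP_HOME/conf
```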
In the same environment, the WordCount test also runs normally (copy the test files to be counted to HDFS first):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.2.0-sources.jar org.apache.hadoop.examples.WordCount /user/hdpuser/input /user/hdpuser/output
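For a small input file, the job output can be cross-checked against a local count. The pipeline below is a sketch under two assumptions: that the WordCount example splits words on whitespace, and that it writes one word, a tab, and the count per line. local_wordcount is a hypothetical helper name.

```shell
# Hypothetical local equivalent of WordCount for one small text file.
# Usage: local_wordcount FILE   (prints "word<TAB>count", sorted by word)
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" |   # one word per line
  grep -v '^$' |                    # drop the empty line left by leading whitespace
  LC_ALL=C sort |                   # byte-order sort so equal words are adjacent
  uniq -c |                         # prefix each distinct word with its count
  awk '{ print $2 "\t" $1 }'        # reorder to word, tab, count
}

# Example (commented out because it needs a running cluster):
# hadoop fs -cat /user/hdpuser/output/part-r-00000 | diff - <(local_wordcount input.txt)
```

With more than one reducer the job output is spread over several part files, so they would need to be concatenated and sorted before comparing.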