Hadoop Installation, Configuration, and Common Problems

Without further ado, let's get to it.
I. Installation and Configuration

(1) Download the installation package:
wget http://labs.renren.com/apache-mirror//hadoop/core/hadoop-1.0.0/hadoop-1.0.0.tar.gz 

(2) Extract it: tar xzvf hadoop-1.0.0.tar.gz

(3) Configuration files (cluster setup; the datanodes and the namenode use the same configuration)
         1) conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>dfs.name.dir</name>
    <value>/localdisk/hadoop/hdfs/name</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/localdisk/hadoop/hdfs/data</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>

</configuration>


By default, HDFS doesn’t require DataNodes to have any reserved free space. In
practice, most systems have questionable stability when the amount of free space gets
too low. You should set dfs.datanode.du.reserved to reserve 1 GB of free space in a
DataNode. A DataNode will stop accepting block writes when its amount of free space
falls below the reserved amount.
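
For reference, a minimal sketch of that setting, added to hdfs-site.xml (the 1 GB figure is the suggestion from the quoted text, not something configured above):

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- reserved free space per DataNode volume, in bytes (here 1 GB) -->
  <value>1073741824</value>
</property>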


      2) conf/mapred-site.xml
 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.137.1:9001</value>
  </property>

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>24</value>
    <description>The maximum number of map tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>24</value>
    <description>The maximum number of reduce tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>

</configuration>
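
A quick sanity check on these numbers (my own observation, not from the original post): with 24 map slots, 24 reduce slots, and mapred.child.java.opts at -Xmx2048m, a fully loaded TaskTracker can launch 24 + 24 = 48 child JVMs, i.e. up to 48 × 2 GB = 96 GB of task heap on one node, so make sure the slot counts and the child heap actually fit the node's physical RAM.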

      3) conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.137.1:9100</value>
  </property>

</configuration>

4) conf/slaves
#localhost
#mpi001
mpi002
mpi003
mpi004
mpi005
mpi006
mpi007
mpi008
mpi009
mpi010
mpi011
mpi012
mpi013
mpi014
mpi015
mpi016
mpi017
mpi018
mpi019
mpi020
mpi021
mpi022
mpi023
mpi024
mpi025
mpi026
mpi027
mpi028
mpi029
mpi030
mpi031
mpi032

5) conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
export HADOOP_HOME=/opt/hadoop-1.0.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib

II. Starting Hadoop

1) hadoop namenode -format
2) start-all.sh
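
A quick way to confirm the daemons actually came up (a hedged sketch; the slave host name and log path follow the configuration used in this post):

# On the namenode host: expect NameNode, SecondaryNameNode and JobTracker in the output
jps

# On a slave, e.g. mpi002: expect DataNode and TaskTracker
ssh mpi002 jps

# If a daemon is missing, check its log under $HADOOP_HOME/logs, e.g.
tail -n 50 /opt/hadoop-1.0.0/logs/hadoop-root-namenode-*.log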




III. Example

1) Prepare the WordCount example and package it as a jar file with Fat Jar.

2) Prepare the input files
  a. Create two local text files (see the sketch after these steps).
  b. Create the directories on HDFS: hadoop fs -mkdir wordcount/input and hadoop fs -mkdir wordcount/output1 (an "output" directory would be created by default; it is named output1 here to avoid a conflict).
  c. Upload the local files to HDFS:
hadoop fs -put input1.txt wordcount/input
hadoop fs -put input2.txt wordcount/input
  d. Run the program:
hadoop jar /root/workspace/WordCount/WordCount_fat.jar wordcount/input wordcount/output1
  e. View the result:
hadoop fs -cat wordcount/output1/part-00000
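
A minimal sketch of step a and of what the output looks like (file names and words are illustrative only, not from the original run):

# Two small local text files to use as input
echo "hello hadoop hello world" > input1.txt
echo "hello mapreduce" > input2.txt

# After the job finishes, part-00000 holds one "word<TAB>count" line per word, e.g.:
#   hadoop     1
#   hello      3
#   mapreduce  1
#   world      1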


 

IV. Problems

(1) Processes that cannot be killed
stop-all.sh is sometimes unable to kill the HDFS processes on the compute nodes. Use the following command instead: ps aux |grep hadoop|awk '{print $2}'|xargs kill -9. For convenience, here is a script that kills them on all nodes in one go:

#!/bin/bash
#Node=( c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216)
#Node=(mpi017 mpi018 mpi019 mpi020 mpi021 mpi022 mpi023 mpi024 mpi025 mpi026 mpi027 mpi028 mpi029 mpi030 mpi031 mpi032)
Node=( mpi002 mpi003 mpi004 mpi005 mpi006 mpi007 mpi008 mpi009 mpi011 mpi012 mpi013 mpi014 mpi015 mpi016 mpi017 mpi018 mpi019 mpi020 mpi021 mpi022 mpi023 mpi024 mpi025 mpi026 mpi027 mpi028 mpi029 mpi030 mpi031 mpi032)

# Kill every hadoop-related process on each slave node.
# Note: $2 is escaped so the PID column is extracted by awk on the remote
# host instead of being expanded (to nothing) by the shell.
for ((i = 0; i < ${#Node[@]}; i++))
do
    ssh ${Node[i]} "ps aux | grep hadoop | grep -v grep | awk '{print \$2}' | xargs kill -9"
done


(2) Conflicting configuration
We configured "fs.default.name" in both hdfs-site.xml and core-site.xml, but with different port numbers; the value in hdfs-site.xml is the one that ends up taking effect. Once the namenode is up, first check on the namenode machine whether the port you set is actually being listened on, for example: lsof -i:54301. Then look at the datanode logs to confirm they are connecting to the port you specified on the namenode:
tail -f /opt/hadoop-1.0.0/logs/hadoop-root-datanode-mpi002.log
If you change the configuration files, you may need to restart Hadoop: run stop-all.sh first, then start-all.sh.
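
A hedged sketch of that check (the port shown is the one from the core-site.xml above; substitute whichever value actually takes effect on your cluster):

# On the namenode: is anything listening on the port from fs.default.name?
lsof -i :9100

# On a datanode: does its log show it connecting to that namenode address?
tail -f /opt/hadoop-1.0.0/logs/hadoop-root-datanode-mpi002.log

# After editing the configuration files, restart the cluster
stop-all.sh
start-all.sh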

(3) Incompatible namespaceIDs

Creating directories works fine, but as soon as a real file is uploaded, the following error is thrown:


2/12/06 21:14:12 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/testlog/input/apache_access.log could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1556)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)


Looking at the datanode's log:

[root@mpi002 ~]# tail /opt/hadoop-1.0.0/logs/hadoop-root-datanode-mpi002.log 
2012-12-06 21:18:52,310 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /localdisk/hadoop/hdfs/data: namenode namespaceID = 1338088189; datanode namespaceID = 1917596014


I have run into this problem several times (getting older, memory is not what it used to be). The correct fix is to delete everything under the data directory on all datanodes, or see the second approach in the article below:

http://blog.csdn.net/wh62592855/article/details/5752199
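
For completeness, a hedged sketch of the other commonly used fix (whether this exactly matches the linked article's second approach is my assumption): instead of wiping the data, make the datanode's namespaceID match the namenode's.

# On the affected datanode, edit the VERSION file under dfs.data.dir
vi /localdisk/hadoop/hdfs/data/current/VERSION
# change the namespaceID line to the namenode's value reported in the log above, e.g.
#   namespaceID=1338088189
# then restart that datanode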

To make the deletion easier, here is a script (prerequisite: passwordless SSH is set up between all nodes):

#!/bin/bash
#Node=( c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216)
#Node=(mpi017 mpi018 mpi019 mpi020 mpi021 mpi022 mpi023 mpi024 mpi025 mpi026 mpi027 mpi028 mpi029 mpi030 mpi031 mpi032)
Node=( mpi002 mpi003 mpi004 mpi005 mpi006 mpi007 mpi008 mpi009 mpi011 mpi012 mpi013 mpi014 mpi015 mpi016 mpi017 mpi018 mpi019 mpi020 mpi021 mpi022 mpi023 mpi024 mpi025 mpi026 mpi027 mpi028 mpi029 mpi030 mpi031 mpi032)

# Wipe the datanode storage directory (dfs.data.dir) on every slave node.
for ((i = 0; i < ${#Node[@]}; i++))
do
    ssh ${Node[i]} 'rm -rf /localdisk/hadoop/hdfs/data'
done

In the script above, /localdisk/hadoop/hdfs/data is the directory where the datanodes store their data; it is configured in $HADOOP_HOME/conf/hdfs-site.xml:

<property>
<name>dfs.data.dir</name>
<value>/localdisk/hadoop/hdfs/data</value>
<description> </description>
</property>



For reference, the files under conf/ and what each one controls:

hadoop-env.sh (Bash script): Environment variables that are used in the scripts to run Hadoop.
core-site.xml (Hadoop configuration XML): Configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
hdfs-site.xml (Hadoop configuration XML): Configuration settings for the HDFS daemons: the namenode, the secondary namenode, and the datanodes.
mapred-site.xml (Hadoop configuration XML): Configuration settings for the MapReduce daemons: the jobtracker and the tasktrackers.
masters (plain text): A list of machines (one per line) that each run a secondary namenode.
slaves (plain text): A list of machines (one per line) that each run a datanode and a tasktracker.

