1. Download the Hadoop archive (hadoop-2.6.0-cdh5.5.0.tar.gz) and upload it in binary mode to each host where Hadoop will be installed.
2. Extract the archive: tar zxvf hadoop-2.6.0-cdh5.5.0.tar.gz
3. Add the Java and Hadoop settings to the user's environment file (Java 6 or later is required):

export JAVA_HOME=/usr/java/jdk1.7
export JRE_HOME=/usr/java/jdk1.7/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export HADOOP_HOME=/home/ekafka/hadoop-2.6.0-cdh5.5.0
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH:$HOME/bin
export PATH
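To confirm the variables take effect, the profile fragment can be sourced and inspected; a sketch, writing the exports to a scratch file here (in practice they live in ~/.bash_profile or ~/.bashrc, and the paths are the ones assumed above):

```shell
# Write the exports from above into a scratch profile and source it
profile=$(mktemp)
cat > "$profile" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7
export JRE_HOME=/usr/java/jdk1.7/jre
export HADOOP_HOME=/home/ekafka/hadoop-2.6.0-cdh5.5.0
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH
export PATH
EOF
. "$profile"
# HADOOP_HOME should now resolve, and its bin directory should be on PATH
echo "$HADOOP_HOME"
```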
4. In $HADOOP_HOME/etc/hadoop, edit hadoop-env.sh and set the Java environment:

export JAVA_HOME=/usr/java/jdk1.7
5. In $HADOOP_HOME/etc/hadoop, edit core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://app1.ecs.top:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ekafka/hadoop-2.6.0-cdh5.5.0/tmp</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
    <description>The number of seconds between two periodic checkpoints.</description>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>false</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
</configuration>
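After editing, it is worth reading a value back to catch typos; a sketch against a scratch copy of core-site.xml, using a crude but dependency-free sed pipeline (on a live install, `hdfs getconf -confKey fs.default.name` is the proper tool):

```shell
# Write a minimal core-site.xml to a scratch dir and read fs.default.name back
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://app1.ecs.top:9000</value>
  </property>
</configuration>
EOF
# Extract the <value> element that holds the hdfs:// URI
fsname=$(sed -n 's/.*<value>\(hdfs:[^<]*\)<\/value>.*/\1/p' "$CONF_DIR/core-site.xml")
echo "$fsname"
```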
6. In $HADOOP_HOME/etc/hadoop, edit hdfs-site.xml (note the datanode property is dfs.datanode.data.dir, not dfs.namenode.data.dir):

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/ekafka/hadoop-2.6.0-cdh5.5.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/ekafka/hadoop-2.6.0-cdh5.5.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>app1.ecs.top:50070</value>
    <description>The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>app2.ecs.top:50090</value>
  </property>
</configuration>
7. In $HADOOP_HOME/etc/hadoop, edit mapred-site.xml (copy mapred-site.xml.template to mapred-site.xml first). The jobtracker address is a host:port pair, not a URL:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>app1.ecs.top:9001</value>
  </property>
</configuration>
8. In $HADOOP_HOME/etc/hadoop, edit yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>app1.ecs.top:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>app1.ecs.top:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>app1.ecs.top:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>app1.ecs.top:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>app1.ecs.top:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- Site specific YARN configuration properties -->
</configuration>
9. In $HADOOP_HOME/etc/hadoop, edit httpfs-site.xml (the property must sit inside a <configuration> element):

<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>app1.ecs.top:50070</value>
    <description>The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
</configuration>
10. In $HADOOP_HOME/etc/hadoop, create a masters file (this designates the SecondaryNameNode host):

app2.ecs.top
11. In $HADOOP_HOME/etc/hadoop, edit the slaves file (one DataNode host per line):

app2.ecs.top
app3.ecs.top
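Steps 10 and 11 amount to two small text files; a sketch of creating them, using a scratch directory here as a stand-in for $HADOOP_HOME/etc/hadoop:

```shell
CONF_DIR=$(mktemp -d)   # stands in for $HADOOP_HOME/etc/hadoop
# masters names the SecondaryNameNode host; slaves lists one DataNode per line
echo "app2.ecs.top" > "$CONF_DIR/masters"
printf '%s\n' app2.ecs.top app3.ecs.top > "$CONF_DIR/slaves"
cat "$CONF_DIR/slaves"
```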
12. One machine's Hadoop is now fully configured. Copy the whole Hadoop directory to every other host that will act as a datanode:

scp -r /home/ekafka/hadoop-2.6.0-cdh5.5.0 ekafka@app2.ecs.top:/home/ekafka
scp -r /home/ekafka/hadoop-2.6.0-cdh5.5.0 ekafka@app3.ecs.top:/home/ekafka
13. One more thing to note: once all three machines are set up, edit /etc/hosts on each of them and add the IP address and hostname of all three:

10.1.236.85 app1.ecs.top
10.1.236.86 app2.ecs.top
XXXX.XXXX.XXXX.XXXX app3.ecs.top
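The entries can be appended in one heredoc; sketched against a scratch file here, since writing /etc/hosts needs root, and with app3's address still the placeholder from above:

```shell
HOSTS_FILE=$(mktemp)    # use /etc/hosts (as root) on the real machines
cat >> "$HOSTS_FILE" <<'EOF'
10.1.236.85 app1.ecs.top
10.1.236.86 app2.ecs.top
EOF
# app3's address goes on a third line once it is known
grep -c 'ecs.top' "$HOSTS_FILE"
```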
14. What remains is to set up passwordless SSH in both directions between the namenode machine and the two datanode machines: three machines in total, app1.ecs.top through app3.ecs.top.
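The usual recipe for this is an RSA key pair plus ssh-copy-id, run on each machine in turn; a sketch, generating the key into a scratch directory and only printing the copy commands so it runs without the cluster (user ekafka and hostnames assumed from above):

```shell
# Generate a passphrase-less key pair into a scratch dir
# (on the real hosts: ~/.ssh/id_rsa, generated once per machine)
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$KEYDIR/id_rsa"
# Push the public key to every host (including self); printed here, not run
for host in app1.ecs.top app2.ecs.top app3.ecs.top; do
  echo "ssh-copy-id -i $KEYDIR/id_rsa.pub ekafka@$host"
done
```

Afterwards, `ssh ekafka@app2.ecs.top hostname` should return without a password prompt.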
15. Start Hadoop by running the start-all.sh script in $HADOOP_HOME/sbin; this launches the namenode, the datanodes, and the SecondaryNameNode on its designated host.
16. Verify that the Hadoop daemons are running on each host with the jps command:

[ekafka@app1 sbin]$ jps
18606 ResourceManager
18869 Jps
18334 NameNode

[ekafka@app2 ~]$ jps
7415 DataNode
7692 Jps
7498 SecondaryNameNode
7565 NodeManager

[ekafka@app3 ~]$ jps
7209 NodeManager
7336 Jps
7121 DataNode
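The check can also be scripted per host; a sketch that greps for the daemons expected on the namenode machine, run here against the captured app1 output above rather than a live jps call:

```shell
# Sample jps output from app1 above; on a live host use: JPS_OUT=$(jps)
JPS_OUT='18606 ResourceManager
18869 Jps
18334 NameNode'
missing=0
for daemon in NameNode ResourceManager; do
  echo "$JPS_OUT" | grep -q "$daemon" || { echo "$daemon missing"; missing=1; }
done
echo "missing=$missing"
```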
17. On configuring the native libraries: if library files ship in /home/ekafka/hadoop-2.6.0-cdh5.5.0/lib/native, it is enough to add an environment variable in the configuration file:

export JAVA_LIBRARY_PATH=/home/ekafka/hadoop-2.6.0-cdh5.5.0/lib/native
If no native libraries are bundled, download the build matching your platform, place it in that directory, and add the same environment variable.
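A small check for which case applies, assuming the shared object carries its usual name libhadoop.so (path from above):

```shell
NATIVE_DIR=/home/ekafka/hadoop-2.6.0-cdh5.5.0/lib/native
if [ -e "$NATIVE_DIR/libhadoop.so" ]; then
  msg="found: export JAVA_LIBRARY_PATH=$NATIVE_DIR"
else
  msg="libhadoop.so not in $NATIVE_DIR: download a build for this platform"
fi
echo "$msg"
```

On a live install, `hadoop checknative -a` reports the state of all native libraries directly.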