Installing a Hadoop Cluster on CentOS
I. Environment
Four servers: one master node (namenode) and three worker nodes (datanode1, datanode2, datanode3).
II. Install CentOS 6.5 (repeat on all four machines)
1. Configure a static IP
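A minimal sketch of /etc/sysconfig/network-scripts/ifcfg-eth0, assuming the interface is eth0 and using example addresses (substitute your own; namenode is shown, and the datanodes get .101 through .103 in the hosts file later):
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.100
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
Apply the change with: service network restart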
2. Configure the hostname: vi /etc/sysconfig/network
(set it to namenode, datanode1, datanode2, and datanode3 respectively)
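For example, on the master the file would contain:
NETWORKING=yes
HOSTNAME=namenode
(The new hostname takes effect after a reboot, or immediately for the current session with: hostname namenode)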
III. Configure router port forwarding (not needed when all machines are on the same LAN)
(forward the SSH login port and the VNC remote desktop port)
To restart the VNC service:
service vncserver restart
To make the VNC service start on boot:
chkconfig vncserver on
IV. Install the JDK (repeat on all four machines)
1. Search for the JDK development packages: yum search jdk
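To install what the search turns up, something along these lines should work (the exact package names are an assumption; use whatever yum search listed):
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel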
2. Configure the Java environment variables (append to /etc/profile):
#set java environment
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-0.b17.el6_7.x86_64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
3. Activate the environment variables in the current session
chmod +x /etc/profile
source /etc/profile
4. Verify that the JDK installed successfully
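Run: java -version
It should report the OpenJDK 1.8.0 build installed above; if the command is not found, re-check JAVA_HOME and PATH in /etc/profile.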
V. Configure passwordless SSH login
1. Configure the hosts file
vi /etc/hosts    // edit as root
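For example, continuing with the example addresses from the static IP step (replace with your actual IPs):
192.168.1.100 namenode
192.168.1.101 datanode1
192.168.1.102 datanode2
192.168.1.103 datanode3
Then push the file to the other nodes: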
scp /etc/hosts root@datanode1:/etc/hosts
scp /etc/hosts root@datanode2:/etc/hosts
scp /etc/hosts root@datanode3:/etc/hosts
2. On namenode, datanode1, datanode2, and datanode3, log in as the hadoop user and run:
ssh-keygen -t rsa    // accept the defaults (press Enter three times)
3. Send every public key to one machine (log in to each datanode and send its key to namenode, changing the suffix to match the sending node):
scp /home/hadoop/.ssh/id_rsa.pub hadoop@namenode:/home/hadoop/.ssh/id_rsa.pub.datanode1
4. On namenode, as the hadoop user:
cd /home/hadoop/.ssh/
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub.datanode1 >> authorized_keys
cat id_rsa.pub.datanode2 >> authorized_keys
cat id_rsa.pub.datanode3 >> authorized_keys
chmod 644 ~/.ssh/authorized_keys
5. Distribute the merged key file to every machine
scp ~/.ssh/authorized_keys hadoop@datanode1:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@datanode2:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@datanode3:/home/hadoop/.ssh/authorized_keys
6. Verify passwordless login
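From the hadoop account on namenode, each of these should open a shell with no password prompt (the very first connection may still ask you to confirm the host key):
ssh datanode1
ssh datanode2
ssh datanode3
Type exit to return after each check, and repeat from each datanode toward namenode as well.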
VI. Install Hadoop
1. Upload the Hadoop installation package via the MobaXterm remote terminal (downloading with wget is too slow). You can also mount a USB drive and copy it over.
2. Create the HDFS-related directories
mkdir -p /home/hadoop/hd_space/tmp
mkdir -p /home/hadoop/hd_space/hdfs/name
mkdir -p /home/hadoop/hd_space/hdfs/data
mkdir -p /home/hadoop/hd_space/mapred/local
mkdir -p /home/hadoop/hd_space/mapred/system
chown -R hadoop:hadoop /home/hadoop/hd_space/
chown -R hadoop:hadoop /usr/local/hadoop-2.5.2
3. Configure the Hadoop environment variables (append to /etc/profile):
export HADOOP_HOME=/usr/local/hadoop-2.5.2
export HADOOP_CONF_DIR=/usr/local/hadoop-2.5.2/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop-2.5.2/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
4. Extract the package to /usr/local/
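Assuming the tarball was uploaded as /home/hadoop/hadoop-2.5.2.tar.gz (the path is an assumption; adjust it to wherever you put the file), extract it as root:
tar -zxf /home/hadoop/hadoop-2.5.2.tar.gz -C /usr/local/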
5. In /usr/local/hadoop-2.5.2/etc/hadoop/:
Edit hadoop-env.sh and yarn-env.sh, adding the Java environment variable:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-0.b17.el6_7.x86_64
Edit core-site.xml and add:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hd_space/tmp</value>
  </property>
</configuration>
Edit hdfs-site.xml and add:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hd_space/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hd_space/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>datanode1:50090</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>datanode1:50091</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.web.ugi</name>
    <value>jack,supergroup</value>
  </property>
</configuration>
Edit mapred-site.xml (in Hadoop 2.x, first copy it from mapred-site.xml.template) and add:
<configuration>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/home/hadoop/hd_space/mapred/local</value>
  </property>
  <property>
    <name>mapreduce.cluster.system.dir</name>
    <value>/home/hadoop/hd_space/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>namenode:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>namenode:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
</configuration>
Edit yarn-site.xml and add:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode</value>
  </property>
  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <description>The minimum allocation for every container request at the RM</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <description>The maximum allocation for every container request at the RM</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>32768</value>
  </property>
</configuration>
Edit the slaves file so it lists every worker node, one hostname per line (see the example below).
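For this cluster the slaves file is simply:
datanode1
datanode2
datanode3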
6. Copy the configured installation to the other machines
scp -r /usr/local/hadoop-2.5.2 datanode1:/usr/local/
(repeat for datanode2 and datanode3)
VII. Start Hadoop
1. Log in to namenode as the hadoop user; before the first run, format HDFS:
hdfs namenode -format
2. Start DFS: start-dfs.sh
3. Start the YARN resource manager: start-yarn.sh
4. Check the running processes with the jps command
Seeing the expected daemons means startup succeeded.
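A rough sketch of the expected output (PIDs will differ); on namenode:
2481 NameNode
2672 ResourceManager
2957 Jps
On each datanode, jps should list DataNode and NodeManager, and datanode1 additionally runs SecondaryNameNode, as configured in hdfs-site.xml above.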
5. View cluster node and job information on port 8088: http://namenode:8088
6. View HDFS file system information on port 50070: http://namenode:50070
VIII. Test Hadoop
1. Create an input file
cd /home/hadoop/test
echo "My first hadoop example. Hello Hadoop in input. " > testfile.txt
2. Create a directory: hadoop fs -mkdir /test
3. Upload the file: hadoop fs -put testfile.txt /test
4. Run the wordcount example
hadoop jar /usr/local/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /test/testfile.txt /test/output
5. View the results
hadoop fs -cat /test/output/part-r-00000
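For the input above, the output should look roughly like this (wordcount splits on whitespace, so the trailing periods stay attached to their words, and uppercase words sort before lowercase ones):
Hadoop	1
Hello	1
My	1
example.	1
first	1
hadoop	1
in	1
input.	1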