Installing and Configuring Hadoop in Local/Standalone Mode and Pseudo-Distributed Mode on CentOS
1. Check and verify the /etc/hosts and /etc/hostname files
cat /etc/hostname
cat /etc/hosts
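The point of this check is that the machine's hostname also appears in /etc/hosts. A minimal sketch of automating that comparison follows; the function name `host_listed` is hypothetical and is not part of CentOS or Hadoop.

```shell
# Hypothetical helper: succeed if the given hostname appears as a name
# (any field after the IP address) on a non-comment line of a hosts file.
host_listed() {
    name=$1
    hosts_file=$2
    awk -v h="$name" '
        $1 ~ /^#/ { next }                                   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } # field 1 is the IP
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}

# Usage on a real system, guarded so it is skipped where the files are absent.
if [ -r /etc/hostname ] && [ -r /etc/hosts ]; then
    if host_listed "$(cat /etc/hostname)" /etc/hosts; then
        echo "hostname is listed in /etc/hosts"
    else
        echo "hostname is NOT listed in /etc/hosts"
    fi
fi
```

If the hostname is reported as not listed, adding it to /etc/hosts before continuing avoids name-resolution errors when the Hadoop daemons start.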
Create a new Linux user named hadoop, dedicated to operating the Hadoop cluster
su
useradd hadoop
passwd hadoop
su hadoop
Configure passwordless SSH login for the hadoop user
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
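Re-running the append step above adds a duplicate entry to authorized_keys each time. A minimal sketch of an idempotent variant follows; the function name `authorize_key` and its two arguments are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: append a public key to authorized_keys only if that
# exact line is not already present, then apply the permissions sshd expects.
authorize_key() {
    pub_key_file=$1
    ssh_dir=$2
    auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from the key file
    if ! grep -qxFf "$pub_key_file" "$auth_file"; then
        cat "$pub_key_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Usage, assuming the key pair was generated at the default location.
if [ -f "$HOME/.ssh/id_rsa.pub" ]; then
    authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh"
fi
```

The login test that follows should then connect without prompting for a password.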
ssh localhost
Install Java
java -version
The output shows that OpenJDK is installed here; we will switch to Oracle JDK.
Download and install the newer JDK:
Extract it:
tar xzf jdk-8u131-linux-x64.tar.gz
To make the JDK available to all users, move it to /usr/local/:
su
mv jdk1.8.0_131 /usr/local/
Configure the environment variables in ~/.bashrc:
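A minimal sketch of the lines this step would add, assuming the JDK directory that was moved to /usr/local/ above. The exact variables the author used are not shown, so treat both lines as an assumption.

```shell
# Assumed ~/.bashrc additions; the JDK path matches the mv step above.
export JAVA_HOME=/usr/local/jdk1.8.0_131
# Put the new JDK first so it takes precedence over the system OpenJDK.
export PATH="$JAVA_HOME/bin:$PATH"
```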
source ~/.bashrc
java -version
Verify:
http://mirrors.hust.edu.cn/apache/tomcat/tomcat-7/v7.0.82/bin/apache-tomcat-7.0.82.tar.gz
- Download and extract Hadoop:
tar xzf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 /usr/local/hadoop
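Installing Hadoop in local/standalone mode
Once Hadoop is downloaded, it runs in this mode by default, executing programs as a single Java process.
You only need to add the HADOOP_HOME environment variable to ~/.bashrc: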
export HADOOP_HOME=/usr/local/hadoop
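The commands that follow call `hadoop` by name, which works only if Hadoop's bin directory is on PATH. A minimal sketch of the companion ~/.bashrc line follows, offered as an assumption because the original shows only HADOOP_HOME.

```shell
# Assumed companion lines for ~/.bashrc.
export HADOOP_HOME=/usr/local/hadoop
# bin holds hadoop and hdfs; sbin holds start-dfs.sh and start-yarn.sh.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```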
source ~/.bashrc
Test Hadoop in local mode
hadoop version
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
Use the wordcount example bundled with Hadoop to demonstrate local mode:
mkdir input
cp $HADOOP_HOME/*.txt input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input output
cat ./output/*
Installing a pseudo-distributed Hadoop cluster requires modifying ~/.bashrc, $HADOOP_HOME/etc/hadoop/hadoop-env.sh, and the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml files under $HADOOP_HOME/etc/hadoop/:
vim ~/.bashrc
source ~/.bashrc
cd $HADOOP_HOME/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_131
vim core-site.xml
vim hdfs-site.xml
vim yarn-site.xml
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
vim hdfs-site.xml:
vim yarn-site.xml
vim mapred-site.xml
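The contents of the four XML files are not listed in this article. As a hedged sketch, a commonly used minimal single-node configuration for Hadoop 2.x is shown below. The port 9000, the replication factor of 1, and the scratch directory variable `CONF_DIR` are assumptions, not values taken from this article. The script writes into a scratch directory so the files can be reviewed before they are copied into $HADOOP_HOME/etc/hadoop.

```shell
# Assumption: generate the files in a scratch directory first.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

# core-site.xml: default filesystem URI (port 9000 is an assumed choice).
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: a single node can hold only one replica of each block.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# yarn-site.xml: enable the shuffle service that MapReduce jobs need.
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

# mapred-site.xml: run MapReduce jobs on YARN instead of the local runner.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```

After reviewing the generated files, copy them over the ones in $HADOOP_HOME/etc/hadoop, or paste the property blocks in with vim as described above.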
Format the NameNode of the pseudo-distributed Hadoop cluster
hdfs namenode -format
Start and verify the HDFS cluster
start-dfs.sh
jps
Open localhost:50070 in a browser to check the status of the HDFS cluster:
- Start and verify YARN:
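The commands for this step are not listed. A minimal sketch follows, assuming the standard `start-yarn.sh` script from $HADOOP_HOME/sbin is on PATH and that `jps` should afterwards report ResourceManager and NodeManager. The helper `missing_daemons` is hypothetical.

```shell
# Hypothetical helper: read jps output on stdin and print each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <main class>", so compare the second field exactly.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit ok ? 0 : 1 }'; then
            echo "$daemon"
        fi
    done
}

# Only attempt the start when Hadoop's sbin directory is on PATH.
if command -v start-yarn.sh >/dev/null 2>&1; then
    start-yarn.sh
    jps | missing_daemons ResourceManager NodeManager
fi
```

Empty output means both daemons are running. The same helper can be applied after start-dfs.sh with NameNode, DataNode, and SecondaryNameNode as arguments.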
Open localhost:8088 in a browser to check the status of the YARN cluster
- Use the wordcount example bundled with Hadoop to demonstrate the pseudo-distributed cluster:
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put $HADOOP_HOME/*.txt /user/hadoop/input
hdfs dfs -ls /user/hadoop/input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input output
hdfs dfs -cat output/*
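As a sanity check on the job output, the same tally can be reproduced locally on a small file with standard tools. This sketch only mirrors the shape of the result (one count per whitespace-separated token) and is not how MapReduce computes it; the function name `local_wordcount` is hypothetical.

```shell
# Hypothetical helper: count whitespace-separated tokens in the given files
# and print "word<TAB>count" lines sorted by word.
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |   # one token per line
        grep -v '^$' |              # drop the empty line left by leading whitespace
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'  # uniq -c prints the count first; swap the columns
}
```

For plain-text input, running `local_wordcount $HADOOP_HOME/*.txt` should give counts comparable to the job output, although tokenization may differ on unusual whitespace characters.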
You can open the YARN web interface in a browser to check the job's running status.