Preface: software to prepare in advance
一、Linux environment setup
1. Change the Linux hostname
# Change the hostname; the configuration files later depend on it
sudo vi /etc/hostname
# Set it to master; once configuration is done, clone the VM several times and rename the copies slave1, slave2, ...
# Pin the VM's IP mapping so the addresses used below never change
sudo vi /etc/hosts
127.0.0.1 localhost
192.168.213.130 master   # use your own machine's IP here
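Since the same hosts entries will be needed on every clone later, it can help to generate them once. A minimal sketch, assuming the subnet and addresses above (the slave IPs 131/132 are placeholders; substitute your own):

```shell
# Generate the /etc/hosts lines for one master and two slaves so the
# name-to-IP mapping stays consistent across all cloned VMs.
BASE="192.168.213"
HOSTS_SNIPPET=$(printf '127.0.0.1 localhost\n%s.130 master\n%s.131 slave1\n%s.132 slave2\n' \
  "$BASE" "$BASE" "$BASE")
echo "$HOSTS_SNIPPET"
```

Append the output to /etc/hosts on each machine (requires root).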
二、Basic JDK installation and configuration
Note: download the JDK from the official site, making sure the version matches your machine. This guide uses jdk-8u161-linux-x64.tar.gz.
The concrete steps:
# Extract the archive
tar -xzvf jdk-8u161-linux-x64.tar.gz
# Move it to a fixed location; later steps depend heavily on this path
sudo mkdir -p /usr/local/java        # make sure the target directory exists first
sudo mv jdk1.8.0_161/ /usr/local/java/
# Configure the environment
sudo vi /etc/profile
# Append at the end of the file
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Save and exit
# Reload the profile
source /etc/profile   # note which user you are: source only affects the current user's environment
# Test
java -version
# Expected output on success
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
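To confirm the right JDK is on the PATH, the version number can be pulled out of that output. A sketch that parses the sample line above; on a real machine feed it `java -version 2>&1 | head -n 1` instead of the literal string:

```shell
# Extract the version number from a `java -version`-style line.
line='java version "1.8.0_161"'
ver=$(printf '%s\n' "$line" | sed 's/.*"\(.*\)".*/\1/')
echo "detected JDK version: $ver"
```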
三、Passwordless SSH login
1. Make sure the ssh server is installed; if not, run
sudo apt-get install openssh-server
and answer yes to the prompts.
2. Check that the ssh service is running
sudo ps -e | grep ssh
If sshd appears in the output, the service has started (if not, start it with sudo service ssh start)
3. ssh-keygen can generate either rsa or dsa keys; rsa is the usual choice
ssh-keygen -t rsa -P "" # -P sets the passphrase; leave it empty
# When it finishes, the private and public keys id_rsa and id_rsa.pub are in ~/.ssh/
# Enter ~/.ssh/ and append the public key to the authorized keys
cd ~/.ssh/
cat id_rsa.pub >> authorized_keys
# Verify by ssh-ing into the local machine
ssh localhost
# Log out
exit
Local test: ssh localhost (the first time asks you to confirm the host fingerprint; after that it logs in with no prompt)
Remote test: append this machine's id_rsa.pub (public key) to the remote machine's authorized_keys
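One pitfall with `cat id_rsa.pub >> authorized_keys` is that re-running the setup duplicates the key. A sketch of an idempotent variant, demonstrated in a scratch directory with a placeholder key (on a real node use ~/.ssh/id_rsa.pub and ~/.ssh/authorized_keys):

```shell
# Append a public key to authorized_keys only if the exact line is absent,
# so repeated setup runs never duplicate entries.
DEMO=$(mktemp -d)
printf 'ssh-rsa AAAAB3...placeholder hadoop@master\n' > "$DEMO/id_rsa.pub"
AUTH="$DEMO/authorized_keys"
touch "$AUTH" && chmod 600 "$AUTH"
for run in 1 2; do   # run twice to show the append is idempotent
  grep -qxF "$(cat "$DEMO/id_rsa.pub")" "$AUTH" || cat "$DEMO/id_rsa.pub" >> "$AUTH"
done
LINES=$(wc -l < "$AUTH")
echo "authorized_keys entries: $LINES"
```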
四、Hadoop installation and configuration
1. Download hadoop-2.7.7.tar.gz from the official site
2. Extract and move it into place
# Extract
sudo tar -zxvf hadoop-2.7.7.tar.gz
# Move
sudo mv hadoop-2.7.7/ /home/hadoop/
# Check the resulting path; it matters for the configuration below
ls /home/hadoop/hadoop-2.7.7/
3. Edit the configuration files (the critical part)
3.1 Find the right directory first: all Hadoop configuration files live under etc/hadoop
3.2 Edit each file; note: double-check that any paths match your own setup
# Configure core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop-2.7.7/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
# Configure hadoop-env.sh (it is also safest to set JAVA_HOME explicitly here,
# e.g. export JAVA_HOME=/usr/local/java/jdk1.8.0_161, since the inherited value
# is often lost when daemons are started over ssh)
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
# Configure yarn-env.sh: keep the defaults
# Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop-2.7.7/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.7.7/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
# Configure mapred-site.xml; if the file does not exist, create it from the template: sudo cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
# Configure yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
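The five ResourceManager endpoints above must each bind a distinct port; a copy-paste duplicate is an easy mistake. A quick sketch that checks the port list from the config for duplicates:

```shell
# Check the ResourceManager ports configured in yarn-site.xml for duplicates.
PORTS="18040 18030 18088 18025 18141"
DUPS=$(printf '%s\n' $PORTS | sort | uniq -d)
[ -z "$DUPS" ] && echo "yarn ports: ok" || echo "yarn ports duplicated: $DUPS"
```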
3.3 Configure the Hadoop environment variables
# Edit the system profile
sudo vi /etc/profile
# Append the following
export HADOOP_HOME=/home/hadoop/hadoop-2.7.7
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Then reload it: source /etc/profile
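Before formatting HDFS it is worth confirming that the install path and the variables actually line up. A minimal sanity-check sketch, assuming the path used throughout this guide (on a machine without Hadoop it simply reports the pieces as missing):

```shell
# Sanity-check the Hadoop layout before formatting HDFS.
HADOOP_HOME="${HADOOP_HOME:-/home/hadoop/hadoop-2.7.7}"
STATUS=ok
for p in bin/hdfs sbin/start-dfs.sh etc/hadoop/core-site.xml; do
  [ -e "$HADOOP_HOME/$p" ] || { echo "missing: $HADOOP_HOME/$p"; STATUS=incomplete; }
done
echo "layout check: $STATUS"
```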
3.4 Start & stop the Hadoop services
# Format HDFS (do this once, before the first start)
hdfs namenode -format
# Start Hadoop
# Option 1: /home/hadoop/hadoop-2.7.7/sbin/start-all.sh
# Option 2: first run /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh
#           then run  /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh
# Check that startup succeeded
jps
# Expected result
5488 DataNode
5347 NameNode
6344 Jps
5866 ResourceManager
6012 NodeManager
5694 SecondaryNameNode
# Stop Hadoop
/home/hadoop/hadoop-2.7.7/sbin/stop-all.sh
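The jps listing above can be checked mechanically: a single-node deployment should show all five daemons. A sketch fed with the sample output above; on a live node pipe in `jps` instead of the literal string:

```shell
# Check jps output for the daemons a single-node Hadoop deployment should run.
JPS_OUT="5488 DataNode
5347 NameNode
6344 Jps
5866 ResourceManager
6012 NodeManager
5694 SecondaryNameNode"
MISSING=""
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  printf '%s\n' "$JPS_OUT" | grep -q " $d\$" || MISSING="$MISSING $d"
done
[ -z "$MISSING" ] && echo "all daemons up" || echo "not running:$MISSING"
```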
五、Testing Hadoop with an example job
# Check the Hadoop version
hadoop version
# A nice pi-estimation test that circulates online
hadoop jar /home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 10 10
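What that example computes: the job scatters random points in the unit square and estimates pi from the fraction that land inside the quarter circle. The awk one-liner below is a local stand-in (not the Hadoop code) showing the same idea on one machine:

```shell
# Local Monte Carlo pi estimate; 100k points keeps the error small.
EST=$(awk 'BEGIN { srand(1); n = 100000;
  for (i = 0; i < n; i++) { x = rand(); y = rand(); if (x*x + y*y <= 1) c++ }
  printf "%.3f", 4 * c / n }')
echo "pi is roughly $EST"
```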
六、Common problems
Q1:
Formatting HDFS warns: java.net.UnknownHostException: bogon: bogon
Fix: reboot the VM (bogon comes from a failed reverse DNS lookup), then run the namenode format command again. If that still fails:
vi /etc/hostname and change it to slave02; vi /etc/hosts and add "127.0.0.1 slave02"; then reboot (slave02 is whatever name you chose for this machine).
The key log lines:
17/01/05 14:33:41 INFO namenode.FSImage: Allocated new BlockPoolId: BP-627974405-127.0.0.1-1483598021204
17/01/05 14:33:41 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
17/01/05 14:33:41 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/01/05 14:33:41 INFO util.ExitUtil: Exiting with status 0
17/01/05 14:33:41 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave02/127.0.0.1
************************************************************/
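The root cause in Q1 is a hostname that /etc/hosts cannot resolve. A sketch of the underlying check, run here against a sample hosts file (on a real machine point HOSTS_FILE at /etc/hosts and set H to `$(hostname)`):

```shell
# Verify the machine's hostname appears in the hosts file; if it does not,
# the NameNode format step throws UnknownHostException.
HOSTS_FILE=$(mktemp)
printf '127.0.0.1 localhost\n127.0.0.1 slave02\n' > "$HOSTS_FILE"
H=slave02   # stand-in for $(hostname)
if grep -qw "$H" "$HOSTS_FILE"; then
  RES="$H resolves"
else
  RES="$H missing: add '127.0.0.1 $H' to /etc/hosts"
fi
echo "$RES"
```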
That worked.
Formatting done; next, start Hadoop:
Start the NameNode and DataNode daemons (make sure the ssh service is running first: service sshd start)
- ./sbin/start-dfs.sh
Successful start:
hadoop@slave02:/usr/local/hadoop> jps
2641 DataNode
2838 SecondaryNameNode
2536 NameNode
5853 Jps
hadoop@slave02:/usr/local/hadoop>
Success indeed.
5. Hadoop configuration for a distributed cluster follows
One master, two slaves: a master node plus slave01 and slave02
1) Configuration file: etc/hadoop/core-site.xml (under the Hadoop install directory)
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
2) Configuration file: etc/hadoop/mapred-site.xml (note: mapred.job.tracker is a Hadoop 1.x property; on 2.x the setting that matters is mapreduce.framework.name, as in section 四)
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
3) Configuration file: etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
4) Configure the masters and slaves node lists
There is no masters file under etc/hadoop/, so it needs no configuration; only the slaves file lists the worker nodes
5) Configure /etc/hosts
This needs root privileges
# vi /etc/hosts
#127.0.0.1 localhost
192.168.126.137 master
192.168.126.138 slave01
192.168.126.136 slave02
Change the IPs to your own.
6) Clone the VM twice, for three VMs in total: master, slave01, slave02
Change each machine's hostname with vi /etc/hostname to master, slave01, and slave02 respectively
Re-do the passwordless ssh setup so that master can log in to master, slave01, and slave02 without a password (if it already can, skip this)
Verify that master can log in to master, slave01, and slave02
7) Start Hadoop
hadoop@master:/usr/local/hadoop> ./sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-master.out
slave01: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave01.out
slave02: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave02.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
Run jps and check:
hadoop@master:/usr/local/hadoop> jps
3419 Jps
3116 NameNode
3311 SecondaryNameNode
On slave01, run jps:
hadoop@slave01:~> jps
3278 Jps
3185 DataNode
On slave02, run jps:
hadoop@slave02:~> jps
2355 DataNode
2457 Jps
Startup succeeded, good news.
Open a browser and visit master:50070 for the HDFS web UI.
Now start the YARN management daemons:
hadoop@master:/usr/local/hadoop> ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave01: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave01.out
slave02: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave02.out
hadoop@master:/usr/local/hadoop> jps
3116 NameNode
3785 ResourceManager
3311 SecondaryNameNode
3920 Jps