Hadoop 2.7.5 + HBase 1.2.1 Distributed Cluster Setup Guide
1. Prerequisites
1.1 Installation packages
1) Prepare 4 machines
2) Operating system: CentOS-7.0-1406-x86_64-DVD.iso
3) Java environment: jdk-8u91-linux-x64.gz
4) Hadoop: hadoop-2.7.5.tar.gz
5) HBase: hbase-1.2.1-bin.tar.gz
1.2 Network configuration
Hostname | IP |
master | 202.196.37.40 |
slave1 | 202.196.37.41 |
slave2 | 202.196.37.42 |
slave3 | 202.196.37.43 |
1.3 Common commands
# systemctl start foo.service      # start a service
# systemctl stop foo.service       # stop a service
# systemctl restart foo.service    # restart a service
# systemctl status foo.service     # show a service's status (running or not)
# systemctl enable foo.service     # enable a service at boot
# systemctl disable foo.service    # disable a service at boot
# systemctl is-enabled iptables.service    # check whether a service starts at boot
# reboot              # reboot the host
# shutdown -h now     # shut down immediately
# source /etc/profile # apply profile changes to the current shell
# yum install net-tools    # install ifconfig, netstat, etc.
2. Installing and configuring CentOS
2.1 Install CentOS
1) Boot from CentOS-7.0-1406-x86_64-DVD.iso and start the installer
2) Select "Install CentOS 7" and press Enter to continue
3) Choose a language: English is the default; Chinese is fine for learning, but use English for production environments
4) Configure the network and hostname: set the hostname to master, turn networking on, and configure a static IPv4 address
5) Choose the installation destination; select manual partitioning with standard partitions, click "Click here to create them automatically", click Done, then accept the changes
6) Set the root password (here: a)
7) Reboot; installation is complete.
2.2 Configure the IP address
2.2.1 Check the current address
# ip addr
or
# ip link
2.2.2 Configure the IP address and gateway
# cd /etc/sysconfig/network-scripts    # enter the network configuration directory
# find ifcfg-em*    # locate the NIC configuration file, e.g. ifcfg-em1
# vi ifcfg-em1      # edit the NIC configuration file
or
# vi /etc/sysconfig/network-scripts/ifcfg-em1    # edit the NIC configuration file
Configuration:
BOOTPROTO=static        # static for a fixed IP; use dhcp for dynamic
ONBOOT=yes              # bring the interface up at boot
IPADDR=202.196.37.40    # IP address
NETMASK=255.255.255.0   # netmask
GATEWAY=202.196.37.254  # default gateway
DNS1=202.196.35.67      # DNS server
# systemctl restart network.service    # restart networking
2.2.3 Configure hosts
# vi /etc/hosts
Add the following entries:
202.196.37.40 master
202.196.37.41 slave1
202.196.37.42 slave2
202.196.37.43 slave3
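Once the mapping is in place on every host, it is worth confirming each name actually resolves. A minimal sketch (the helper name `cluster_hosts` is ours, not a standard tool): extract the hostnames from hosts-file-formatted text, skipping comments and loopback entries, so you can loop over them for connectivity checks.

```shell
# Print the hostnames from hosts-file-formatted text, skipping
# comment lines and 127.x loopback entries.
cluster_hosts() {
  printf '%s\n' "$1" | awk '!/^#/ && NF >= 2 && $1 !~ /^127\./ { print $2 }'
}

hosts_block="202.196.37.40 master
202.196.37.41 slave1
202.196.37.42 slave2
202.196.37.43 slave3"

# e.g. ping every node once to confirm name resolution:
# for h in $(cluster_hosts "$hosts_block"); do ping -c 1 "$h"; done
cluster_hosts "$hosts_block"
```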
2.3 Disable the firewall
# systemctl status firewalld.service    # check the firewall status
# systemctl stop firewalld.service      # stop the firewall
# systemctl disable firewalld.service   # keep the firewall from starting at boot
2.4 Time synchronization
# yum install -y ntp         # install the ntp service
# ntpdate cn.pool.ntp.org    # sync the clock against an NTP pool
2.5 Install and configure the JDK
2.5.1 Remove the bundled OpenJDK
A fresh CentOS install ships with OpenJDK; java -version prints something like:
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)
It is best to remove OpenJDK before installing the Oracle JDK.
First list the installed Java packages:
rpm -qa | grep java
Sample output:
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Remove them:
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Other useful queries:
rpm -qa | grep gcj
rpm -qa | grep jdk
If rpm reports the packages cannot be found, remove them with yum instead:
yum -y remove java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
yum -y remove java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
2.5.2 Install the JDK
Upload jdk-8u91-linux-x64.gz to root's home directory
# mkdir /usr/java
# tar -zxvf jdk-8u91-linux-x64.gz -C /usr/java/
# rm -rf jdk-8u91-linux-x64.gz
2.5.3 Copy the JDK to the other hosts
# scp -r /usr/java slave1:/usr
# scp -r /usr/java slave2:/usr
# scp -r /usr/java slave3:/usr
2.5.4 Configure JDK environment variables on every host
# vi /etc/profile
Add:
export JAVA_HOME=/usr/java/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# source /etc/profile    # apply the changes
# java -version          # verify the installed version
2.6 Configure passwordless SSH
Check the sshd status on each host:
# systemctl status sshd.service    # check the ssh service status
# yum install openssh-server openssh-clients    # install ssh (skip if already installed)
# systemctl start sshd.service     # start sshd (skip if already running)
Generate a key pair on each host:
# ssh-keygen -t rsa -N ""    # generate an RSA key with an empty passphrase
On slave1:
# cp ~/.ssh/id_rsa.pub ~/.ssh/slave1.id_rsa.pub
# scp ~/.ssh/slave1.id_rsa.pub master:~/.ssh
On slave2:
# cp ~/.ssh/id_rsa.pub ~/.ssh/slave2.id_rsa.pub
# scp ~/.ssh/slave2.id_rsa.pub master:~/.ssh
On slave3:
# cp ~/.ssh/id_rsa.pub ~/.ssh/slave3.id_rsa.pub
# scp ~/.ssh/slave3.id_rsa.pub master:~/.ssh
On master:
# cd ~/.ssh
# cat id_rsa.pub >> authorized_keys
# cat slave1.id_rsa.pub >> authorized_keys
# cat slave2.id_rsa.pub >> authorized_keys
# cat slave3.id_rsa.pub >> authorized_keys
(alternatively, run ssh-copy-id -i ~/.ssh/id_rsa.pub root@master on each slave instead of the cp/scp/cat steps)
# scp authorized_keys slave1:~/.ssh
# scp authorized_keys slave2:~/.ssh
# scp authorized_keys slave3:~/.ssh
Enable key-based login:
# vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys
# vi /etc/ssh/ssh_config    # note: this is the client-side configuration file
Append the following two lines:
StrictHostKeyChecking no        # do not prompt to confirm host keys on first login
UserKnownHostsFile /dev/null    # discard known_hosts entries
Disable SELinux:
# vi /etc/selinux/config    # change SELINUX=enforcing to SELINUX=disabled
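The per-slave key distribution above can also be scripted around ssh-copy-id. A sketch, not part of the original procedure: `distribute_key` is our own helper name, and with DRY_RUN=1 it only prints the commands so they can be reviewed before running for real.

```shell
# Push the local public key to each listed host with ssh-copy-id.
# DRY_RUN=1 previews the commands instead of executing them.
distribute_key() {
  for h in "$@"; do
    cmd="ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$cmd"       # preview only
    else
      eval "$cmd"       # actually push the key (asks for the password once)
    fi
  done
}

DRY_RUN=1 distribute_key slave1 slave2 slave3
```

Afterwards, `ssh slave1` from master (and back) should log in without a password prompt.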
3. Installing and configuring Hadoop
3.1 Install Hadoop
Upload hadoop-2.7.5.tar.gz to root's home directory
# tar -zxvf hadoop-2.7.5.tar.gz -C /usr
# rm -rf hadoop-2.7.5.tar.gz
# mkdir /usr/hadoop-2.7.5/tmp
# mkdir /usr/hadoop-2.7.5/logs
# mkdir /usr/hadoop-2.7.5/hdf
# mkdir /usr/hadoop-2.7.5/hdf/data
# mkdir /usr/hadoop-2.7.5/hdf/name
3.1.1 Configure hadoop-env.sh
Edit etc/hadoop/hadoop-env.sh to point JAVA_HOME at the root of the Java installation:
# vi /usr/hadoop-2.7.5/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_91
3.1.2 Edit yarn-env.sh
# vi /usr/hadoop-2.7.5/etc/hadoop/yarn-env.sh
Replace the commented default (# export JAVA_HOME=/home/y/libexec/jdk1.7.0/) with:
export JAVA_HOME=/usr/java/jdk1.8.0_91
3.1.3 Edit slaves
# vi /usr/hadoop-2.7.5/etc/hadoop/slaves
Contents:
Delete: localhost
Add:
slave1
slave2
slave3
3.1.4 Edit core-site.xml
# vi /usr/hadoop-2.7.5/etc/hadoop/core-site.xml
Contents:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/hadoop-2.7.5/tmp</value>
</property>
</configuration>
3.1.5 Edit hdfs-site.xml
# vi /usr/hadoop-2.7.5/etc/hadoop/hdfs-site.xml
Contents:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/hadoop-2.7.5/hdf/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/hadoop-2.7.5/hdf/name</value>
<final>true</final>
</property>
</configuration>
3.1.6 Edit mapred-site.xml
# cp /usr/hadoop-2.7.5/etc/hadoop/mapred-site.xml.template /usr/hadoop-2.7.5/etc/hadoop/mapred-site.xml
# vi /usr/hadoop-2.7.5/etc/hadoop/mapred-site.xml
Contents:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
3.1.7 Edit yarn-site.xml
# vi /usr/hadoop-2.7.5/etc/hadoop/yarn-site.xml
Contents (note that yarn.nodemanager.aux-services must be set, or the NodeManagers will not load the MapReduce shuffle service):
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
3.2 Copy Hadoop to the other hosts
# scp -r /usr/hadoop-2.7.5 slave1:/usr
# scp -r /usr/hadoop-2.7.5 slave2:/usr
# scp -r /usr/hadoop-2.7.5 slave3:/usr
3.3 Configure Hadoop environment variables on every host
# vi /etc/profile
Add:
export HADOOP_HOME=/usr/hadoop-2.7.5
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_LOG_DIR=/usr/hadoop-2.7.5/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
# source /etc/profile    # apply the changes
3.4 Format the NameNode
# hdfs namenode -format    # hdfs is on the PATH via $HADOOP_HOME/bin
3.5 Start Hadoop
Start HDFS and YARN:
# cd /usr/hadoop-2.7.5/sbin
# ./start-all.sh
Check the web UIs:
http://202.196.37.40:50070
http://202.196.37.40:8088/cluster
Check the processes:
# jps
On master, seeing ResourceManager, SecondaryNameNode, and NameNode means the start succeeded, e.g.:
2212 ResourceManager
2484 Jps
1917 NameNode
2078 SecondaryNameNode
(If a DataNode fails to start, clear /usr/hadoop-2.7.5/hdf/data/* and /usr/hadoop-2.7.5/hdf/name/* on that node and re-format.)
On each slave, seeing DataNode and NodeManager means the start succeeded, e.g.:
17334 Jps
17153 DataNode
17241 NodeManager
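The jps check above can be scripted so each host reports only what is missing. A minimal sketch; `missing_daemons` is our own helper name, not part of Hadoop.

```shell
# Print any expected daemon name that does not appear in a jps listing.
# $1 = jps output, remaining args = expected daemon names.
missing_daemons() {
  listing="$1"; shift
  for d in "$@"; do
    printf '%s\n' "$listing" | grep -qw "$d" || echo "$d"
  done
}

# On master (empty output means everything is up):
missing_daemons "$(jps 2>/dev/null)" NameNode SecondaryNameNode ResourceManager
# On a slave:
# missing_daemons "$(jps 2>/dev/null)" DataNode NodeManager
```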
4. Installing and configuring ZooKeeper
4.1 Configure ZooKeeper environment variables
# vi /etc/profile
export ZOOKEEPER_HOME=/usr/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH
# source /etc/profile
4.2 Configure ZooKeeper
1. Download ZooKeeper from a mirror of the official site:
http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.6/
2. ZooKeeper runs on slave1, slave2, and slave3:
slave1 202.196.37.41
slave2 202.196.37.42
slave3 202.196.37.43
3. Upload zookeeper-3.4.6.tar.gz to one of those servers and extract it:
# tar -zxvf zookeeper-3.4.6.tar.gz -C /usr
4. Create a zookeeper-data directory under the ZooKeeper directory, and copy conf/zoo_sample.cfg to conf/zoo.cfg:
# cp /usr/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/zookeeper-3.4.6/conf/zoo.cfg
5、修改zoo.cfg
# Thenumber of milliseconds of each tick
tickTime=2000
# Thenumber of ticks that the initial
#synchronization phase can take
initLimit=10
# Thenumber of ticks that can passbetween
#sending a request and getting anacknowledgement
syncLimit=5
# thedirectory where the snapshot isstored.
# do notuse /tmp for storage, /tmp hereis just
#example sakes.
dataDir=/usr/zookeeper-3.4.6/zookeeper-data
dataLogDir=/usr/zookeeper-3.4.11/datalog
# theport at which the clients willconnect
clientPort=2181
# themaximum number of clientconnections.
#increase this if you need to handle moreclients
#maxClientCnxns=60
#
# Besure to read the maintenance sectionof the
# administratorguide before turning onautopurge.
#
#http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# Thenumber of snapshots to retain indataDir
#autopurge.snapRetainCount=3
# Purgetask interval in hours
# Set to"0" to disable autopurge feature
#autopurge.purgeInterval=1
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
6. Copy the ZooKeeper directory to the other two servers:
# scp -r /usr/zookeeper-3.4.6 slave2:/usr
# scp -r /usr/zookeeper-3.4.6 slave3:/usr
On each of the three servers, create the data directories, then create a myid file under zookeeper-data whose content matches that host's server.* entry in zoo.cfg: server.1's myid contains 1, server.2's contains 2, server.3's contains 3.
# mkdir -p /usr/zookeeper-3.4.6/zookeeper-data
# mkdir -p /usr/zookeeper-3.4.6/datalog
# cd /usr/zookeeper-3.4.6/zookeeper-data && echo 1 > myid    # write 2 on slave2, 3 on slave3
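Rather than remembering which number goes on which host, the myid value can be derived from the server.N lines themselves. A sketch; `myid_for_host` is our own helper, and it assumes each hostname matches its server.N=host:2888:3888 entry.

```shell
# Extract the N from the server.N=<host>:... line matching a hostname.
# $1 = zoo.cfg contents, $2 = hostname.
myid_for_host() {
  printf '%s\n' "$1" | sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p"
}

cfg="server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888"

# On each node, something like:
# myid_for_host "$(cat /usr/zookeeper-3.4.6/conf/zoo.cfg)" "$(hostname)" \
#   > /usr/zookeeper-3.4.6/zookeeper-data/myid
myid_for_host "$cfg" slave2
```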
7. Start the ZooKeeper service on every node:
# cd /usr/zookeeper-3.4.6
# bin/zkServer.sh start
8. Check the state of the ensemble on each server to make sure the cluster came up cleanly; the output shows which nodes are followers and which is the leader:
# bin/zkServer.sh status
5. Installing and configuring HBase
5.1 Install HBase
Upload hbase-1.2.1-bin.tar.gz to root's home directory
# tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr
# mkdir /usr/hbase-1.2.1/logs
5.2 Configure HBase environment variables
# vi /etc/profile
export HBASE_HOME=/usr/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin
# source /etc/profile
5.3 Edit hbase-env.sh
# vi /usr/hbase-1.2.1/conf/hbase-env.sh
Contents:
export JAVA_HOME=/usr/java/jdk1.8.0_91
export HBASE_LOG_DIR=${HBASE_HOME}/logs
export HBASE_MANAGES_ZK=false    # use the standalone ZooKeeper from section 4
5.4 Edit regionservers
# vi /usr/hbase-1.2.1/conf/regionservers
Contents:
Delete: localhost
Add:
slave1
slave2
slave3
5.5 Edit hbase-site.xml
# vi /usr/hbase-1.2.1/conf/hbase-site.xml
Contents:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>slave1,slave2,slave3</value>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://master:60000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/zookeeper-3.4.6/zookeeper-data</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value> <!-- must match clientPort in ZooKeeper's zoo.cfg: 2181 in both places -->
</property>
</configuration>
5.6 Copy HBase to the other hosts
# scp -r /usr/hbase-1.2.1 slave1:/usr
# scp -r /usr/hbase-1.2.1 slave2:/usr
# scp -r /usr/hbase-1.2.1 slave3:/usr
5.7 Start HBase
Start the Hadoop and ZooKeeper clusters first.
Start HBase:
# cd /usr/hbase-1.2.1/bin
# ./start-hbase.sh
Check the web UIs (note that HBase 1.x moved the info ports from 60010/60030 to 16010/16030):
http://202.196.37.40:16010/master-status
http://202.196.37.41:16030/rs-status    (each region server, i.e. each slave, serves this page on its own IP)
Check the processes:
# jps
On master, seeing ResourceManager, SecondaryNameNode, NameNode, and HMaster means the start succeeded, e.g.:
2212 ResourceManager
2999 Jps
1917 NameNode
2078 SecondaryNameNode
2751 HMaster
On each slave, seeing DataNode, NodeManager, HRegionServer, and QuorumPeerMain (the standalone ZooKeeper, since HBASE_MANAGES_ZK=false) means the start succeeded, e.g.:
17540 Jps
17142 NodeManager
17338 HRegionServer
17278 QuorumPeerMain
17055 DataNode
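As a final check tying the whole stack together, the expected daemon set per role can be encoded once and compared against jps on each host. A sketch: `expected_daemons` is our own helper, and the slave list assumes HBASE_MANAGES_ZK=false with the standalone ZooKeeper from section 4 (which appears in jps as QuorumPeerMain).

```shell
# The daemons each role should show once Hadoop, ZooKeeper, and HBase are up.
expected_daemons() {
  case "$1" in
    master) echo "NameNode SecondaryNameNode ResourceManager HMaster" ;;
    slave)  echo "DataNode NodeManager HRegionServer QuorumPeerMain" ;;
  esac
}

# On a slave, report anything missing (no output means all healthy):
for d in $(expected_daemons slave); do
  jps 2>/dev/null | grep -qw "$d" || echo "missing: $d"
done
```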