1 Architecture Design and Environment Preparation
1.1 Architecture Design
At least 3 servers are normally required; this guide uses 4:
Server | Role | Hadoop processes | HBase processes | ZooKeeper process |
Server1 | master | NameNode, SecondaryNameNode | HMaster | - |
Server2 | node1 | DataNode | HRegionServer | QuorumPeerMain |
Server3 | node2 | DataNode | HRegionServer | QuorumPeerMain |
Server4 | node3 | DataNode | HRegionServer | QuorumPeerMain |
1.2 Environment Preparation
1.2.1 Prepare 4 servers with the same OS (CentOS 6.5 or Ubuntu)
Server | Internal IP | Username | Password |
Server1 | 192.168.115.133 | hadoop | 123456 |
Server2 | 192.168.115.134 | hadoop | 123456 |
Server3 | 192.168.115.135 | hadoop | 123456 |
Server4 | 192.168.115.136 | hadoop | 123456 |
1.2.2 Required software
hadoop-2.7.3.tar.gz
hbase-1.2.6-bin.tar.gz
zookeeper-3.4.6.tar.gz
jdk-8u121-linux-x64.tar.gz
2 Basic Server Setup
2.1 Change the hostname
On each node, run the following, substituting that node's hostname:
sudo hostname master
sudo vi /etc/sysconfig/network
HOSTNAME=master
Log in again for the change to take effect
2.2 Add hosts mappings
On all nodes, append the following (IPs as in the architecture design):
sudo vim /etc/hosts
192.168.115.133 master
192.168.115.134 node1
192.168.115.135 node2
192.168.115.136 node3
192.168.115.133 namenode
192.168.115.133 secondarynamenode
192.168.115.134 datanode1
192.168.115.135 datanode2
192.168.115.136 datanode3
192.168.115.134 zookeeper1
192.168.115.135 zookeeper2
192.168.115.136 zookeeper3
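Since the same hosts block must be appended on all four nodes, it can help to generate it from one list rather than typing it per machine. A minimal sketch, assuming the IPs and hostnames from section 1.2.1 (`gen_hosts` is a hypothetical helper; `/etc/hosts` allows several aliases per line, which is equivalent to the one-alias-per-line form above):

```shell
#!/bin/sh
# Sketch: emit the /etc/hosts block from a single list so every node gets
# identical, typo-free mappings. Aliases for one IP share a line.
gen_hosts() {
  echo "192.168.115.133 master namenode secondarynamenode"
  i=1
  for ip in 192.168.115.134 192.168.115.135 192.168.115.136; do
    echo "$ip node$i datanode$i zookeeper$i"
    i=$((i + 1))
  done
}
gen_hosts   # append on a node with: gen_hosts | sudo tee -a /etc/hosts
```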
2.3 Configure passwordless SSH (at minimum, bidirectional between the master and each worker node; worker-to-worker is optional)
On all nodes:
1. sudo vim /etc/ssh/sshd_config
Uncomment the following three lines:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
2. ssh-keygen -t rsa
Generate the key pair
On master, run:
ssh-copy-id hadoop@node1
ssh-copy-id hadoop@node2
ssh-copy-id hadoop@node3
On node1, node2, and node3, run:
ssh-copy-id hadoop@master
Verify in both directions:
ssh hadoop@node1
2.4 Disable the firewall
sudo service iptables stop
sudo chkconfig iptables off
2.5 Raise the open-file limit (default is 1024)
Check the current limit:
ulimit -n
Edit:
sudo vi /etc/security/limits.conf
hadoop - nofile 32768
hadoop - nproc 32000
sudo vi /etc/pam.d/login
session required pam_limits.so
reboot
Verify:
ulimit -n
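The verification step above can be scripted so it fails loudly when the limit did not take effect. A minimal sketch (`check_nofile` is a hypothetical helper name; 32768 matches the limits.conf entry above):

```shell
#!/bin/sh
# Sketch: report whether the nofile limit meets the value set in
# /etc/security/limits.conf above.
check_nofile() {
  required=32768
  current=$(ulimit -n)
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
    echo "nofile limit OK: $current"
  else
    echo "nofile limit too low: $current < $required (re-check limits.conf and pam_limits.so)"
  fi
}
check_nofile
```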
2.6 Copy the required software to the master server
sudo mkdir soft
sudo chmod 777 -R soft/
Copy the software packages into this folder
cd soft
ls
hadoop-2.7.3.tar.gz jdk-8u121-linux-x64.tar.gz
hbase-1.2.6-bin.tar.gz zookeeper-3.4.6.tar.gz
3 Software Installation
3.1 Install Java
On master, node1, node2, node3:
sudo mkdir -p /var/lib/java
sudo chmod -R 777 /var/lib/java/
master:
cd soft
sudo tar -xvzf jdk-8u121-linux-x64.tar.gz -C /var/lib/java
scp -r /var/lib/java/jdk1.8.0_121/ hadoop@node1:/var/lib/java
scp -r /var/lib/java/jdk1.8.0_121/ hadoop@node2:/var/lib/java
scp -r /var/lib/java/jdk1.8.0_121/ hadoop@node3:/var/lib/java
master,node1,node2,node3:
Edit the environment variables:
sudo vi /etc/profile
export JAVA_HOME=/var/lib/java/jdk1.8.0_121
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$PATH:${JAVA_HOME}/bin
Apply the profile:
source /etc/profile
Check:
java -version
3.2 Install Hadoop
3.2.1 Create directories
master,node1,node2,node3:
sudo mkdir /var/lib/hadoop/
sudo chmod -R 777 /var/lib/hadoop/
sudo mkdir -p /var/hadoop/conf/
sudo chmod -R 777 /var/hadoop/conf/
sudo mkdir -p /var/hadoop/log/
sudo mkdir -p /var/hadoop/mapred/log/
sudo mkdir -p /var/hadoop/yarn/log/
sudo mkdir -p /var/hadoop/tmp/
sudo mkdir -p /var/hadoop/hdfs/name/coredata/
sudo mkdir -p /var/hadoop/hdfs/journal/
sudo mkdir -p /var/hadoop/hdfs/data-1/coredata/
sudo mkdir -p /var/hadoop/hdfs/data-2/coredata/
sudo mkdir -p /var/hadoop/hdfs/namesecondary/coredata/
sudo mkdir -p /var/hadoop/yarn/local-1/coredata/
sudo mkdir -p /var/hadoop/yarn/local-2/coredata/
sudo mkdir -p /var/hadoop/pids/
3.2.2 Extract and install
master:
cd soft
sudo tar -xvzf hadoop-2.7.3.tar.gz -C /var/lib/hadoop/
sudo rm /var/lib/hadoop/hadoop-2.7.3/bin/*.cmd
sudo rm /var/lib/hadoop/hadoop-2.7.3/sbin/*.cmd
sudo rm /var/lib/hadoop/hadoop-2.7.3/etc/hadoop/*.cmd
sudo chmod -R 777 /var/lib/hadoop/
3.2.3 Configure Hadoop
3.2.3.1 Hadoop environment variables
On all nodes:
sudo vi /etc/profile
Add:
export HADOOP_HOME=/var/lib/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=/var/hadoop/conf
Apply:
source /etc/profile
3.2.3.2 hadoop-env.sh
master:
sudo cp /var/lib/hadoop/hadoop-2.7.3/etc/hadoop/* /var/hadoop/conf/
sudo vi /var/hadoop/conf/hadoop-env.sh
Add:
export JAVA_HOME=/var/lib/java/jdk1.8.0_121
export HADOOP_LOG_DIR=/var/hadoop/log
export HADOOP_MAPRED_LOG_DIR=/var/hadoop/mapred/log
export YARN_LOG_DIR=/var/hadoop/yarn/log
export HADOOP_PID_DIR=/var/hadoop/pids
export YARN_PID_DIR=/var/hadoop/pids
export HADOOP_HEAPSIZE=2048
Edit HADOOP_OPTS, appending the following:
-XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:SoftRefLRUPolicyMSPerMB=0
Note: do not set the heap size in HADOOP_OPTS; doing so makes the HBase shell fail with a "Could not find or load main class" error.
3.2.3.3 core-site.xml
sudo vi /var/hadoop/conf/core-site.xml
Add:
<!-- Default filesystem URI: points at the NameNode, default port 8020 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/var/hadoop/tmp</value>
</property>
<property>
<name>hadoop.logfile.size</name>
<value>10000000</value>
</property>
<property>
<name>hadoop.logfile.count</name>
<value>10</value>
</property>
3.2.3.4 hdfs-site.xml
sudo vi /var/hadoop/conf/hdfs-site.xml
Add:
<!-- comma-separated directory for backup -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/var/hadoop/hdfs/name/coredata</value>
</property>
<!-- comma-separated directory for round-robin -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/var/hadoop/hdfs/data-1/coredata,file:/var/hadoop/hdfs/data-2/coredata</value>
</property>
<!-- comma-separated directory for backup -->
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/var/hadoop/hdfs/namesecondary/coredata</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>namenode:50070</value>
</property>
<!-- SecondaryNameNode (checkpoint node) HTTP address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>secondarynamenode:50090</value>
</property>
3.2.3.5 mapred-site.xml
sudo cp /var/hadoop/conf/mapred-site.xml.template /var/hadoop/conf/mapred-site.xml
sudo vi /var/hadoop/conf/mapred-site.xml
Add:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>namenode:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>namenode:19888</value>
</property>
<!-- Windows commit MapReduce job -->
<property>
<name>mapreduce.app-submission.cross-platform</name>
<value>true</value>
</property>
3.2.3.6 yarn-site.xml
sudo vi /var/hadoop/conf/yarn-site.xml
Add:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode:8032</value>
</property>
<!-- comma-separated directory for round-robin -->
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:/var/hadoop/yarn/local-1/coredata,file:/var/hadoop/yarn/local-2/coredata</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>namenode:8088</value>
</property>
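The two resource settings above bound how many containers one NodeManager can run at once. A minimal sketch of the arithmetic, assuming YARN's default minimum allocation of 1024 MB per container (the variable names here are local to the sketch, not YARN configuration keys):

```shell
#!/bin/sh
# Sketch: capacity implied by yarn.nodemanager.resource.memory-mb=8192
# and yarn.nodemanager.resource.cpu-vcores=8, with minimum-size (1024 MB,
# 1 vcore) containers: the tighter of the memory and vcore limits wins.
max_containers() {
  node_mem_mb=8192
  node_vcores=8
  min_alloc_mb=1024
  by_mem=$((node_mem_mb / min_alloc_mb))
  if [ "$by_mem" -lt "$node_vcores" ]; then echo "$by_mem"; else echo "$node_vcores"; fi
}
echo "max concurrent minimum-size containers per NodeManager: $(max_containers)"
```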
3.2.4 Deploy to the slaves
Copy the program and configuration files from master to the other nodes:
scp -r /var/lib/hadoop/hadoop-2.7.3 hadoop@node1:/var/lib/hadoop/
scp -r /var/lib/hadoop/hadoop-2.7.3 hadoop@node2:/var/lib/hadoop/
scp -r /var/lib/hadoop/hadoop-2.7.3 hadoop@node3:/var/lib/hadoop/
scp /var/hadoop/conf/* hadoop@node1:/var/hadoop/conf/
scp /var/hadoop/conf/* hadoop@node2:/var/hadoop/conf/
scp /var/hadoop/conf/* hadoop@node3:/var/hadoop/conf/
3.2.5 Add slaves on the namenode
sudo vi /var/hadoop/conf/slaves
datanode1
datanode2
datanode3
3.2.6 Set directory permissions
On all nodes:
sudo chmod -R 777 /var/lib/hadoop/
sudo chown -R hadoop:hadoop /var/lib/hadoop/
sudo chmod -R 777 /var/hadoop/conf/
sudo chown -R hadoop:hadoop /var/hadoop/conf/
sudo chmod -R 777 /var/hadoop/log/
sudo chmod -R 777 /var/hadoop/mapred/log/
sudo chmod -R 777 /var/hadoop/yarn/log/
sudo chmod -R 777 /var/hadoop/tmp/
sudo chmod -R 777 /var/hadoop/hdfs/name/coredata/
sudo chmod -R 777 /var/hadoop/hdfs/journal/
sudo chmod -R 777 /var/hadoop/hdfs/data-1/coredata/
sudo chmod -R 777 /var/hadoop/hdfs/data-2/coredata/
sudo chmod -R 777 /var/hadoop/hdfs/namesecondary/coredata/
sudo chmod -R 777 /var/hadoop/yarn/local-1/coredata/
sudo chmod -R 777 /var/hadoop/yarn/local-2/coredata/
sudo chmod -R 777 /var/hadoop/pids/
sudo chown -R hadoop:hadoop /var/hadoop/log/
sudo chown -R hadoop:hadoop /var/hadoop/mapred/log/
sudo chown -R hadoop:hadoop /var/hadoop/yarn/log/
sudo chown -R hadoop:hadoop /var/hadoop/tmp/
sudo chown -R hadoop:hadoop /var/hadoop/hdfs/name/coredata/
sudo chown -R hadoop:hadoop /var/hadoop/hdfs/journal/
sudo chown -R hadoop:hadoop /var/hadoop/hdfs/data-1/coredata/
sudo chown -R hadoop:hadoop /var/hadoop/hdfs/data-2/coredata/
sudo chown -R hadoop:hadoop /var/hadoop/hdfs/namesecondary/coredata/
sudo chown -R hadoop:hadoop /var/hadoop/yarn/local-1/coredata/
sudo chown -R hadoop:hadoop /var/hadoop/yarn/local-2/coredata/
sudo chown -R hadoop:hadoop /var/hadoop/pids/
3.2.7 启动hadoop
HDFS初始化()
在namenode上格式化
hdfs namenode -format
1) Start HDFS
a Start the whole cluster from the namenode:
start-dfs.sh
b Or start daemons individually:
On the namenode, start the NameNode:
hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start namenode
On the secondarynamenode, start the SecondaryNameNode:
hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start secondarynamenode
On the data nodes, start the DataNodes:
hadoop-daemons.sh --config /var/hadoop/conf --script hdfs start datanode
2) YARN
a Start from the namenode:
start-yarn.sh
b Or individually:
On the namenode, start the ResourceManager:
yarn-daemon.sh --config /var/hadoop/conf start resourcemanager
On the data nodes, start the NodeManagers:
yarn-daemons.sh --config /var/hadoop/conf start nodemanager
3) Job History
mr-jobhistory-daemon.sh --config /var/hadoop/conf start historyserver
List the configured namenodes:
hdfs getconf -namenodes
hdfs getconf -secondarynamenodes
Verify startup:
jps
The master should show NameNode, SecondaryNameNode, ResourceManager, and JobHistoryServer; each slave should show DataNode and NodeManager.
3.2.8 Use Hadoop
1) HDFS
Create a user directory:
cd /var/lib/hadoop/hadoop-2.7.3/
hadoop fs -mkdir /user/saga
hadoop fs -chown saga:saga /user/saga
Set a space quota:
hdfs dfsadmin -setSpaceQuota 1t /user/saga
2) MapReduce
Prepare sample input files:
sudo mkdir input
sudo vi input/in1.txt
hello world hello hadoop
sudo vi input/in2.txt
hello hadoop hello whatever
hadoop fs -rm -r /input
hadoop fs -rm -r /output
hadoop fs -mkdir /input
hadoop fs -put input/*.txt /input
hadoop fs -ls /input
Run:
cd /var/lib/hadoop/hadoop-2.7.3/
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
View the results:
hadoop fs -ls /output
hadoop fs -cat /output/*
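For the two sample files above, the expected wordcount result can be checked without the cluster: wordcount is equivalent to a split/sort/count pipeline. A local simulation (this runs entirely outside Hadoop; `wc_sim` is a hypothetical helper), whose output should match `hadoop fs -cat /output/*` — hello 4 times, hadoop 2, whatever 1, world 1:

```shell
#!/bin/sh
# Local simulation of the wordcount job on the two sample lines: split on
# whitespace, count each word, print "word<TAB>count" like the job output.
wc_sim() {
  printf 'hello world hello hadoop\nhello hadoop hello whatever\n' |
    tr -s ' ' '\n' | sort | uniq -c | awk '{printf "%s\t%s\n", $2, $1}'
}
wc_sim
```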
3.2.9 Stop Hadoop
1 Job History
mr-jobhistory-daemon.sh --config /var/hadoop/conf stop historyserver
2 YARN
a Stop the whole cluster from the namenode:
stop-yarn.sh
b Or stop daemons individually:
On the data nodes, stop the NodeManagers:
yarn-daemons.sh --config /var/hadoop/conf stop nodemanager
On the namenode, stop the ResourceManager:
yarn-daemon.sh --config /var/hadoop/conf stop resourcemanager
3 HDFS
a Stop from the namenode:
stop-dfs.sh
b Or individually:
On the data nodes, stop the DataNodes:
hadoop-daemons.sh --config /var/hadoop/conf --script hdfs stop datanode
On the secondarynamenode, stop the SecondaryNameNode:
hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop secondarynamenode
On the namenode, stop the NameNode:
hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop namenode
3.2.10 Clean up Hadoop
On all nodes:
sudo rm -r /var/hadoop/log/*
sudo rm -r /var/hadoop/mapred/log/*
sudo rm -r /var/hadoop/yarn/log/*
sudo rm -r /var/hadoop/tmp/*
sudo rm -r /var/hadoop/hdfs/name/coredata/*
sudo rm -r /var/hadoop/hdfs/journal/*
sudo rm -r /var/hadoop/hdfs/data-1/coredata/*
sudo rm -r /var/hadoop/hdfs/data-2/coredata/*
sudo rm -r /var/hadoop/hdfs/namesecondary/coredata/*
sudo rm -r /var/hadoop/yarn/local-1/coredata/*
sudo rm -r /var/hadoop/yarn/local-2/coredata/*
After cleaning, HDFS must be reformatted before reuse.
3.2.11 Notes
Note: when starting daemons individually, the /var/hadoop/conf/slaves file on each slave must contain only localhost; the copy distributed by scp lists all slaves.
Note: every machine resolves names through its own /etc/hosts; some distributions bind the local hostname (e.g. node1) to 127.0.1.1, which breaks the cluster. Check each host's file manually.
3.3 Install ZooKeeper
3.3.1 Create directories
On all zookeeper nodes:
sudo mkdir -p /var/lib/zookeeper/
sudo chmod -R 777 /var/lib/zookeeper/
sudo mkdir -p /var/zookeeper/data/
sudo mkdir -p /var/zookeeper/log/
3.3.2 Extract and install
On node1:
sudo mkdir soft
sudo chmod 777 -R soft/
cd soft
scp hadoop@master:/home/hadoop/soft/zookeeper-3.4.6.tar.gz .
sudo tar -zxvf zookeeper-3.4.6.tar.gz -C /var/lib/zookeeper/
sudo rm /var/lib/zookeeper/zookeeper-3.4.6/bin/*.cmd
sudo chmod -R 777 /var/lib/zookeeper/
3.3.3 Configure ZooKeeper
3.3.3.1 Environment variables (on zookeeper nodes)
sudo vi /etc/profile
Add:
export ZOOKEEPER_HOME=/var/lib/zookeeper/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Apply:
source /etc/profile
3.3.3.2 zkEnv.sh
On node1:
sudo vi /var/lib/zookeeper/zookeeper-3.4.6/bin/zkEnv.sh
Add near the top:
export JAVA_HOME=/var/lib/java/jdk1.8.0_121
export ZOO_LOG_DIR=/var/zookeeper/log
3.3.3.3 zoo.cfg
On node1:
cd /var/lib/zookeeper/zookeeper-3.4.6/
sudo cp conf/zoo_sample.cfg conf/zoo.cfg
sudo vi conf/zoo.cfg
Set:
dataDir=/var/zookeeper/data
clientPort=2181
maxSessionTimeout=300000
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
Uncomment the following two lines:
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
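ZooKeeper keeps serving requests only while a majority (quorum) of the configured servers is up, which is why an odd server count is used. A minimal sketch of the arithmetic for the 3-server ensemble above (`quorum_size` is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: quorum = floor(n/2) + 1, so a 3-server ensemble tolerates
# exactly 1 server failure.
quorum_size() {
  echo $(($1 / 2 + 1))
}
servers=3
q=$(quorum_size "$servers")
echo "quorum: $q of $servers, tolerates $((servers - q)) failure(s)"
```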
3.3.4 Deploy to the other nodes
Copy the program files to the remaining zookeeper nodes:
scp -r /var/lib/zookeeper/zookeeper-3.4.6 hadoop@node2:/var/lib/zookeeper/
scp -r /var/lib/zookeeper/zookeeper-3.4.6 hadoop@node3:/var/lib/zookeeper/
3.3.5 Set the ZooKeeper node ID
On each zookeeper node:
sudo vi /var/zookeeper/data/myid
Enter this node's ID (1, 2, or 3), matching the server.N lines in zoo.cfg
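Mismatched myid files are a common cause of quorum failures, so deriving the ID from the hostname avoids editing the wrong number. A minimal sketch, assuming the nodeN naming from section 1.1 (`myid_for` is a hypothetical helper, not part of ZooKeeper):

```shell
#!/bin/sh
# Sketch: map "nodeN" hostnames to the myid values that match
# server.1..server.3 in zoo.cfg.
myid_for() {
  case "$1" in
    node1) echo 1 ;;
    node2) echo 2 ;;
    node3) echo 3 ;;
    *) echo "not a zookeeper node: $1" >&2; return 1 ;;
  esac
}
# On a real node: myid_for "$(hostname)" | sudo tee /var/zookeeper/data/myid
myid_for node2
```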
3.3.6 Directory permissions
On each zookeeper node:
sudo chmod -R 777 /var/lib/zookeeper/
sudo chown -R root:root /var/lib/zookeeper/
sudo chmod -R 777 /var/zookeeper/
sudo chown -R root:root /var/zookeeper/
3.3.7 Start ZooKeeper
On each zookeeper node:
zkServer.sh start
3.3.8 Check ZooKeeper status
zkServer.sh status
3.3.9 Stop ZooKeeper
On each zookeeper node:
zkServer.sh stop
3.3.10 Clean up ZooKeeper
On each zookeeper node:
sudo rm -r /var/zookeeper/data/version-2
sudo rm -r /var/zookeeper/log/*
3.4 Install HBase
3.4.1 Create directories
On all nodes:
sudo mkdir -p /var/lib/hbase/
sudo chmod -R 777 /var/lib/hbase/
sudo mkdir -p /var/hbase/conf/
sudo chmod -R 777 /var/hbase/conf/
sudo mkdir -p /var/hbase/log/
sudo mkdir -p /var/hbase/pids/
3.4.2 Extract and install
On master:
cd soft
sudo tar -zxvf hbase-1.2.6-bin.tar.gz -C /var/lib/hbase/
sudo rm /var/lib/hbase/hbase-1.2.6/bin/*.cmd
sudo rm /var/lib/hbase/hbase-1.2.6/conf/*.cmd
sudo chmod -R 777 /var/lib/hbase/
3.4.3 Configure HBase
3.4.3.1 HBase environment variables
On all nodes:
sudo vi /etc/profile
export HBASE_HOME=/var/lib/hbase/hbase-1.2.6
export PATH=$PATH:$HBASE_HOME/bin
export HBASE_CONF_DIR=/var/hbase/conf
Apply:
source /etc/profile
3.4.3.2 Add HBase to the Hadoop classpath
On all nodes:
sudo vi /var/hadoop/conf/hadoop-env.sh
Add:
export HBASE_HOME=/var/lib/hbase/hbase-1.2.6
for f in $HBASE_HOME/lib/*.jar; do
HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
3.4.3.3 hbase-env.sh
On master:
sudo cp /var/lib/hbase/hbase-1.2.6/conf/* /var/hbase/conf/
sudo vi /var/hbase/conf/hbase-env.sh
Add:
export JAVA_HOME=/var/lib/java/jdk1.8.0_121
export HBASE_LOG_DIR=/var/hbase/log
export HBASE_PID_DIR=/var/hbase/pids
Also add:
export HBASE_HEAPSIZE=2G
Edit HBASE_OPTS, appending the following:
-XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:SoftRefLRUPolicyMSPerMB=0
With Java 8 or later, comment out the following two lines:
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
If you use an existing ZooKeeper cluster, set HBASE_MANAGES_ZK to false (the default is true).
3.4.3.4 hbase-site.xml
sudo vi /var/hbase/conf/hbase-site.xml
Add:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zookeeper1,zookeeper2,zookeeper3</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>300000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/var/zookeeper/data</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>60030</value>
</property>
The following is not configured in this setup:
Note: the entries below can be added to enable simple access control:
<property>
<name>hbase.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.regionserver.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
Note: the client-side hbase-site.xml must then contain:
<property>
<name>hbase.security.authentication</name>
<value>simple</value>
</property>
3.4.3.5 log4j.properties
sudo vi /var/hbase/conf/log4j.properties
Change the 256MB maximum file size to 16MB in both places:
hbase.log.maxfilesize=16MB
hbase.log.maxbackupindex=20
hbase.security.log.maxfilesize=16MB
hbase.security.log.maxbackupindex=20
3.4.4 Deploy to the slaves
Copy the program and configuration files to the other nodes:
scp -r /var/lib/hbase/hbase-1.2.6 hadoop@node1:/var/lib/hbase/
scp -r /var/lib/hbase/hbase-1.2.6 hadoop@node2:/var/lib/hbase/
scp -r /var/lib/hbase/hbase-1.2.6 hadoop@node3:/var/lib/hbase/
scp /var/hbase/conf/* hadoop@node1:/var/hbase/conf/
scp /var/hbase/conf/* hadoop@node2:/var/hbase/conf/
scp /var/hbase/conf/* hadoop@node3:/var/hbase/conf/
3.4.5 Add regionservers
sudo vi /var/hbase/conf/regionservers
datanode1
datanode2
datanode3
The following (backup masters) is not configured in this setup:
sudo vi /var/hbase/conf/backup-masters
hmasterbackup1
hmasterbackup2
3.4.6 Set directory permissions
On all nodes:
sudo chmod -R 777 /var/lib/hbase/
sudo chown -R hadoop:hadoop /var/lib/hbase/
sudo chmod -R 777 /var/hbase/
sudo chmod -R 777 /var/zookeeper/
sudo chown -R hadoop:hadoop /var/hbase/
sudo chown -R hadoop:hadoop /var/zookeeper/
3.4.7 Start HBase
Start the cluster:
start-hbase.sh
Or start daemons individually:
hbase-daemon.sh --config /var/hbase/conf start master
hbase-daemon.sh --config /var/hbase/conf start regionserver
3.4.8 Use HBase
hbase shell
create 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
scan 'test'
get 'test', 'row1'
disable 'test'
enable 'test'
disable 'test'
drop 'test'
exit
3.4.9 Stop HBase
stop-hbase.sh
Or individually:
hbase-daemon.sh --config /var/hbase/conf stop regionserver
hbase-daemon.sh --config /var/hbase/conf stop master
3.4.10 Clean up HBase
sudo rm -r /var/hbase/log/*
Note: HBase does not have to manage the ZooKeeper cluster itself; set HBASE_MANAGES_ZK to false in hbase-env.sh (the default is true).
When HBASE_MANAGES_ZK is true, the following commands start and stop the ZooKeeper cluster on its own:
hbase-daemons.sh start zookeeper
hbase-daemons.sh stop zookeeper
3.4.11 Notes
Note: dfs.replication in hbase-site.xml controls the HDFS replication factor used for HBase data.
Note: when using Phoenix, hbase-site.xml additionally needs:
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
<name>hbase.region.server.rpc.scheduler.factory.class</name>
<value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<property>
<name>hbase.rpc.controllerfactory.class</name>
<value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>
4 Cluster Management
4.1 jps
Run jps to list the Java processes on the current machine (master and each node), and compare them with the process table in section 1.1.
4.2 Stop the clusters
Stop the services in this order (HBase, then Hadoop, then ZooKeeper):
master
regionserver
secondarynamenode
namenode
datanode
zookeeper
If pssh is installed, run the following directly from master:
pssh -i -h ham.txt /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf stop master
pssh -i -h had.txt /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf stop regionserver
pssh -i -h hab.txt /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop secondarynamenode
pssh -i -h ham.txt /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop namenode
pssh -i -h had.txt /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop datanode
pssh -i -h zk.txt /var/lib/zookeeper/zookeeper-3.4.6/bin/zkServer.sh stop
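The host files ham.txt, hab.txt, had.txt, and zk.txt are not defined anywhere in this document; the sketch below shows assumed contents consistent with the section 1.1 layout (ham = HBase/Hadoop master host, hab = the secondarynamenode host, which is also master here, had = data nodes, zk = zookeeper nodes). Adjust to your own topology:

```shell
#!/bin/sh
# Assumed pssh host files: one hostname per line, as pssh -h expects.
printf 'master\n' > ham.txt
printf 'master\n' > hab.txt
printf 'node1\nnode2\nnode3\n' > had.txt
printf 'node1\nnode2\nnode3\n' > zk.txt
```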
If pssh is not installed, run the commands on the corresponding nodes instead:
Stop HBase:
master: /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf stop master
each data node: /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf stop regionserver
Stop Hadoop:
master: /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop secondarynamenode
master: /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop namenode
each data node: /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs stop datanode
Stop ZooKeeper:
each zookeeper node: /var/lib/zookeeper/zookeeper-3.4.6/bin/zkServer.sh stop
4.3 Start the clusters
Start the services in this order (ZooKeeper, then Hadoop, then HBase):
zookeeper
namenode
datanode
secondarynamenode
master
regionserver
If pssh is installed, run the following directly from master:
pssh -i -h zk.txt /var/lib/zookeeper/zookeeper-3.4.6/bin/zkServer.sh start
pssh -i -h ham.txt /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start namenode
pssh -i -h had.txt /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start datanode
pssh -i -h hab.txt /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start secondarynamenode
pssh -i -h ham.txt /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf start master
pssh -i -h had.txt /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf start regionserver
If pssh is not installed, run the commands on the corresponding nodes instead:
Start ZooKeeper:
each zookeeper node: /var/lib/zookeeper/zookeeper-3.4.6/bin/zkServer.sh start
Start Hadoop:
master: /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start namenode
each data node: /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start datanode
master: /var/lib/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh --config /var/hadoop/conf --script hdfs start secondarynamenode
Start HBase:
master: /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf start master
each data node: /var/lib/hbase/hbase-1.2.6/bin/hbase-daemon.sh --config /var/hbase/conf start regionserver
4.4 Reformat all data
4.4.1 Stop all clusters
Follow the steps in section 4.2.
4.4.2 Clear the data
On all nodes:
sudo rm -r /var/hadoop/log/*
sudo rm -r /var/hadoop/mapred/log/*
sudo rm -r /var/hadoop/yarn/log/*
sudo rm -r /var/hadoop/tmp/*
sudo rm -r /var/hadoop/hdfs/name/coredata/*
sudo rm -r /var/nfs-hadoop/hdfs/name/*
sudo rm -r /var/hadoop/hdfs/journal/*
sudo rm -r /var/hadoop/hdfs/data-1/coredata/*
sudo rm -r /var/hadoop/hdfs/data-2/coredata/*
sudo rm -r /var/hadoop/hdfs/namesecondary/coredata/*
sudo rm -r /var/nfs-hadoop/hdfs/namesecondary/*
sudo rm -r /var/hadoop/yarn/local-1/coredata/*
sudo rm -r /var/hadoop/yarn/local-2/coredata/*
sudo rm -r /var/hbase/log/*
sudo rm -r /var/zookeeper/data/version-2
sudo rm -r /var/zookeeper/log/*
On master:
/var/lib/hadoop/hadoop-2.7.3/bin/hadoop --config /var/hadoop/conf namenode -format
4.4.3 Start all clusters
Follow the steps in section 4.3.