Hadoop CDH4.5 HBase Deployment

This post covers deploying HBase. Apache HBase provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS).

1    Installing HBase

apt-get install hbase

2    HBase configuration settings

        1    Using DNS with HBase

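                HBase uses the local hostname to report its IP address, so DNS must be working correctly on every node; both forward and reverse lookups should resolve.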

        2    Using the Network Time Protocol (NTP) with HBase

                The clocks on all nodes must also be kept in sync.

        3    Setting user limits for HBase

#vi  /etc/security/limits.conf
hdfs  -  nofile  32768
hbase -  nofile  32768

#vi  /etc/pam.d/common-session 
session required  pam_limits.so
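
                To confirm the entries are in place, the file can be checked with a few lines of shell. This is only a sketch: it assumes the four-column layout shown above (domain, type, item, value), and `has_nofile_limit` is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# has_nofile_limit FILE USER MIN
# Succeeds if FILE holds a "USER <type> nofile VALUE" entry with VALUE >= MIN.
# Hypothetical helper; commented-out lines and other items are ignored.
has_nofile_limit() {
    awk -v user="$2" -v min="$3" '
        $1 == user && $3 == "nofile" && $4 + 0 >= min + 0 { ok = 1 }
        END { exit ok ? 0 : 1 }
    ' "$1"
}

# Check both accounts against the value used above.
limits_file=/etc/security/limits.conf
for account in hdfs hbase; do
    if [ -r "$limits_file" ] && has_nofile_limit "$limits_file" "$account" 32768; then
        echo "$account: nofile >= 32768 configured"
    else
        echo "$account: no nofile >= 32768 entry found in $limits_file"
    fi
done
```

                The limit only applies to sessions opened after the change, so the effective value is best read with `ulimit -n` from a fresh session of the account in question.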

        4    Setting dfs.datanode.max.xcievers for HBase

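                An HDFS DataNode has an upper bound on the number of files it can serve at the same time. HBase needs this raised to at least 4096 (the property name is spelled exactly as shown). Set it in /etc/hadoop/conf/hdfs-site.xml and restart HDFS so the new value takes effect: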

<property>
 <name>dfs.datanode.max.xcievers</name>
 <value>4096</value>
</property>
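
                A short sketch for reading the value back out of the file, for example to check every DataNode after the config has been pushed out. It assumes `<name>` and `<value>` sit on separate lines, as in the snippet above; `get_prop` is a hypothetical helper and not a Hadoop tool.

```shell
#!/bin/sh
# get_prop FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop-style XML file.
# Hypothetical helper; assumes <name> and <value> are on separate lines.
get_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

conf=/etc/hadoop/conf/hdfs-site.xml
if [ -r "$conf" ]; then
    current=$(get_prop "$conf" dfs.datanode.max.xcievers)
    if [ "${current:-0}" -ge 4096 ]; then
        echo "dfs.datanode.max.xcievers = $current"
    else
        echo "dfs.datanode.max.xcievers is ${current:-unset}; raise it to at least 4096"
    fi
fi
```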

3    HBase run modes: standalone mode

        By default, the HBase configuration files are set up for standalone mode. In this mode a single JVM runs the HBase Master, an HBase Region Server, and ZooKeeper.

        1    Installing the HBase Master

apt-get install hbase-master

        2    Starting the HBase Master

service hbase-master start

        3    Verifying standalone mode

http://localhost:60010

        4    Installing and configuring REST

apt-get install hbase-rest
service hbase-rest start
                To change the port the REST server listens on, edit hbase-site.xml and then restart hbase-rest:

<property>
 <name>hbase.rest.port</name>
 <value>60050</value>
</property>

4    HBase pseudo-distributed mode

       Pseudo-distributed mode differs from standalone mode in that each of the component processes runs in a separate JVM.

        1    Editing the HBase configuration file /etc/hbase/conf/hbase-site.xml

<property>
 <name>hbase.cluster.distributed</name>
 <value>true</value>
</property>

<property>
 <name>hbase.rootdir</name>
 <value>hdfs://myhost:8020/hbase</value>
</property>
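
                The host and port in hbase.rootdir have to point at the HDFS NameNode (`myhost:8020` above is a placeholder). A minimal sketch of splitting the URI with POSIX parameter expansion, so the pieces can be compared with the HDFS configuration; `split_rootdir` is a hypothetical helper name.

```shell
#!/bin/sh
# split_rootdir URI
# Prints "host port path" for an hdfs://host:port/path URI, or fails
# if the URI does not have that shape. Hypothetical helper; it sets the
# plain shell variables rest, hostport and path as a side effect.
split_rootdir() {
    case "$1" in
        hdfs://*:*/*) ;;
        *) return 1 ;;
    esac
    rest=${1#hdfs://}        # host:port/path
    hostport=${rest%%/*}     # host:port
    path=/${rest#*/}         # /path
    case "$hostport" in
        *:*) ;;
        *) return 1 ;;       # the colon was in the path, not the authority
    esac
    echo "${hostport%%:*} ${hostport##*:} $path"
}

split_rootdir "hdfs://myhost:8020/hbase"   # prints: myhost 8020 /hbase
```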

        2    Creating the /hbase directory in HDFS

sudo -u hdfs hadoop fs -mkdir /hbase
sudo -u hdfs hadoop fs -chown hbase /hbase

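        3    Enabling pseudo-distributed mode

                For HBase to work properly, a few other components have to be running as well.

                1    Install and start a ZooKeeper server

                        It can be installed on the same machine, listening on a different port.

                2    Start the HBase Master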

service hbase-master start

                3    Install and start an HBase RegionServer

apt-get install hbase-regionserver
service hbase-regionserver start

                4    Verify pseudo-distributed mode

jps
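
                With everything running, `jps` should list one JVM per component. The sketch below reads `jps` output on standard input and prints whichever expected main classes are absent. HMaster, HRegionServer and QuorumPeerMain are the class names these daemons normally appear under; `missing_daemons` is a hypothetical helper, and the sample input is made up for illustration.

```shell
#!/bin/sh
# missing_daemons: reads `jps` output ("<pid> <MainClass>" per line) on stdin
# and prints each expected HBase/ZooKeeper class that is absent.
# Hypothetical helper; exits non-zero if anything is missing.
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("HMaster HRegionServer QuorumPeerMain", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) { print want[i]; bad = 1 }
            exit bad ? 1 : 0
        }
    '
}

# Made-up sample in which the RegionServer has not been started yet.
printf '1234 HMaster\n2345 QuorumPeerMain\n3456 Jps\n' | missing_daemons \
    || echo "not all daemons are running"
```

                On a real host the input would come from `jps | missing_daemons`. Run `jps` as root to see JVMs owned by other users.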

        4    Installing and Starting the HBase Thrift Server

               The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and more performant than REST in many situations. Thrift can be run collocated along with the region servers, but should not be collocated with the NameNode or the JobTracker.

apt-get install hbase-thrift
service hbase-thrift start

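5    Deploying HBase in a distributed cluster

        1    Choosing where to deploy the processes

                master node: you will typically run the HBase Master and a ZooKeeper quorum peer here; on a small cluster these can share the machine with the NameNode and JobTracker.

                slave nodes: on each node, Cloudera recommends running a Region Server, which can share the machine with a TaskTracker (MRv1) and a DataNode.

        2    Deploying the configuration files

                Once you have decided which machine runs which process, edit the configuration files and sync them to the other machines. Going from pseudo-distributed to fully distributed takes only one more setting in hbase-site.xml: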

<property>
 <name>hbase.zookeeper.quorum</name>
 <value>mymasternode</value>
</property>
                Start the services of the HBase cluster in this order:

                    1    The ZooKeeper Quorum Peer

                    2    The HBase Master

                    3    Each of the HBase RegionServers

                    At this point the HBase web UI is reachable on port 60010.

        3    Accessing HBase through the HBase Shell

hbase shell

        4    Using MapReduce with HBase

                To run MapReduce jobs that use HBase, add the HBase and ZooKeeper JARs to the Hadoop Java classpath.
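
                One way to do this from the shell is to export HADOOP_CLASSPATH before submitting the job. The sketch below joins the JARs found in the given directories with colons. /usr/lib/hbase and /usr/lib/zookeeper are assumed install locations, so adjust them to your layout; `jar_classpath` is a hypothetical helper.

```shell
#!/bin/sh
# jar_classpath DIR...
# Prints the *.jar files directly inside each DIR, joined with ":".
# Hypothetical helper; directories without JARs contribute nothing.
jar_classpath() {
    joined=
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            [ -f "$jar" ] || continue      # skip the unexpanded glob
            joined=${joined:+$joined:}$jar
        done
    done
    echo "$joined"
}

# Assumed install locations; adjust to your system.
extra=$(jar_classpath /usr/lib/hbase /usr/lib/zookeeper)
if [ -n "$extra" ]; then
    HADOOP_CLASSPATH=${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$extra
    export HADOOP_CLASSPATH
fi
```

                If the `hbase` launcher is on the PATH, `hbase classpath` prints the classpath HBase itself uses and can stand in for the helper.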

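        5    HBase replication

                HBase replication copies data from one HBase cluster to another. The cluster that receives data from user applications is called the master cluster, and the cluster that receives data from the master cluster is called the slave cluster. There are three modes: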

                    1    Master-Slave Replication

                    2    Master-Master Replication

                    3    Cyclic Replication

                1    Points to note about replication

                   *)  You make the configuration changes on the master cluster side

                   *)  In the case of master-master replication, you make the changes on both sides

                   *)  Replication works at the table-column-family level. The family should exist on all the slaves. (You can have additional, non-replicating families on both sides.)

                   *)  The timestamps of the replicated HLog entries are kept intact. In case of a collision (two entries identical as to row key, column family, column qualifier, and timestamp), only the entry that arrives later will be read.

                   *)  Increment Column Values (ICVs) are treated as simple puts when they are replicated. In the master-master case, this may be undesirable, creating identical counters that overwrite one another.

                   *)  Make sure the master and slave clusters are time-synchronized with each other.

                2    Deploying HBase replication

                        1    Edit hbase-site.xml

<property>
 <name>hbase.replication</name>
 <value>true</value>
</property>

                        2    Copy hbase-site.xml to all nodes

                        3    Restart HBase

                        4    Run the following command in the HBase shell on the master cluster

add_peer
add_peer '<n>', "slave.zookeeper.quorum:zookeeper.clientport:zookeeper.znode.parent"
example: hbase> add_peer '1', "zk.server.com:2181:/hbase"
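
                        The second argument is the slave cluster's ZooKeeper quorum, client port and parent znode joined with colons. A sketch that assembles it and prints the resulting shell command; `peer_command` is a hypothetical helper, and the host name is the example value from above.

```shell
#!/bin/sh
# peer_command ID QUORUM PORT ZNODE
# Prints an add_peer line for the HBase shell. Hypothetical helper;
# rejects a non-numeric port or a znode path that is not absolute.
peer_command() {
    case "$3" in
        ''|*[!0-9]*) echo "bad port: $3" >&2; return 1 ;;
    esac
    case "$4" in
        /*) ;;
        *) echo "znode must start with /: $4" >&2; return 1 ;;
    esac
    printf "add_peer '%s', \"%s:%s:%s\"\n" "$1" "$2" "$3" "$4"
}

peer_command 1 zk.server.com 2181 /hbase
# prints: add_peer '1', "zk.server.com:2181:/hbase"
```

                        The printed line can be pasted into the HBase shell, or piped into `hbase shell` to run it non-interactively.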

                        5    Once a peer exists, enable replication on the column family

disable 'your_table'
alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
enable 'your_table'

                        6    List all configured peers from the HBase master

list_peers

                        7    Disabling and re-enabling replication at the peer level

disable_peer("<peerID>")
enable_peer("<peerID>")

                        8    Stopping Replication in an Emergency

stop_replication

                        9    Initiating Replication of Pre-existing Data

                        10    Verifying Replicated Data

hadoop jar $HBASE_HOME/hbase-<version>.jar verifyrep [--starttime=timestamp1] [--stoptime=timestamp] [--families=comma separated list of families] <peerId> <tablename>
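
                        The two time options limit the comparison to a window. A sketch of building them from Unix times in seconds, assuming the cells carry HBase's default millisecond timestamps; `verifyrep_window` is a hypothetical helper.

```shell
#!/bin/sh
# verifyrep_window START_SECONDS STOP_SECONDS
# Converts Unix times in seconds to the --starttime/--stoptime options,
# assuming cells carry HBase's default millisecond timestamps.
# Hypothetical helper.
verifyrep_window() {
    [ "$1" -le "$2" ] || { echo "start is after stop" >&2; return 1; }
    echo "--starttime=$(( $1 * 1000 )) --stoptime=$(( $2 * 1000 ))"
}

# Example: the hour leading up to now.
now=$(date +%s)
verifyrep_window $(( now - 3600 )) "$now"
```

                        The printed options go in front of `<peerId> <tablename>` on the verifyrep command line.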




Reposted from: https://my.oschina.net/guol/blog/265988
