Big Data: HBase

Author: Lijb

HBase is a distributed, column-oriented, open-source database. The technology originates from the Google paper by Fay Chang, "Bigtable: A Distributed Storage System for Structured Data". Just as Bigtable builds on the distributed storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a typical relational database, HBase is well suited to storing unstructured data, and it uses a column-based rather than a row-based model.

How are HBase and HDFS related?

Although HDFS supports storing massive amounts of data, it is not good at managing individual records efficiently (query, update, delete, insert) and does not support random reads and writes over that data. HBase is a NoSQL database built on top of HDFS: it manages the data stored in HDFS efficiently, supporting random reads and writes over massive data sets and row-level data management, as sketched below.
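
To make the contrast concrete, here is a minimal sketch of row-level random access through the HBase Java client (the client API is covered in detail later in this article); it assumes the ZooKeeper quorum host centos and the baizhi:t_user table used in the later examples.

    // Minimal sketch of row-level random read/write against HBase.
    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "centos");
    try (Connection conn = ConnectionFactory.createConnection(config);
         Table table = conn.getTable(TableName.valueOf("baizhi:t_user"))) {
        // Random write: update a single cell of a single row, addressed by row key
        Put put = new Put(Bytes.toBytes("1"));
        put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
        table.put(put);
        // Random read: fetch that one row directly, without scanning the data set
        Result row = table.get(new Get(Bytes.toBytes("1")));
        System.out.println(Bytes.toString(row.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("name"))));
    }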

Characteristics of HBase

  • Large: a single HBase table typically holds hundreds of millions of rows by millions of columns, and each column can keep thousands of versions.
  • Sparse: HBase has no fixed table schema, so a single row may contain any number of columns, which improves disk utilization.
  • No data types: every value is stored as a byte array (see the sketch after this list).
  • The biggest difference from a conventional database lies in how records are managed at the storage layer: most databases are row-oriented, which leads to poor IO utilization, whereas HBase uses a column-oriented layout and thereby greatly improves IO utilization.
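
The sketch below illustrates the "no data types" point: every value written to or read from HBase passes through an explicit byte-array conversion, typically via the org.apache.hadoop.hbase.util.Bytes helper used throughout the Java API examples later in this article.

    // HBase itself stores only byte arrays: clients encode values on write
    // and decode them again on read.
    byte[] ageBytes  = Bytes.toBytes(18);          // int    -> byte[]
    byte[] nameBytes = Bytes.toBytes("zhangsan");  // String -> byte[]

    int age     = Bytes.toInt(ageBytes);           // byte[] -> int
    String name = Bytes.toString(nameBytes);       // byte[] -> String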

Row Storage vs. Column Storage

Row storage

In a row-oriented layout, all the columns of one record are stored together, so a query that needs only a few columns still has to read every record in full.

Column storage

In a column-oriented layout, the values of the same column are stored together, so a query reads only the columns it actually touches, which is what gives HBase its improved IO utilization.

HBase Environment Setup

  • Make sure Hadoop (HDFS) is running properly; HADOOP_HOME must be configured.

  • Install ZooKeeper (coordinates the HBase services)

    [root@CentOS ~]# tar -zxf zookeeper-3.4.6.tar.gz -C /usr/
    [root@CentOS ~]# vi /usr/zookeeper-3.4.6/conf/zoo.cfg
    tickTime=2000
    dataDir=/root/zkdata
    clientPort=2181
    [root@CentOS ~]# mkdir /root/zkdata
    [root@CentOS zookeeper-3.4.6]# ./bin/zkServer.sh start zoo.cfg
    JMX enabled by default
    Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    [root@CentOS zookeeper-3.4.6]# ./bin/zkServer.sh status zoo.cfg
    JMX enabled by default
    Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: standalone
    [root@centos ~]# jps
    1612 SecondaryNameNode
    1348 NameNode
    1742 QuorumPeerMain    // ZooKeeper
    1437 DataNode

  • Install HBase

    [root@centos ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
    [root@centos ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://CentOS:9000/hbase</value>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>CentOS</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>

    [root@centos ~]# vi /usr/hbase-1.2.4/conf/regionservers

    centos

    [root@centos ~]# vi .bashrc

    HBASE_MANAGES_ZK=false
    HBASE_HOME=/usr/hbase-1.2.4
    HADOOP_HOME=/usr/hadoop-2.6.0
    JAVA_HOME=/usr/java/latest
    CLASSPATH=.
    PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
    export JAVA_HOME
    export CLASSPATH
    export PATH
    export HADOOP_HOME
    export HBASE_HOME
    export HBASE_MANAGES_ZK

  • Start HBase

    [root@centos ~]# start-hbase.sh
    [root@centos ~]# jps
    1612 SecondaryNameNode
    2102 HRegionServer    // handles the actual reads and writes of table data
    1348 NameNode
    2365 Jps
    1978 HMaster          // similar to the NameNode: manages table metadata and the RegionServers
    1742 QuorumPeerMain
    1437 DataNode

The web UI is now available at http://centos:16010

HBase Shell Commands

  • Connect to HBase

    [root@centos ~]# hbase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

    hbase(main):001:0>

  • Check the system status

    hbase(main):001:0> status
    1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load

  • Check the current version

    hbase(main):006:0> version
    1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

Namespace Operations (databases)

  • List the namespaces

    hbase(main):003:0> list_namespace
    NAMESPACE
    default
    hbase

  • Create a namespace

    hbase(main):006:0> create_namespace 'baizhi',{'author'=>'zs'}
    0 row(s) in 0.3260 seconds

  • List the tables in a namespace

    hbase(main):004:0> list_namespace_tables 'hbase'
    TABLE
    meta
    namespace

  • Describe a namespace

    hbase(main):008:0> describe_namespace 'baizhi'
    DESCRIPTION
    {NAME => 'baizhi', author => 'zs'}
    1 row(s) in 0.0550 seconds

  • Alter a namespace

    hbase(main):010:0> alter_namespace 'baizhi',{METHOD => 'set','author'=> 'wangwu'}
    0 row(s) in 0.2520 seconds

    hbase(main):011:0> describe_namespace 'baizhi'
    DESCRIPTION
    {NAME => 'baizhi', author => 'wangwu'}
    1 row(s) in 0.0030 seconds

    hbase(main):012:0> alter_namespace 'baizhi',{METHOD => 'unset',NAME => 'author'}
    0 row(s) in 0.0550 seconds

    hbase(main):013:0> describe_namespace 'baizhi'
    DESCRIPTION
    {NAME => 'baizhi'}
    1 row(s) in 0.0080 seconds

  • Drop a namespace

    hbase(main):020:0> drop_namespace 'baizhi'
    0 row(s) in 0.0730 seconds

HBase does not allow dropping a namespace that still contains tables; disable and drop the tables in it first, then run drop_namespace.

Table Operations (DDL)

  • Create a table

    hbase(main):023:0> create 't_user','cf1','cf2'
    0 row(s) in 1.2880 seconds

    => Hbase::Table - t_user

    hbase(main):024:0> create 'baizhi:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',TTL=>3600}
    0 row(s) in 1.2610 seconds

    => Hbase::Table - baizhi:t_user

  • Describe a table

    hbase(main):026:0> describe 'baizhi:t_user'
    Table baizhi:t_user is ENABLED
    baizhi:t_user
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
    {NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '3600 SECONDS (1 HOUR)', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
    2 row(s) in 0.0240 seconds

  • Check whether a table exists

    hbase(main):030:0> exists 't_user'
    Table t_user does exist
    0 row(s) in 0.0250 seconds

  • enable/is_enabled/enable_all (and the analogous disable, disable_all, is_disabled)

    hbase(main):036:0> enable 't_user'
    0 row(s) in 0.0220 seconds

    hbase(main):037:0> is_enabled 't_user'
    true
    0 row(s) in 0.0090 seconds

    hbase(main):035:0> enable_all 't_.*'
    t_user
    Enable the above 1 tables (y/n)?
    y
    1 tables successfully enabled

  • Drop a table

    hbase(main):038:0> disable 't_user'
    0 row(s) in 2.2930 seconds

    hbase(main):039:0> drop 't_user'
    0 row(s) in 1.2670 seconds

  • List all user tables (tables under the system namespace hbase are not shown)

    hbase(main):042:0> list 'baizhi:.*'
    TABLE
    baizhi:t_user
    1 row(s) in 0.0050 seconds

    => ["baizhi:t_user"]

    hbase(main):043:0> list
    TABLE
    baizhi:t_user
    1 row(s) in 0.0050 seconds

  • Get a reference to a table

    hbase(main):002:0> t = get_table 'baizhi:t_user'
    0 row(s) in 0.0440 seconds

  • Alter table parameters

    hbase(main):008:0> alter 'baizhi:t_user',{ NAME => 'cf2', TTL => 60 }
    Updating all regions with the new schema...
    1/1 regions updated.
    Done.
    0 row(s) in 2.7000 seconds


Table DML Operations

  • The put command

    hbase(main):010:0> put 'baizhi:t_user',1,'cf1:name','zhangsan'
    0 row(s) in 0.2060 seconds

    hbase(main):011:0> t = get_table 'baizhi:t_user'
    0 row(s) in 0.0010 seconds

    hbase(main):012:0> t.put 1,'cf1:age','18'
    0 row(s) in 0.0500 seconds

  • The get command

    hbase(main):017:0> get 'baizhi:t_user',1
    COLUMN      CELL
     cf1:age    timestamp=1536996547967, value=21
     cf1:name   timestamp=1536996337398, value=zhangsan
    2 row(s) in 0.0680 seconds

    hbase(main):019:0> get 'baizhi:t_user',1,{COLUMN =>'cf1', VERSIONS=>10}
    COLUMN      CELL
     cf1:age    timestamp=1536996547967, value=21
     cf1:age    timestamp=1536996542980, value=20
     cf1:age    timestamp=1536996375890, value=18
     cf1:name   timestamp=1536996337398, value=zhangsan
    4 row(s) in 0.0440 seconds

    hbase(main):020:0> get 'baizhi:t_user',1,{COLUMN =>'cf1:age', VERSIONS=>10}
    COLUMN      CELL
     cf1:age    timestamp=1536996547967, value=21
     cf1:age    timestamp=1536996542980, value=20
     cf1:age    timestamp=1536996375890, value=18
    3 row(s) in 0.0760 seconds

    hbase(main):021:0> get 'baizhi:t_user',1,{COLUMN =>'cf1:age', TIMESTAMP => 1536996542980 }
    COLUMN      CELL
     cf1:age    timestamp=1536996542980, value=20
    1 row(s) in 0.0260 seconds

    hbase(main):025:0> get 'baizhi:t_user',1,{TIMERANGE => [1536996375890,1536996547967]}
    COLUMN      CELL
     cf1:age    timestamp=1536996542980, value=20
    1 row(s) in 0.0480 seconds

    hbase(main):026:0> get 'baizhi:t_user',1,{TIMERANGE => [1536996375890,1536996547967],VERSIONS=>10}
    COLUMN      CELL
     cf1:age    timestamp=1536996542980, value=20
     cf1:age    timestamp=1536996375890, value=18
    2 row(s) in 0.0160 seconds

  • scan

    hbase(main):004:0> scan 'baizhi:t_user'
    ROW    COLUMN+CELL
     1     column=cf1:age, timestamp=1536996547967, value=21
     1     column=cf1:height, timestamp=1536997284682, value=170
     1     column=cf1:name, timestamp=1536996337398, value=zhangsan
     1     column=cf1:salary, timestamp=1536997158586, value=15000
     1     column=cf1:weight, timestamp=1536997311001, value=\x00\x00\x00\x00\x00\x00\x00\x05
     2     column=cf1:age, timestamp=1536997566506, value=18
     2     column=cf1:name, timestamp=1536997556491, value=lisi
    2 row(s) in 0.0470 seconds

    hbase(main):009:0> scan 'baizhi:t_user', {STARTROW => '1',LIMIT=>1}
    ROW    COLUMN+CELL
     1     column=cf1:age, timestamp=1536996547967, value=21
     1     column=cf1:height, timestamp=1536997284682, value=170
     1     column=cf1:name, timestamp=1536996337398, value=zhangsan
     1     column=cf1:salary, timestamp=1536997158586, value=15000
     1     column=cf1:weight, timestamp=1536997311001, value=\x00\x00\x00\x00\x00\x00\x00\x05
    1 row(s) in 0.0280 seconds

    hbase(main):011:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',TIMESTAMP=>1536996542980}
    ROW    COLUMN+CELL
     1     column=cf1:age, timestamp=1536996542980, value=20
    1 row(s) in 0.0330 seconds

  • delete/deleteall

    hbase(main):013:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',VERSIONS=>3}
    ROW    COLUMN+CELL
     1     column=cf1:age, timestamp=1536996547967, value=21
     1     column=cf1:age, timestamp=1536996542980, value=20
     1     column=cf1:age, timestamp=1536996375890, value=18
     2     column=cf1:age, timestamp=1536997566506, value=18
    2 row(s) in 0.0150 seconds

    hbase(main):014:0> delete 'baizhi:t_user',1,'cf1:age',1536996542980
    0 row(s) in 0.0920 seconds

    hbase(main):015:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',VERSIONS=>3}
    ROW    COLUMN+CELL
     1     column=cf1:age, timestamp=1536996547967, value=21
     2     column=cf1:age, timestamp=1536997566506, value=18
    2 row(s) in 0.0140 seconds

    hbase(main):016:0> delete 'baizhi:t_user',1,'cf1:age'
    0 row(s) in 0.0170 seconds

    hbase(main):017:0> scan 'baizhi:t_user', {COLUMNS=>'cf1:age',VERSIONS=>3}
    ROW    COLUMN+CELL
     2     column=cf1:age, timestamp=1536997566506, value=18
    1 row(s) in 0.0170 seconds

    hbase(main):019:0> deleteall 'baizhi:t_user',1
    0 row(s) in 0.0200 seconds

    hbase(main):020:0> get 'baizhi:t_user',1
    COLUMN    CELL
    0 row(s) in 0.0200 seconds

  • truncate

    hbase(main):022:0> truncate 'baizhi:t_user'
    Truncating 'baizhi:t_user' table (it may take a while):
     - Disabling table...
     - Truncating table...
    0 row(s) in 4.0040 seconds

HBase Java API

  • Maven dependencies

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-common</artifactId>
      <version>1.2.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-protocol</artifactId>
      <version>1.2.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-server</artifactId>
      <version>1.2.4</version>
    </dependency>

  • Create the Connection and Admin objects

    private Connection conn;
    private Admin admin;

    @Before
    public void before() throws IOException {
        Configuration config = HBaseConfiguration.create();
        // Both the HMaster and the HRegionServers register themselves in ZooKeeper,
        // so the client only needs the ZooKeeper quorum address
        config.set("hbase.zookeeper.quorum", "centos");
        conn = ConnectionFactory.createConnection(config);
        admin = conn.getAdmin();
    }

    @After
    public void after() throws IOException {
        admin.close();
        conn.close();
    }

  • Create a namespace

    NamespaceDescriptor nd = NamespaceDescriptor.create("zpark")
            .addConfiguration("author", "zhangsan")
            .build();
    admin.createNamespace(nd);

  • Create a table

    // Equivalent shell command: create 'zpark:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',TTL=>10}
    TableName tname = TableName.valueOf("zpark:t_user");

    // Table descriptor
    HTableDescriptor t_user = new HTableDescriptor(tname);

    // Column family descriptors
    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    cf1.setMaxVersions(3);

    HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
    cf2.setTimeToLive(10);

    // Add the column families to the table
    t_user.addFamily(cf1);
    t_user.addFamily(cf2);

    admin.createTable(t_user);

  • Insert data

    TableName tname = TableName.valueOf("zpark:t_user");
    Table t_user = conn.getTable(tname);

    String[] company = {"www.baizhi.com", "www.sina.com"};
    for (int i = 0; i < 1000; i++) {
        String com = company[new Random().nextInt(2)];
        String rowKey = com;
        if (i < 10) {
            rowKey += ":00" + i;
        } else if (i < 100) {
            rowKey += ":0" + i;
        } else if (i < 1000) {
            rowKey += ":" + i;
        }
        Put put = new Put(rowKey.getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), ("user" + i).getBytes());
        put.addColumn("cf1".getBytes(), "age".getBytes(), Bytes.toBytes(i));
        put.addColumn("cf1".getBytes(), "salary".getBytes(), Bytes.toBytes(5000 + 1000 * i));
        put.addColumn("cf1".getBytes(), "company".getBytes(), com.getBytes());

        t_user.put(put);
    }
    t_user.close();
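
Note the zero-padding of the numeric suffix in the row key above: HBase keeps rows sorted lexicographically by row key, so padding to a fixed width (":000" ... ":999") keeps the rows in their intended order and makes the prefix scans used in the examples below behave predictably.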

  • Batch insert

    TableName tname = TableName.valueOf("zpark:t_user");
    String[] company = {"www.baizhi.com", "www.sina.com"};
    BufferedMutator mutator = conn.getBufferedMutator(tname);
    for (int i = 0; i < 1000; i++) {
        String com = company[new Random().nextInt(2)];
        String rowKey = com;
        if (i < 10) {
            rowKey += ":00" + i;
        } else if (i < 100) {
            rowKey += ":0" + i;
        } else if (i < 1000) {
            rowKey += ":" + i;
        }
        Put put = new Put(rowKey.getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), ("user" + i).getBytes());
        put.addColumn("cf1".getBytes(), "age".getBytes(), Bytes.toBytes(i));
        put.addColumn("cf1".getBytes(), "salary".getBytes(), Bytes.toBytes(5000 + 1000 * i));
        put.addColumn("cf1".getBytes(), "company".getBytes(), com.getBytes());
        mutator.mutate(put);
    }
    mutator.close();
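
Unlike Table.put, a BufferedMutator does not send each Put to the server immediately: mutations are buffered on the client and flushed to the RegionServers in batches, which is far more efficient for bulk writes. Calling close() (or an explicit flush()) pushes out whatever is still buffered.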

  • Update data

    TableName tname = TableName.valueOf("zpark:t_user");
    Table t_user = conn.getTable(tname);

    Put put = new Put("www.baizhi.com:000".getBytes());
    put.addColumn("cf1".getBytes(), "name".getBytes(), "zhangsan".getBytes());

    t_user.put(put);
    t_user.close();

  • Query a single record

    TableName tname = TableName.valueOf("zpark:t_user");
    Table t_user = conn.getTable(tname);

    Get get = new Get("www.sina.com:002".getBytes());
    // A Result represents one row and may contain any number of cells
    Result result = t_user.get(get);

    String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
    Integer age = Bytes.toInt(result.getValue("cf1".getBytes(), "age".getBytes()));
    Integer salary = Bytes.toInt(result.getValue("cf1".getBytes(), "salary".getBytes()));
    System.out.println(name + "," + age + "," + salary);

    t_user.close();

  • Query multiple records

    TableName tname = TableName.valueOf("zpark:t_user");
    Table t_user = conn.getTable(tname);

    Scan scan = new Scan();
    // scan.setStartRow("www.baizhi.com:000".getBytes());
    // scan.setStopRow("www.taizhi.com:020".getBytes());
    Filter filter1 = new PrefixFilter("www.baizhi.com:00".getBytes());
    Filter filter2 = new PrefixFilter("www.sina.com:00".getBytes());
    // MUST_PASS_ONE: a row is kept if it passes at least one of the filters (logical OR)
    FilterList filter = new FilterList(FilterList.Operator.MUST_PASS_ONE, filter1, filter2);
    scan.setFilter(filter);

    ResultScanner resultScanner = t_user.getScanner(scan);
    for (Result result : resultScanner) {
        String rowKey = Bytes.toString(result.getRow());
        String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
        Integer age = Bytes.toInt(result.getValue("cf1".getBytes(), "age".getBytes()));
        Integer salary = Bytes.toInt(result.getValue("cf1".getBytes(), "salary".getBytes()));
        System.out.println(rowKey + " => " + name + "," + age + "," + salary);
    }
    t_user.close();

  • Query multiple records + Filter

    TableName tname = TableName.valueOf("zpark:t_user");
    Table t_user = conn.getTable(tname);

    Scan scan = new Scan();
    // scan.setStartRow("www.baizhi.com:000".getBytes());
    // scan.setStopRow("www.taizhi.com:020".getBytes());
    Filter filter1 = new PrefixFilter("www.baizhi.com:00".getBytes());
    Filter filter2 = new PrefixFilter("www.sina.com:00".getBytes());
    // MUST_PASS_ALL: a row is kept only if it passes every filter (logical AND)
    FilterList filter = new FilterList(FilterList.Operator.MUST_PASS_ALL, filter1, filter2);
    scan.setFilter(filter);

    ResultScanner resultScanner = t_user.getScanner(scan);
    for (Result result : resultScanner) {
        String rowKey = Bytes.toString(result.getRow());
        String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
        Integer age = Bytes.toInt(result.getValue("cf1".getBytes(), "age".getBytes()));
        Integer salary = Bytes.toInt(result.getValue("cf1".getBytes(), "salary".getBytes()));
        System.out.println(rowKey + " => " + name + "," + age + "," + salary);
    }
    t_user.close();

Integrating HBase with MapReduce
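
The example below reads the zpark:t_user table populated earlier, groups its records by cf1:company in the mapper, and aggregates the record count together with the total, maximum, minimum and average salary per company. The results are written back to an HBase table zpark:t_user_count through TableOutputFormat, so that table (with a cf1 column family) must be created before the job is submitted.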

public class CustomJobSubmiter extends Configured implements Tool {
    public int run(String[] args) throws Exception {

        // Create the job
        Configuration config = HBaseConfiguration.create(getConf());
        Job job = Job.getInstance(config);
        job.setJarByClass(CustomJobSubmiter.class);     // class that contains the mapper

        // Input and output formats: read from and write back to HBase tables
        job.setInputFormatClass(TableInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);

        TableMapReduceUtil.initTableMapperJob(
                "zpark:t_user",
                new Scan(),
                UserMapper.class,
                Text.class,
                CountWritable.class,   // must match the mapper's output value class
                job
        );
        TableMapReduceUtil.initTableReducerJob(
                "zpark:t_user_count",
                UserReducer.class,
                job
        );
        job.setCombinerClass(UserCombiner.class);
        job.waitForCompletion(true);
        
        
        return 0;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new CustomJobSubmiter(),args);
    }
    public static class UserMapper extends TableMapper<Text, CountWritable>{
        private Text k=new Text();
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
            // Each Result is one row: extract the company and the salary
            String company= Bytes.toString(value.getValue("cf1".getBytes(),"company".getBytes()));
            Double salary= Bytes.toInt(value.getValue("cf1".getBytes(),"salary".getBytes()))*1.0;
            k.set(company);
            // count = 1; total, max and min are all this record's salary
            context.write(k,new CountWritable(1,salary,salary,salary));
        }
    }
    public static class UserCombiner extends Reducer<Text, CountWritable,Text, CountWritable>{
        @Override
        protected void reduce(Text key, Iterable<CountWritable> values, Context context) throws IOException, InterruptedException {
            int total=0;
            double tatalSalary=0.0;
            double avgSalary=0.0;
            double maxSalary=0.0;
            double minSalary=Integer.MAX_VALUE;
            for (CountWritable value : values) {
                tatalSalary+=value.getTatalSalary();
                total+=value.getTotal();
                if(minSalary>value.getMinSalary()){
                    minSalary=value.getMinSalary();
                }
                if(maxSalary<value.getMaxSalary()){
                    maxSalary=value.getMaxSalary();
                }
            }
            
            context.write(key,new CountWritable(total,tatalSalary,maxSalary,minSalary));
        }
    }
    public static class UserReducer extends TableReducer<Text, CountWritable, NullWritable>{
        @Override
        protected void reduce(Text key, Iterable<CountWritable> values, Context context) throws IOException, InterruptedException {
            int total=0;
            double tatalSalary=0.0;
            double avgSalary=0.0;
            double maxSalary=0.0;
            double minSalary=Integer.MAX_VALUE;
            for (CountWritable value : values) {
                tatalSalary+=value.getTatalSalary();
                total+=value.getTotal();
                if(minSalary>value.getMinSalary()){
                    minSalary=value.getMinSalary();
                }
                if(maxSalary<value.getMaxSalary()){
                    maxSalary=value.getMaxSalary();
                }
            }
            avgSalary=tatalSalary/total;

            // Text.getBytes() returns the backing buffer, which may be padded,
            // so copy only the valid bytes for the row key
            Put put=new Put(Bytes.toBytes(key.toString()));
            put.addColumn("cf1".getBytes(),"total".getBytes(),(total+"").getBytes());
            put.addColumn("cf1".getBytes(),"tatalSalary".getBytes(),(tatalSalary+"").getBytes());
            put.addColumn("cf1".getBytes(),"maxSalary".getBytes(),(maxSalary+"").getBytes());
            put.addColumn("cf1".getBytes(),"minSalary".getBytes(),(minSalary+"").getBytes());
            put.addColumn("cf1".getBytes(),"avgSalary".getBytes(),(avgSalary+"").getBytes());

            context.write(NullWritable.get(),put);

        }
    }
}

public class CountWritable implements Writable {
    int total=0;
    double tatalSalary=0.0;
    double maxSalary=0.0;
    double minSalary=Integer.MAX_VALUE;

    public CountWritable(int total, double tatalSalary, double maxSalary, double minSalary) {
        this.total = total;
        this.tatalSalary = tatalSalary;
        this.maxSalary = maxSalary;
        this.minSalary = minSalary;
    }

    public CountWritable() {
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(total);
        out.writeDouble(tatalSalary);
        out.writeDouble(maxSalary);
        out.writeDouble(minSalary);
    }

    public void readFields(DataInput in) throws IOException {
        total=in.readInt();
        tatalSalary=in.readDouble();
        maxSalary=in.readDouble();
        minSalary=in.readDouble();
    }
    //....
}

HBase Architecture

HBase high-level architecture

RegionServer architecture

Region architecture

References: http://www.blogjava.net/DLevin/archive/2015/08/22/426877.html

            http://www.blogjava.net/DLevin/archive/2015/08/22/426950.html

Building an HBase Cluster

  • Make sure HDFS is running properly (HDFS HA)

  • Install and configure HBase

    [root@CentOSX ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
    [root@CentOSX ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://mycluster/hbase</value>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>CentOSA,CentOSB,CentOSC</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>

    [root@centos ~]# vi /usr/hbase-1.2.4/conf/regionservers
    CentOSA
    CentOSB
    CentOSC

    [root@CentOSX ~]# vi .bashrc
    HBASE_MANAGES_ZK=false
    HBASE_HOME=/usr/hbase-1.2.4
    HADOOP_HOME=/usr/hadoop-2.6.0
    JAVA_HOME=/usr/java/latest
    CLASSPATH=.
    PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
    export JAVA_HOME
    export CLASSPATH
    export PATH
    export HADOOP_HOME
    export HBASE_HOME
    export HBASE_MANAGES_ZK

    [root@CentOSX ~]# source .bashrc

  • Start HBase

    [root@CentOSX ~]# hbase-daemon.sh start master
    [root@CentOSX ~]# hbase-daemon.sh start regionserver

Reposted from: https://my.oschina.net/u/3991887/blog/2874826
