Big Data Training Notes 8: HBase

Contents

Installation and Deployment

Starting and Stopping the Cluster

Shell Operations

Table Operations

Namespace Operations

Data Operations

API Programming

Environment Preparation

Code Implementation

Execution Results

Integrating HBase with MapReduce

Environment Configuration

Case 1: Counting Rows in an HBase Table

Case 2: Loading Local Data into an HBase Table

Case 3: Loading Data into an HBase Table with a Custom MapReduce Job

Case 4: Querying Data and Inserting It into a New Table

HBase Optimization

High Availability

Pre-splitting Regions

Time Synchronization


HBase is a distributed, scalable NoSQL database built for storing massive amounts of data. Logically, HBase's data model looks much like a relational database: data is stored in tables with rows and columns. From the perspective of HBase's underlying physical storage structure (key-value), however, HBase is closer to a multi-dimensional map.
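To make that multi-dimensional map concrete: every cell is addressed by (rowkey, column family, column qualifier, timestamp) and holds a value. The following is a minimal Java sketch (a hypothetical helper, not part of the original notes) that prints this underlying K-V view of one row using the standard HBase client classes:

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CellKeyDemo {
    //Print every K-V pair behind one logical row, e.g.
    //("1001", "info:name", ts) -> "Alice"
    public static void dump(Result row) {
        for (Cell cell : row.rawCells()) {
            System.out.printf("(%s, %s:%s, %d) -> %s%n",
                    Bytes.toString(CellUtil.cloneRow(cell)),       //rowkey
                    Bytes.toString(CellUtil.cloneFamily(cell)),    //column family
                    Bytes.toString(CellUtil.cloneQualifier(cell)), //column qualifier
                    cell.getTimestamp(),                           //version timestamp
                    Bytes.toString(CellUtil.cloneValue(cell)));    //value
        }
    }
}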

HBase logical structure

HBase physical storage structure

HBase basic architecture

Installation and Deployment

Before installing and deploying HBase, make sure the Hadoop cluster and the ZooKeeper cluster are running normally.

Upload the downloaded installation package to /opt/software on hadoop101.

//Unpack
[hadoop@hadoop101 software]$ tar -zxvf hbase-2.3.7-bin.tar.gz -C /opt/module/
//Configure hbase-env.sh
[hadoop@hadoop101 software]$ cd /opt/module/hbase-2.3.7
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/hbase-env.sh

Find and modify these two settings; set JAVA_HOME to match the path of the JDK you installed.

export JAVA_HOME=/opt/module/jdk1.8.0_212
export HBASE_MANAGES_ZK=false
//Configure hbase-site.xml
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/hbase-site.xml
<!-- Directory shared by the region servers, used for persisting HBase data; the URL must be exactly correct -->
  <property>
     <name>hbase.rootdir</name>
     <value>hdfs://hadoop101:8020/HBase</value>
  </property>
  <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
  </property>
  <!-- New after 0.98: earlier versions had no .port property and the default port was 60000 -->
  <property>
     <name>hbase.master.port</name>
     <value>16000</value>
  </property>
  <!-- ZooKeeper quorum addresses; do not include a znode path -->
  <property>
     <name>hbase.zookeeper.quorum</name>
     <value>hadoop101,hadoop102,hadoop103</value>
  </property>
  <!-- ZooKeeper data directory -->
  <property>
     <name>hbase.zookeeper.property.dataDir</name>
     <value>/opt/module/zookeeper-3.5.7/zkData</value>
  </property>
  <!-- Local temporary file directory -->
  <property>
     <name>hbase.tmp.dir</name>
     <value>./tmp</value>
  </property>
  <property>
     <name>hbase.unsafe.stream.capability.enforce</name>
     <value>false</value>
  </property>
  <!-- HMaster-related settings -->
  <property>
     <name>hbase.master.info.bindAddress</name>
     <value>hadoop101</value>
  </property>
//Configure regionservers
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/regionservers
hadoop101
hadoop102
hadoop103
//Symlink the Hadoop configuration files into HBase
[hadoop@hadoop101 hbase-2.3.7]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml /opt/module/hbase-2.3.7/conf/core-site.xml   
[hadoop@hadoop101 hbase-2.3.7]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml /opt/module/hbase-2.3.7/conf/hdfs-site.xml
//Distribute to the other nodes
[hadoop@hadoop101 hbase-2.3.7]$ cd /opt/module/
[hadoop@hadoop101 module]$ xsync hbase-2.3.7/

Starting and Stopping the Cluster

Run the following commands on hadoop101 only:

//Start
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh
//Stop
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh

Once the cluster has started normally, you can visit http://hadoop101:16010 in a browser.

If the browser refuses the connection:

  • Check that the earlier configuration is correct; make sure the HDFS address you set in hbase-site.xml matches the one in hadoop-3.1.3/etc/hadoop/core-site.xml (I use port 8020 in both).
  • Stop HBase, ZooKeeper, and Hadoop, then restart them; make sure Hadoop and ZooKeeper are both fully up before starting HBase.
  • If that still does not solve it, check the HBase logs for the specific error and fix it accordingly.
[hadoop@hadoop101 hbase-2.3.7]$ cd logs
[hadoop@hadoop101 logs]$ ll
//View the most recent log
[hadoop@hadoop101 logs]$ cat hbase-hadoop-master-hadoop101.log

//My error was the following, indicating a ZooKeeper connection problem, so restarting ZooKeeper fixed it
zookeeper.ClientCnxn: Opening socket connection to server hadoop101/192.168.120.101:2181. Will not attempt to authenticate using SASL (unknown error)

Shell Operations

//Enter the HBase shell
[hadoop@hadoop101 hbase-2.3.7]$ bin/hbase shell

//Show the help
hbase(main):001:0> help

Table Operations

//List the tables in the current database
hbase(main):002:0> list
//Create a table
hbase(main):003:0> create "student", "sinfo"
//View the table's details
hbase(main):004:0> describe "student"

//Change the sinfo column family to keep 3 versions
hbase(main):005:0> alter "student",{NAME => 'sinfo', VERSIONS => '3'}
//Verify
hbase(main):006:0> describe "student"

//Drop a table
hbase(main):007:0> disable "student"
hbase(main):008:0> drop "student"
//Dropping directly without disabling fails with: Table student is enabled. Disable it first.
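The VERSIONS change above can also be made through the Java Admin API. A minimal sketch, assuming an Admin instance obtained the same way as in the API Programming section later in these notes (the table and family names are the ones used above):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterVersionsDemo {
    //Equivalent of: alter "student",{NAME => 'sinfo', VERSIONS => '3'}
    public static void setMaxVersions(Admin admin) throws IOException {
        //Build a descriptor for the sinfo family that keeps 3 versions
        ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder
                .newBuilder(Bytes.toBytes("sinfo"))
                .setMaxVersions(3)
                .build();
        //Apply the change to the student table
        admin.modifyColumnFamily(TableName.valueOf("student"), cfd);
    }
}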

Namespace Operations

//List namespaces
hbase(main):009:0> list_namespace
//Create a namespace
hbase(main):010:0> create_namespace "bigdata"
//Create a table under the namespace
hbase(main):011:0> create "bigdata:student", "info"

In the web UI you can now see the student table we just created.

//Delete a namespace

//First disable the table
hbase(main):012:0> disable "bigdata:student"
//Then drop the table
hbase(main):013:0> drop "bigdata:student"
//Finally drop the namespace
hbase(main):014:0> drop_namespace "bigdata"

Data Operations

hbase(main):003:0> put 'bigdata:student','1001','info:name','Alice'                                                                                                                                       
hbase(main):004:0> put 'bigdata:student','1001','info:sex','F'                                                                                                                                  
hbase(main):005:0> put 'bigdata:student','1001','info:age','23'                                                                                                                                    
hbase(main):006:0> put 'bigdata:student','1002','info:name','Bob'                                                                                                                                      
hbase(main):007:0> put 'bigdata:student','1002','info:sex','M'                                                                                                                                     
hbase(main):008:0> put 'bigdata:student','1002','info:age','22'                                                                                                                                     
hbase(main):009:0> put 'bigdata:student','1003','info:name','Caroline'                                                                                                                                     
hbase(main):010:0> put 'bigdata:student','1003','info:sex','F'                                                                                                                                 
hbase(main):011:0> put 'bigdata:student','1003','info:age','24'
//Scan the table
hbase(main):012:0> scan 'bigdata:student'
//Count the rows in the table
hbase(main):013:0> count 'bigdata:student'
//Get specific data
hbase(main):014:0> get 'bigdata:student','1001'
hbase(main):015:0> get 'bigdata:student','1001','info'
hbase(main):016:0> get 'bigdata:student','1001','info:name'
//Delete data from the table
hbase(main):017:0> delete 'bigdata:student','1001','info:name'
//Check that it was deleted
hbase(main):018:0> scan 'bigdata:student'

//Truncate the table
hbase(main):019:0> truncate 'bigdata:student'
//Check that it was truncated
hbase(main):020:0> scan 'bigdata:student'

API Programming

All of the operations above can also be performed programmatically through the HBase API.

Environment Preparation

Create an hbase_demo project and add the dependencies to its pom.xml.

<!-- Version numbers defined as custom properties -->
    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <hbase-version>2.3.7</hbase-version>
        <hadoop-version>3.1.3</hadoop-version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>${hbase-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>${hbase-version}</version>
        </dependency>
    </dependencies>

Create a log4j.properties file for logging.

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

Code Implementation

public class HbaseAPI {
    //Logger
    private static Logger logger=Logger.getLogger(HbaseAPI.class);
    private static Connection connection;
    private static Admin admin;
    //Static block: create the Connection and the Admin once for the whole class
    static{
        try {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum","hadoop101,hadoop102,hadoop103");
            //Get the connection object
            connection = ConnectionFactory.createConnection(conf);
            //Get the administrator object
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
        //Note: admin must not be closed here, because the methods below still use it;
        //resources are released in close(), called at the end of main().
    }

    //Release the Admin and the Connection
    public static void close(){
        try {
            if(admin!=null){
                admin.close();
            }
            if(connection!=null){
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        createNamespace("bigdata");
        listNamespace();
        System.out.println("bigdata:student exists: "+ isTableExist("bigdata:student"));
        createTable("bigdata:student", "sinfo");
        putData("bigdata:student","1001","sinfo","name","Alice");
        putData("bigdata:student","1001","sinfo","sex","F");
        putData("bigdata:student","1001","sinfo","age","23");
        putData("bigdata:student","1002","sinfo","name","Bob");
        putData("bigdata:student","1002","sinfo","sex","M");
        putData("bigdata:student","1002","sinfo","age","22");
        putData("bigdata:student","1003","sinfo","name","Caroline");
        putData("bigdata:student","1003","sinfo","sex","F");
        putData("bigdata:student","1003","sinfo","age","24");
        getData("bigdata:student","1001",null,null);
        getDataScan("bigdata:student","1003","sinfo",null);
        getCount("bigdata:student");
        deleteData("bigdata:student","1002","sinfo","sex");
        getData("bigdata:student","1002",null,null);
        deleteNamespace("bigdata","student");
        //Release the connection resources
        close();
    }

    //Check whether a table exists
    public static boolean isTableExist(String tableName) throws IOException {
        return admin.tableExists(TableName.valueOf(tableName));
    }

    //Create a table
    public static void createTable(String tableName,String... cfs) throws IOException {
        //Make sure at least one column family was given
        if(cfs.length<=0){
            logger.info("no column family specified");
            return ;
        }
        //Check whether the table already exists
        if(isTableExist(tableName)){
            logger.info("table already exists");
            return;
        }
        //Create the table descriptor
        HTableDescriptor hTableDescriptor=new HTableDescriptor(TableName.valueOf(tableName));
        for (String cf : cfs) {
            //Create the column family descriptor
            HColumnDescriptor hColumnDescriptor=new HColumnDescriptor(cf);
            //Add the column family descriptor to the table descriptor
            hTableDescriptor.addFamily(hColumnDescriptor);
        }
        //Create the table
        admin.createTable(hTableDescriptor);
    }

    //Delete a table
    public static void deleteTable(String tableName) throws IOException {
        //Check whether the table exists
        if(!isTableExist(tableName)){
            logger.info("table does not exist");
            return ;
        }
        //Disable the table
        admin.disableTable(TableName.valueOf(tableName));
        //Delete the table
        admin.deleteTable(TableName.valueOf(tableName));
    }

    //Create a namespace
    public static void createNamespace(String namespace){
        try {
            //Create the namespace descriptor
            NamespaceDescriptor descriptor=NamespaceDescriptor.create(namespace).build();
            //Create the namespace
            admin.createNamespace(descriptor);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    //List namespaces
    public static void listNamespace() throws IOException {
        String[] namespaces = admin.listNamespaces();
        logger.info(Arrays.asList(namespaces));
    }

    //Delete a namespace
    public static void deleteNamespace(String namespace,String tableName) throws IOException {
        if(isTableExist(namespace+":"+tableName)){
            //Delete the table first
            deleteTable(namespace+":"+tableName);
        }
        //Then delete the namespace
        admin.deleteNamespace(namespace);
    }

    //Insert data into a table
    public static void putData(String tableName,String rowkey,String cfs,String cn,String value) throws IOException {
        //Get the table
        Table table = connection.getTable(TableName.valueOf(tableName));
        //Create the Put object
        Put put=new Put(Bytes.toBytes(rowkey));
        //Add the data
        put.addColumn(Bytes.toBytes(cfs),Bytes.toBytes(cn),Bytes.toBytes(value));
        //Insert the data
        table.put(put);
        //Release resources
        table.close();
    }

    //Get data from a table
    public static void getData(String tableName,String rowkey,String cfs,String cn) throws IOException {
        //Get the table
        Table table = connection.getTable(TableName.valueOf(tableName));
        //Create the Get object
        Get get=new Get(Bytes.toBytes(rowkey));
        //Fetch the data
        Result result = table.get(get);
        for (Cell cell : result.rawCells()) {
            //Use CellUtil to extract each component (getRowArray() etc. would return the whole backing array)
            logger.info(Bytes.toString(CellUtil.cloneRow(cell))+"\t"+
                    Bytes.toString(CellUtil.cloneFamily(cell))+"\t"+
                    Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"+
                    Bytes.toString(CellUtil.cloneValue(cell)));
        }
        //Release resources
        table.close();
    }

    //Scan data in a table
    public static void getDataScan(String tableName,String rowkey,String cfs,String cn) throws IOException {
        //Get the table
        Table table = connection.getTable(TableName.valueOf(tableName));
        //Create the Scan object, starting from the given rowkey
        Scan scan=new Scan().withStartRow(Bytes.toBytes(rowkey));
        scan.addFamily(Bytes.toBytes(cfs));
        //Scan the table
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            for (Cell cell : result.rawCells()) {
                logger.info(Bytes.toString(CellUtil.cloneRow(cell))+"\t"+
                        Bytes.toString(CellUtil.cloneFamily(cell))+"\t"+
                        Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"+
                        Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
        //Release resources
        table.close();
    }

    //Count the rows in a table
    public static void getCount(String tableName) throws IOException {
        //Get the table
        Table table = connection.getTable(TableName.valueOf(tableName));
        //Create the Scan object; FirstKeyOnlyFilter returns only the first cell of each row
        Scan scan=new Scan();
        scan.setFilter(new FirstKeyOnlyFilter());
        //Scan the table
        ResultScanner scanner = table.getScanner(scan);
        //With one cell per row, summing the result sizes gives the row count
        long count=0;
        for (Result result : scanner) {
            count+=result.size();
        }
        logger.info(tableName+" contains "+count+" rows");
        //Release resources
        table.close();
    }

    //Delete data from a table
    public static void deleteData(String tableName,String rowkey,String cfs,String cn) throws IOException {
        //Get the table
        Table table = connection.getTable(TableName.valueOf(tableName));
        //Create the Delete object
        Delete delete=new Delete(Bytes.toBytes(rowkey));
        delete.addColumn(Bytes.toBytes(cfs),Bytes.toBytes(cn));
        //Delete the data
        table.delete(delete);
        //Release resources
        table.close();
    }

}

Execution Results

Integrating HBase with MapReduce

Environment Configuration

//Configure the environment variables
[hadoop@hadoop101 hbase-2.3.7]$ sudo vim /etc/profile.d/my_env.sh
##HBASE_HOME
export HBASE_HOME=/opt/module/hbase-2.3.7
export PATH=$PATH:$HBASE_HOME/bin
//Reload
[hadoop@hadoop101 hbase-2.3.7]$ source /etc/profile
//Distribute
[hadoop@hadoop101 hbase-2.3.7]$ sudo /home/hadoop/bin/xsync /etc/profile.d/my_env.sh
//Configure hadoop-env.sh to add the HBase jars to the classpath
[hadoop@hadoop101 hbase-2.3.7]$ cd /opt/module/hadoop-3.1.3
[hadoop@hadoop101 hadoop-3.1.3]$ vim etc/hadoop/hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase-2.3.7/lib/*
//Distribute
[hadoop@hadoop101 hadoop-3.1.3]$ xsync etc/hadoop/hadoop-env.sh
//Restart the HBase and Hadoop clusters
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
[hadoop@hadoop101 hbase-2.3.7]$ myhadoop.sh stop
[hadoop@hadoop101 hbase-2.3.7]$ myhadoop.sh start
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh

Case 1: Counting Rows in an HBase Table

//Enter the HBase shell
[hadoop@hadoop101 hbase-2.3.7]$ bin/hbase shell

//Create a namespace
hbase(main):001:0> create_namespace "bigdata"
//Create a table
hbase(main):002:0> create "bigdata:student", "sinfo"
//Insert data
hbase(main):003:0> put 'bigdata:student','1001','sinfo:name','Alice'
hbase(main):004:0> put 'bigdata:student','1001','sinfo:sex','F'
hbase(main):005:0> put 'bigdata:student','1001','sinfo:age','23'
hbase(main):006:0> put 'bigdata:student','1002','sinfo:name','Bob'
hbase(main):007:0> put 'bigdata:student','1002','sinfo:sex','M'
hbase(main):008:0> put 'bigdata:student','1002','sinfo:age','22'
hbase(main):009:0> put 'bigdata:student','1003','sinfo:name','Caroline'
hbase(main):010:0> put 'bigdata:student','1003','sinfo:sex','F'
hbase(main):011:0> put 'bigdata:student','1003','sinfo:age','24'
//View the data
hbase(main):012:0> scan "bigdata:student"
//Quit
hbase(main):013:0> quit
//Count the rows in the student table
[hadoop@hadoop101 hbase-2.3.7]$ /opt/module/hadoop-3.1.3/bin/yarn jar lib/hbase-mapreduce-2.3.7.jar rowcounter bigdata:student

Case 2: Loading Local Data into an HBase Table

//Create the local data file
[hadoop@hadoop101 hbase-2.3.7]$ vim fruit.tsv

Note that the fields in the file are separated by \t (tab).

1001    Apple   Red
1002    Pear    Yellow
1003    Pineapple       Yellow
//Upload fruit.tsv to the /fruit directory in HDFS
[hadoop@hadoop101 hbase-2.3.7]$ hdfs dfs -mkdir -p /fruit
[hadoop@hadoop101 hbase-2.3.7]$ hdfs dfs -put fruit.tsv /fruit

You can check it at http://hadoop101:9870.

//Create the fruit table in HBase
hbase(main):001:0> create "bigdata:fruit", "info"
//Load the local data into the HBase table
[hadoop@hadoop101 hbase-2.3.7]$ /opt/module/hadoop-3.1.3/bin/yarn jar lib/hbase-mapreduce-2.3.7.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color bigdata:fruit hdfs://hadoop101:8020/fruit/fruit.tsv

Checking in HBase again, the data has been loaded successfully.

Case 3: Loading Data into an HBase Table with a Custom MapReduce Job

Requirement: use MapReduce to migrate part of the fruit data into an HBase table.

Building on the API Programming project, add the following dependency to pom.xml:

<!-- HBase and MapReduce integration dependency -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-mapreduce</artifactId>
            <version>${hbase-version}</version>
        </dependency>

Two implementations are presented here.

//First implementation
public class HdfsToHbseMr1 {
    //Mapper stage
    public static class FruitToMrMapper extends Mapper<LongWritable, Text,LongWritable,Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            //Pass each line straight through
            context.write(key,value);
        }
    }
    //Reducer stage
    public static class FruitToMrReducer extends TableReducer<LongWritable,Text, NullWritable>{
        @Override
        protected void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            for (Text value : values) {
                String[] split = value.toString().split("\t");
                //Create the Put object with the rowkey
                Put put=new Put(Bytes.toBytes(split[0]));
                //Populate the Put
                put.addColumn(Bytes.toBytes("info"),Bytes.toBytes("name"),Bytes.toBytes(split[1]));
                put.addColumn(Bytes.toBytes("info"),Bytes.toBytes("color"),Bytes.toBytes(split[2]));
                //Write it out
                context.write(NullWritable.get(),put);
            }
        }
    }
    //Driver stage
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //Create the configuration object
        Configuration conf=new Configuration();
        //Get the job object
        Job job = Job.getInstance(conf);
        //Set the jar location
        job.setJarByClass(HdfsToHbseMr1.class);
        //Set the Mapper
        job.setMapperClass(FruitToMrMapper.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        //Set the Reducer (writes to the HBase table given as args[1])
        TableMapReduceUtil.initTableReducerJob(
                args[1],
                FruitToMrReducer.class,
                job
        );
        //Set the input path
        FileInputFormat.setInputPaths(job,new Path(args[0]));
        //Submit
        boolean result = job.waitForCompletion(true);
        System.exit(result?0:1);
    }
}
//Second implementation
public class HdfsToHbseMr2 {
    //Mapper stage
    public static class FruitMr2HbaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            //Get one line of HDFS data, e.g.: 1001 Apple  Red
            String line=new String(value.getBytes(),0,value.getLength(),"UTF-8");
            //Split it
            String[] split = line.split("\t");
            //Create the ImmutableBytesWritable key from the rowkey
            ImmutableBytesWritable k=new ImmutableBytesWritable(Bytes.toBytes(split[0]));
            //Create the Put object
            Put v=new Put(Bytes.toBytes(split[0]));
            //Populate the Put
            v.addColumn(Bytes.toBytes("info"),Bytes.toBytes("name"),Bytes.toBytes(split[1]));
            v.addColumn(Bytes.toBytes("info"),Bytes.toBytes("color"),Bytes.toBytes(split[2]));
            //Write it out
            context.write(k,v);
        }
    }
    //Reducer stage
    public static class FruitMr2HbaseReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable>{
        @Override
        protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
            for (Put put : values) {
                context.write(NullWritable.get(),put);
            }
        }
    }
    //Driver stage
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //Create the configuration object
        Configuration conf=new Configuration();
        //Get the job object
        Job job = Job.getInstance(conf);
        //Set the jar location
        job.setJarByClass(HdfsToHbseMr2.class);
        //Set the Mapper
        job.setMapperClass(FruitMr2HbaseMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        //Set the Reducer (writes to the HBase table given as args[1])
        TableMapReduceUtil.initTableReducerJob(
                args[1],
                FruitMr2HbaseReducer.class,
                job
        );
        //Set the input path
        FileInputFormat.setInputPaths(job,new Path(args[0]));
        //Submit
        boolean result = job.waitForCompletion(true);
        System.exit(result?0:1);
    }
}

Package the project and upload the jar to the hbase-2.3.7 directory. Add the packaging plugin below to pom.xml, and be sure to change mainClass to the actual location of your class!

 <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>com.hbase.HdfsToHbseMr2</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

Run:

[hadoop@hadoop101 hbase-2.3.7]$ yarn jar hdfstohbase1.jar com.hbase.HdfsToHbseMr1 /fruit/fruit.tsv bigdata:fruit1

Case 4: Querying Data and Inserting It into a New Table

Requirement: query the name column from bigdata:fruit and store it in a new HBase table, bigdata:fruit3.

public class HbaseMrHbase {
    //Mapper stage
    public static class FruitHbaseMrMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
            //Create the Put object for this rowkey
            Put v=new Put(key.get());
            for (Cell cell : value.rawCells()) {
                //Check whether the current cell's qualifier is name
                if("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
                    //Add the name cell
                    v.add(cell);
                }
            }
            //Only write rows that actually contain a name cell (an empty Put cannot be written)
            if(!v.isEmpty()){
                context.write(key,v);
            }
        }
    }
    //Reducer stage
    public static class FruitHbaseMrReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable>{
        @Override
        protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
            for (Put put : values) {
                context.write(NullWritable.get(),put);
            }
        }
    }
    //Driver stage
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //Create the configuration object
        Configuration conf=new Configuration();
        //Get the job object
        Job job = Job.getInstance(conf);
        //Set the jar location
        job.setJarByClass(HbaseMrHbase.class);
        //Set the Mapper (reads the HBase table given as args[0])
        TableMapReduceUtil.initTableMapperJob(
                args[0],
                new Scan(),
                FruitHbaseMrMapper.class,
                ImmutableBytesWritable.class,
                Put.class,
                job
        );
        //Set the Reducer (writes to the HBase table given as args[1])
        TableMapReduceUtil.initTableReducerJob(
                args[1],
                FruitHbaseMrReducer.class,
                job
        );
        //Submit
        boolean result = job.waitForCompletion(true);
        System.exit(result?0:1);
    }
}

HBase Optimization

High Availability

In HBase, the HMaster monitors the lifecycle of the HRegionServers and balances the load across the RegionServers. If the HMaster goes down, the whole HBase cluster falls into an unhealthy state, and that working state will not last long. HBase therefore supports a high-availability configuration for the HMaster.

//Stop HBase
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
//Create a backup-masters file in the conf directory; the file name backup-masters must not be changed
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/backup-masters
hadoop102
hadoop103
//Distribute
[hadoop@hadoop101 hbase-2.3.7]$ xsync conf/backup-masters
//Restart
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh

Check the processes; the backup masters have started successfully.
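Besides checking the processes, you can also confirm from the client API which master is active and which are standing by. A minimal sketch, assuming the same connection setup as in the API Programming section:

import java.io.IOException;
import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;

public class MasterStatusDemo {
    //Print the active HMaster and the configured backup masters
    public static void printMasters(Admin admin) throws IOException {
        ClusterMetrics metrics = admin.getClusterMetrics();
        //The currently active master (e.g. hadoop101)
        System.out.println("active master : " + metrics.getMasterName());
        //The standby masters listed in backup-masters (hadoop102, hadoop103)
        for (ServerName backup : metrics.getBackupMasterNames()) {
            System.out.println("backup master : " + backup);
        }
    }
}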

Pre-splitting Regions

Each region maintains a StartRow and an EndRow; if incoming data falls within the rowkey range a region maintains, the data is handed to that region. Following this principle, we can roughly plan in advance which regions data will be written to, which improves HBase performance.

//Manually specify the split points
hbase(main):001:0> create "bigdata:jk","sinfo","partition1",SPLITS=>['1000','2000','3000','4000']
//Pre-split using a generated hexadecimal sequence
hbase(main):002:0> create "bigdata:jk2","sinfo",'partition2',{NUMREGIONS=>15,SPLITALGO=>'HexStringSplit'}
//Pre-split using a splits file
[hadoop@hadoop101 hbase-2.3.7]$ vim splits.txt
aa
bb
cc
dd
hbase(main):001:0> create "bigdata:jk3","sinfo",'partition4',SPLITS_FILE => 'splits.txt'

You can check the resulting regions in the web UI.
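Pre-splitting can also be done from the Java API by passing split keys to createTable. A minimal sketch, assuming the same Admin setup as in the API Programming section (the table name bigdata:jk4 is just an example):

import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitDemo {
    //Create a table pre-split at the same points as the manual shell example above
    public static void createPreSplitTable(Admin admin) throws IOException {
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("bigdata:jk4"));
        desc.addFamily(new HColumnDescriptor("sinfo"));
        //Four split points produce five regions:
        //(-inf,1000), [1000,2000), [2000,3000), [3000,4000), [4000,+inf)
        byte[][] splitKeys = {
                Bytes.toBytes("1000"),
                Bytes.toBytes("2000"),
                Bytes.toBytes("3000"),
                Bytes.toBytes("4000")
        };
        admin.createTable(desc, splitKeys);
    }
}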

Time Synchronization

HBase is sensitive to clock skew between nodes, so run the following command on each of the three machines to synchronize their clocks:

[hadoop@hadoop101 hbase-2.3.7]$ sudo ntpdate time.nist.gov
