HBase


做一个勤劳的码农


Category: Big Data | Tags: Big Data, HBase

Article link: https://blog.csdn.net/baidu_41766416/article/details/86097585


HBase Table Structure and Architecture

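HBase is a distributed, column-oriented, open-source database. It is based on the Google paper by Fay Chang et al., "Bigtable: A Distributed Storage System for Structured Data". Bigtable builds on the distributed storage provided by the Google File System (GFS), and in the same way HBase provides Bigtable-like capabilities on top of Hadoop. HBase began as a subproject of Apache Hadoop and has been an Apache top-level project since 2010. Unlike a typical relational database, HBase is suited to storing unstructured and semi-structured data. It also differs in organizing data by column (column family) rather than by row.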

Master/slave architecture:

Master: HMaster

Slave: RegionServer

Setting Up the HBase Environment

Extract: tar -zxvf hbase-1.3.1-bin.tar.gz -C ~/training/
Set environment variables (append to /etc/profile):
            HBASE_HOME=/root/training/hbase-1.3.1
            export HBASE_HOME

            PATH=$HBASE_HOME/bin:$PATH
            export PATH
Apply them: source /etc/profile
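I. Local (standalone) mode: no HDFS is needed, and data is stored in the local Linux file system
    Edit hbase-env.sh:
            export JAVA_HOME=/....
    Core configuration file: conf/hbase-site.xml (create the data directory beforehand)
            <property>
              <name>hbase.rootdir</name>
              <value>file:///root/training/hbase-1.3.1/data</value>
            </property>
    Start HBase: start-hbase.sh
            jps shows only an HMaster process. In standalone mode the master, the RegionServer and ZooKeeper all run inside that single JVM.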
II. Pseudo-distributed mode (one machine)
    Edit hbase-env.sh:
            export JAVA_HOME=/....
            export HBASE_MANAGES_ZK=true   ---> use the ZooKeeper bundled with HBase
    Configuration file: conf/hbase-site.xml
            <!-- HDFS directory where HBase stores its data -->
            <property>
              <name>hbase.rootdir</name>
              <value>hdfs://192.168.157.111:9000/hbase</value>
            </property>
            <!-- whether this is a distributed deployment -->
            <property>
              <name>hbase.cluster.distributed</name>
              <value>true</value>
            </property>
            <!-- ZooKeeper address -->
            <property>
              <name>hbase.zookeeper.quorum</name>
              <value>192.168.157.111</value>
            </property>
            <!-- replication factor -->
            <property>
              <name>dfs.replication</name>
              <value>1</value>
            </property>
    File conf/regionservers: list the addresses of the slave (RegionServer) nodes
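III. Fully distributed mode (at least 3 machines)
    Start from the pseudo-distributed configuration, add or change the following, then copy the installation to the other nodes:
            <!-- replication factor -->
            <property>
              <name>dfs.replication</name>
              <value>2</value>
            </property>
            <!-- maximum allowed clock difference between master and slave nodes, in milliseconds -->
            <property>
              <name>hbase.master.maxclockskew</name>
              <value>180000</value>
            </property>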
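The `hbase.master.maxclockskew` value above is a tolerance in milliseconds: 180000 ms is 3 minutes. As I understand it, the HMaster compares a RegionServer's reported clock with its own when that server checks in. If the difference exceeds this limit, the HMaster refuses the server, which is why cluster clocks should be kept in sync (for example with NTP).

The sketch below models only that comparison. `ClockSkew.withinLimit` is a hypothetical helper, not an HBase method.

```java
// Hypothetical model of the comparison governed by hbase.master.maxclockskew (not HBase code).
class ClockSkew {
    // true if the two clocks differ by no more than the allowed skew; all values in milliseconds
    static boolean withinLimit(long masterMillis, long regionServerMillis, long maxSkewMillis) {
        return Math.abs(masterMillis - regionServerMillis) <= maxSkewMillis;
    }
}
```

For example, `withinLimit(0L, 180001L, 180000L)` is false: a RegionServer whose clock is just over 3 minutes off is outside the configured limit.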

Data HBase Stores in ZooKeeper, and HA

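HA: manually start a second HMaster on another machine. It waits as a backup master and takes over if the active HMaster fails.
hbase-daemon.sh start master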

Starting the daemons without the environment variables configured (run from the HBase installation directory)

bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver

Start the shell

bin/hbase shell

Web UI (HMaster status page)

http://192.168.50.183:16010/master-status

Working with HBase

Command line

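1) Create a table           create 'table_name','column_family'
2) Full table scan          scan 'table_name'
                            rowkey: row key, unique (no duplicates)
                            timestamp: the version timestamp of a cell
                            cell: the unit in which a value is stored
                            column family: a group of columns; each family contains multiple columns
                            column: a column (qualifier) inside a family
3) Insert data              put 'table_name','rowkey','family:column','value'
4) Overwrite data           HBase has no in-place update. A put with the same rowkey, column family and column writes a newer version, and reads return the newest version by default.
5) Range scan               scan 'user',{STARTROW => '101',STOPROW => '101'}
                            STARTROW is inclusive and STOPROW is exclusive. For example, STARTROW => '101', STOPROW => '103' returns rows 101 and 102.
6) Show table schema        describe 'table_name'
7) Alter a table            alter 'table_name',{NAME => 'info',VERSIONS => 3}   (keep up to 3 versions per cell in family info)
8) Delete data              a whole row by rowkey:    deleteall 'table_name','rowkey'
                            a single column:          delete 'table_name','rowkey','family:column'
9) Truncate a table         truncate 'table_name'   (disables, drops and recreates the table with the same schema)
10) Drop a table            first disable 'table_name', then drop 'table_name'
11) Count rows              count 'table_name'
12) Get a row by rowkey     get 'table_name','rowkey'
13) Get one column          get 'table_name','rowkey','family:column'
14) List all tables         list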
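The put, overwrite, get and scan behavior in the list above can be illustrated with a toy in-memory model in plain Java. This is not the HBase client API: `ToyTable`, its methods, and the logical counter used in place of real timestamps are all hypothetical.

The model mirrors a few HBase behaviors:
- rows are kept sorted by rowkey;
- every put is stored as a new timestamped version;
- a get returns the newest version;
- at most `VERSIONS` versions are retained;
- a scan's STARTROW is inclusive and its STOPROW exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory model of HBase's logical layout (hypothetical, NOT the HBase client API):
// rowkey -> "family:qualifier" -> timestamp -> value, with every level kept sorted.
class ToyTable {
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();
    private final int maxVersions; // plays the role of the column family's VERSIONS setting
    private long clock = 0;        // assumption: a logical counter instead of real timestamps

    ToyTable(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // put never updates in place: it adds a newer version of the cell
    void put(String rowkey, String column, String value) {
        TreeMap<Long, String> versions = rows
                .computeIfAbsent(rowkey, k -> new TreeMap<>())
                .computeIfAbsent(column, k -> new TreeMap<>());
        versions.put(++clock, value);
        while (versions.size() > maxVersions) {
            versions.pollFirstEntry(); // drop the oldest version
        }
    }

    // get returns the newest version of a cell, or null if the row or column is absent
    String get(String rowkey, String column) {
        TreeMap<String, TreeMap<Long, String>> row = rows.get(rowkey);
        if (row == null || !row.containsKey(column)) {
            return null;
        }
        return row.get(column).lastEntry().getValue();
    }

    // scan returns rowkeys in sorted order: STARTROW inclusive, STOPROW exclusive
    List<String> scanRowkeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // number of versions currently kept for one cell
    int versionCount(String rowkey, String column) {
        return rows.get(rowkey).get(column).size();
    }
}
```

In real HBase, versions beyond the `VERSIONS` limit are no longer returned by reads and are physically removed later, during flush and compaction. The toy drops them immediately.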

Java API

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseApiDemo {

	// The client only needs the ZooKeeper address; ZooKeeper tells it where the HMaster and regions are.
	// ZooKeeper records the HMaster's hostname, not its IP, so map that hostname in the local hosts file.
	private static Connection getConnection() throws IOException {
		// HBaseConfiguration.create() also loads hbase-default.xml and hbase-site.xml
		Configuration configuration = HBaseConfiguration.create();
		configuration.set("hbase.zookeeper.quorum", "192.168.81.111");
		return ConnectionFactory.createConnection(configuration);
	}

	private static void createTable() throws IOException {
		try (Connection connection = getConnection(); Admin admin = connection.getAdmin()) {
			// table "student2" with two column families: info and grade
			HTableDescriptor htable = new HTableDescriptor(TableName.valueOf("student2"));
			htable.addFamily(new HColumnDescriptor("info"));
			htable.addFamily(new HColumnDescriptor("grade"));
			admin.createTable(htable);
		}
	}

	private static void deleteTable() throws IOException {
		try (Connection connection = getConnection(); Admin admin = connection.getAdmin()) {
			TableName name = TableName.valueOf("student2");
			// a table must be disabled before it can be deleted
			admin.disableTable(name);
			admin.deleteTable(name);
		}
	}

	private static void insertRow() throws IOException {
		try (Connection connection = getConnection();
				Table table = connection.getTable(TableName.valueOf("student"))) {
			Put put = new Put(Bytes.toBytes("stu004"));
			put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
			put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("24"));
			put.addColumn(Bytes.toBytes("grade"), Bytes.toBytes("chinese"), Bytes.toBytes("80"));
			put.addColumn(Bytes.toBytes("grade"), Bytes.toBytes("math"), Bytes.toBytes("90"));
			table.put(put);
		}
	}

	private static void insertRows() throws IOException {
		try (Connection connection = getConnection();
				Table table = connection.getTable(TableName.valueOf("student"))) {
			// batch insert of rowkeys stu0050, stu0051, stu0052
			List<Put> puts = new ArrayList<>();
			for (int i = 0; i < 3; i++) {
				Put put = new Put(Bytes.toBytes("stu005" + i));
				put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("lisi" + i));
				puts.add(put);
			}
			table.put(puts);
		}
	}

	private static void getValue() throws IOException {
		try (Connection connection = getConnection();
				Table table = connection.getTable(TableName.valueOf("student"))) {
			Result result = table.get(new Get(Bytes.toBytes("stu001")));
			// getValue returns null when the cell does not exist
			String name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
			String age = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age")));
			System.out.println(name + "----------" + age);
		}
	}

	private static void getScan() throws IOException {
		// the scanner is closed automatically together with the table and connection
		try (Connection connection = getConnection();
				Table table = connection.getTable(TableName.valueOf("student"));
				ResultScanner resultScanner = table.getScanner(new Scan())) {
			for (Result result : resultScanner) {
				String name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
				String math = Bytes.toString(result.getValue(Bytes.toBytes("grade"), Bytes.toBytes("math")));
				System.out.println(name + "----" + math);
			}
		}
	}
}

Data Storage and Lookup

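Before HBase 0.96 there were two special tables, -ROOT- and .META.:
1. .META.: records the Region information of user-created tables. .META. itself can span multiple regions.
2. -ROOT-: records the Region information of the .META. table. -ROOT- always has exactly one region.
ZooKeeper records the location of the -ROOT- table.
To access user data, a client first queries ZooKeeper, then the -ROOT- table, then the .META. table, and only then knows where the user data is located.

Lookup path: ROOT ---> META ----> Region

Since HBase 0.96 (including the 1.3.1 release installed above), -ROOT- no longer exists and .META. is named hbase:meta. ZooKeeper stores the location of hbase:meta directly, so the path is ZooKeeper ---> hbase:meta ---> Region. Clients cache the region locations they have looked up, so these hops are not repeated for every request.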
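The last step of the lookup finds the region whose [startKey, endKey) range contains a rowkey. That is a floor search over sorted region start keys.

The sketch below models the catalog table as a `TreeMap` from region start key to RegionServer name. `MetaLookup` and the server names are hypothetical. Real HBase compares raw bytes, but for ASCII rowkeys `String.compareTo` gives the same order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the catalog table: region start key -> RegionServer name.
class MetaLookup {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    // a table's first region has the empty start key "", so every rowkey has a floor entry
    void addRegion(String startKey, String regionServer) {
        regionsByStartKey.put(startKey, regionServer);
    }

    // a region covers [startKey, next region's startKey), so the owner of a rowkey is
    // the region with the greatest start key <= rowkey, which is what floorEntry returns
    String locate(String rowkey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowkey);
        return entry == null ? null : entry.getValue();
    }
}
```

With regions starting at "", "m" and "t", the rowkey "s" resolves to the region starting at "m". A start key is inclusive, so "m" itself also belongs to that region.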

HBase Filters (Java)

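Column value filter (SingleColumnValueFilter)
Column prefix filter (ColumnPrefixFilter)
Multiple column prefix filter (MultipleColumnPrefixFilter)
Rowkey filter (RowFilter)

Combining multiple filters (FilterList)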

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterDemo {
	private static final byte[] EMPINFO = Bytes.toBytes("empinfo");

	// runs the scan against table "emp" and prints ename and sal of every returned row
	private static void printEnameSal(Scan scan) throws IOException {
		Configuration configuration = HBaseConfiguration.create();
		configuration.set("hbase.zookeeper.quorum", "192.168.81.111");
		try (Connection connection = ConnectionFactory.createConnection(configuration);
				Table table = connection.getTable(TableName.valueOf("emp"));
				ResultScanner scanner = table.getScanner(scan)) {
			for (Result result : scanner) {
				String ename = Bytes.toString(result.getValue(EMPINFO, Bytes.toBytes("ename")));
				String sal = Bytes.toString(result.getValue(EMPINFO, Bytes.toBytes("sal")));
				System.out.println(ename + ": " + sal);
			}
		}
	}

	// column value filter: rows whose empinfo:sal equals "3000" (values are compared as bytes)
	private static void singleFilter() throws IOException {
		SingleColumnValueFilter filter = new SingleColumnValueFilter(EMPINFO, Bytes.toBytes("sal"),
				CompareOp.EQUAL, Bytes.toBytes("3000"));
		// by default, rows that have no empinfo:sal column at all would also be returned
		filter.setFilterIfMissing(true);
		Scan scan = new Scan();
		scan.setFilter(filter);
		printEnameSal(scan);
	}

	// combined filters: MUST_PASS_ONE is a logical OR, MUST_PASS_ALL a logical AND
	private static void operateFilterList() throws IOException {
		FilterList filterList = new FilterList(Operator.MUST_PASS_ONE);
		filterList.addFilter(new SingleColumnValueFilter(EMPINFO, Bytes.toBytes("sal"),
				CompareOp.EQUAL, Bytes.toBytes("1250")));
		filterList.addFilter(new ColumnPrefixFilter(Bytes.toBytes("ename")));
		Scan scan = new Scan();
		scan.setFilter(filterList);
		printEnameSal(scan);
	}
}
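The AND/OR semantics of `FilterList` can be illustrated without a cluster. The model below is deliberately simplified and hypothetical: it evaluates predicates on whole rows (a map from column to value). Real HBase filters run on the RegionServer and decide cell by cell.

`columnEquals` behaves like a `SingleColumnValueFilter` with `EQUAL` and `setFilterIfMissing(true)`. The `Operator` values are named after `FilterList.Operator`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified, hypothetical row-level model of FilterList semantics (not HBase code).
class FilterModel {
    // named after FilterList.Operator: MUST_PASS_ALL = AND, MUST_PASS_ONE = OR
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // builds a row from alternating column/value arguments
    static Map<String, String> row(String... columnValuePairs) {
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i + 1 < columnValuePairs.length; i += 2) {
            row.put(columnValuePairs[i], columnValuePairs[i + 1]);
        }
        return row;
    }

    // like SingleColumnValueFilter(..., EQUAL, value) with setFilterIfMissing(true):
    // a row that lacks the column does not match
    static Predicate<Map<String, String>> columnEquals(String column, String value) {
        return r -> value.equals(r.get(column));
    }

    // combine filters: either all must accept the row, or at least one must
    static Predicate<Map<String, String>> combine(Operator op, List<Predicate<Map<String, String>>> filters) {
        if (op == Operator.MUST_PASS_ALL) {
            return r -> filters.stream().allMatch(f -> f.test(r));
        }
        return r -> filters.stream().anyMatch(f -> f.test(r));
    }

    // returns the rowkeys of matching rows, in the table's iteration order
    static List<String> scan(Map<String, Map<String, String>> table, Predicate<Map<String, String>> filter) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : table.entrySet()) {
            if (filter.test(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

Passing the table as a `TreeMap` keeps results in rowkey order, as an HBase scan does.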

MapReduce on HBase

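// WcHbaseMapper.java (each public class goes in its own file)
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WcHbaseMapper extends TableMapper<Text, IntWritable> {
	@Override
	protected void map(ImmutableBytesWritable key, Result value, Context context)
			throws IOException, InterruptedException {
		// sample input row: put 'word','2','content:info','I love China'
		String info = Bytes.toString(value.getValue(Bytes.toBytes("content"), Bytes.toBytes("info")));
		if (info == null) {
			return; // skip rows without content:info
		}
		for (String word : info.split(" ")) {
			context.write(new Text(word), new IntWritable(1));
		}
	}
}

// WcHbaseReduce.java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WcHbaseReduce extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
	@Override
	protected void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {
		int total = 0;
		for (IntWritable count : values) {
			total += count.get();
		}
		// one output row per word: rowkey = word, content:info = count
		Put put = new Put(Bytes.toBytes(key.toString()));
		put.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"), Bytes.toBytes(String.valueOf(total)));
		context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
	}
}

// WcHbaseMain.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class WcHbaseMain {
	public static void main(String[] args) throws Exception {
		Configuration configuration = HBaseConfiguration.create();
		configuration.set("hbase.zookeeper.quorum", "192.168.81.111");
		Job job = Job.getInstance(configuration);
		job.setJarByClass(WcHbaseMain.class);

		// input: only the content:info column of table 'word' is needed, so restrict the Scan to it
		Scan scan = new Scan();
		scan.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"));

		TableMapReduceUtil.initTableMapperJob("word", scan, WcHbaseMapper.class, Text.class, IntWritable.class, job);
		// output: table 'stat', which must already exist with column family 'content'
		TableMapReduceUtil.initTableReducerJob("stat", WcHbaseReduce.class, job);
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}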
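The job's data flow can be traced in plain Java without Hadoop:
1. The map step splits each `content:info` value on spaces and emits (word, 1).
2. The shuffle groups the pairs by word.
3. The reduce step sums them; those totals are what end up in the `stat` table.

`WordCountSim` is a hypothetical single-process illustration, not part of the Hadoop or HBase API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process trace of the word-count job (no Hadoop or HBase involved).
// Each input string stands in for one content:info cell of table 'word'.
class WordCountSim {
    static Map<String, Integer> run(List<String> contentInfoCells) {
        // map: emit (word, 1) for every space-separated token, as WcHbaseMapper does
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String info : contentInfoCells) {
            for (String word : info.split(" ")) {
                emitted.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        // shuffle: group the emitted values by key, sorted by key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // reduce: sum the 1s per word, as WcHbaseReduce does before writing a Put to 'stat'
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int total = 0;
            for (int count : group.getValue()) {
                total += count;
            }
            totals.put(group.getKey(), total);
        }
        return totals;
    }
}
```

For the inputs "I love China" and "I love Beijing", the result maps I and love to 2, and China and Beijing to 1.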

HBase Optimization

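1) Pre-split design
Regions are what actually store the data. Each region maintains a contiguous rowkey range [startRowkey, endRowkey).
-》Manually specifying split points
  create 'user_p','info','partition',SPLITS => ['101','102','103','104']
    The ranges are open-ended at -∞ and +∞:
    first region     -∞ ~ 101
    second region    101 ~ 102
    third region     102 ~ 103
    fourth region    103 ~ 104
    fifth region     104 ~ +∞
-》Pre-splitting with a hexadecimal key sequence
    create 'user_p2','info','partition',{NUMREGIONS => 15,SPLITALGO => 'HexStringSplit'}
-》Pre-splitting with split points read from a file (one split key per line)
    create 'user_p4','partition',SPLITS_FILE => 'splits.txt'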
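The region that `SPLITS => ['101','102','103','104']` assigns to a given rowkey can be computed with a binary search over the sorted split keys. The class `PreSplit` is hypothetical. The sketch assumes ASCII rowkeys, so `String` ordering matches HBase's byte-wise ordering.

It also shows a consequence that is easy to miss: rowkeys compare as strings, not numbers.

```java
import java.util.Arrays;

// Hypothetical helper: which pre-split region receives a rowkey?
class PreSplit {
    // sortedSplits are the SPLITS keys; the result is a 0-based region index:
    // 0 = (-inf, splits[0]), i = [splits[i-1], splits[i]), n = [splits[n-1], +inf)
    static int regionIndex(String[] sortedSplits, String rowkey) {
        int pos = Arrays.binarySearch(sortedSplits, rowkey);
        if (pos >= 0) {
            return pos + 1; // a split key is the inclusive start key of the region after it
        }
        return -pos - 1;    // binarySearch returned -(insertionPoint) - 1
    }
}
```

With these split keys, "1015" gets index 1 (the second region, 101 ~ 102). "99" gets index 4 (the fifth region, 104 ~ +∞), because '9' sorts after '1'.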
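2) Rowkey design
    The rowkey uniquely identifies a row. Which region stores the row depends on the pre-split range the rowkey falls into.
    Why design the rowkey? To prevent data skew (hotspotting), where most reads and writes hit a single region.
    (1) Random numbers / hashes
        e.g. (illustrative values) rowkey 101 becomes: dd21231dqwdqd123131d112131
                                   rowkey 102 becomes: wqdqdq212131dqdwqwdqdw1d21
    (2) String reversal
        2018120800011 -> 1100080218102
        2018120800012 -> 2100080218102
    (3) String concatenation
        2018120800011_a12e
        2018120800012_odd12c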
        101~105 105~100000
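The three rowkey techniques above can be sketched in plain Java. Two choices here are assumptions rather than anything the text prescribes:
- MD5 is used as the hash.
- Instead of replacing the key entirely with a hash, as in example (1), the sketch prefixes a few hex digits of the hash to the original key. This is a common salting variant that keeps the original key readable.

The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helpers illustrating the three rowkey techniques.
class RowkeyDesign {
    // (1) hashing, salting variant (assumption): prefix the first prefixLen hex digits of the
    //     key's MD5 to the original key so that sequential keys scatter across regions
    static String hashPrefixed(String rowkey, int prefixLen) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(rowkey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, prefixLen) + "_" + rowkey;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support MD5", e);
        }
    }

    // (2) reversal: the fast-changing tail of a time-based key becomes its head
    static String reversed(String rowkey) {
        return new StringBuilder(rowkey).reverse().toString();
    }

    // (3) concatenation: append a suffix to the key
    static String withSuffix(String rowkey, String suffix) {
        return rowkey + "_" + suffix;
    }
}
```

`reversed("2018120800011")` yields "1100080218102", matching example (2). Fully hashed or reversed keys spread writes evenly but give up efficient range scans in the original key order. That trade-off decides which technique fits a given access pattern.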
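3) HBase tuning
    (1) Memory
        Typically give about 70% of the machine's memory to the HBase Java heap.
        Avoid very large heaps, since they lead to long GC pauses; 16~48G is usually enough.
        Setting: in conf/hbase-env.sh, e.g. export HBASE_HEAPSIZE=16G
        Note: HADOOP_PORTMAP_OPTS (e.g. "-Xmx512m") in etc/hadoop/hadoop-env.sh only sizes Hadoop's NFS portmap daemon, not HBase.
    (2) Basic tuning
        -》DataNode: maximum number of concurrent transfer threads (and thus open files), hdfs-site.xml
            Property: dfs.datanode.max.transfer.threads (formerly dfs.datanode.max.xcievers)
            Default: 4096; set it higher than 4096
        -》Wait time for high-latency operations, hdfs-site.xml
            Property: dfs.image.transfer.timeout
            Default: 60000 ms
            Increase it
        -》Write efficiency: compress map output, mapred-site.xml
            Property: mapreduce.map.output.compress, value: true
            Property: mapreduce.map.output.compress.codec, value: org.apache.hadoop.io.compress.GzipCodec
        -》Maximum HStore file (region) size
            Property: hbase.hregion.max.filesize
            Default: 10GB
            Decrease it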
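The memory rule of thumb in (1) is simple arithmetic. The sketch below takes about 70% of RAM and caps the result at 48 GB, the upper end of the suggested range. Rounding down to whole gigabytes and the hard cap are assumptions for illustration, and `HeapSizing` is hypothetical.

```java
// Hypothetical helper for the memory rule of thumb: about 70% of RAM, capped at 48 GB.
class HeapSizing {
    static final int MAX_HEAP_GB = 48; // upper end of the 16~48G range in the text

    // integer arithmetic, so the result is rounded down to whole gigabytes
    static int suggestedHeapGb(int ramGb) {
        return Math.min(ramGb * 7 / 10, MAX_HEAP_GB);
    }
}
```

A 64 GB machine gets 44 GB. A 128 GB machine would get 89 GB by the 70% rule alone, so the cap limits it to 48 GB.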
