Importing data from HBase into Hive. The procedure follows these references:
https://blog.csdn.net/wuxintdrh/article/details/78935597
https://blog.csdn.net/dominic_tiger/article/details/70237542
1. Enter the HBase shell:
[root@name01-test ~]# su hdfs
[hdfs@name01-test root]$ hbase shell
Usage: hbase [<options>] <command> [<args>]
Options:
--config DIR Configuration direction to use. Default: ./conf
--hosts HOSTS Override the list in 'regionservers' file
--auth-as-server Authenticate to ZooKeeper using servers configuration
Commands:
Some commands take arguments. Pass no args or -h for usage.
shell Run the HBase shell
hbck Run the hbase 'fsck' tool
snapshot Create a new snapshot of a table
snapshotinfo Tool for dumping snapshot information
wal Write-ahead-log analyzer
hfile Store file analyzer
zkcli Run the ZooKeeper shell
upgrade Upgrade hbase
master Run an HBase HMaster node
regionserver Run an HBase HRegionServer node
zookeeper Run a Zookeeper server
rest Run an HBase REST server
thrift Run the HBase Thrift server
thrift2 Run the HBase Thrift2 server
clean Run the HBase clean up script
classpath Dump hbase CLASSPATH
mapredcp Dump CLASSPATH entries required by mapreduce
pe Run PerformanceEvaluation
ltt Run LoadTestTool
version Print the version
CLASSNAME Run the class named CLASSNAME
[hdfs@name01-test root]$
In practice, the only steps needed are to switch to the hdfs user and start the shell:
[root@name01-test ~]# su hdfs
[hdfs@name01-test root]$ hbase shell
2. Create the HBase table
Reference: https://www.cnblogs.com/tony-tang/p/6473393.html
create 'userinfo', 'info'
hbase(main):020:0> put 'userinfo', '1', 'info:age', '23'
0 row(s) in 0.0120 seconds
hbase(main):030:0> put 'userinfo', '2', 'info:name', 'wsx'
0 row(s) in 0.0150 seconds
hbase(main):031:0> put 'userinfo', '3', 'info:name', 'chengbao'
0 row(s) in 0.0100 seconds
hbase(main):032:0> scan 'userinfo'
ROW COLUMN+CELL
1 column=info:age, timestamp=1531745740707, value=23
1 column=info:name, timestamp=1531745798480, value=chb1
1 column=info:sex, timestamp=1531745814210, value=male
2 column=info:name, timestamp=1531745886703, value=wsx
3 column=info:name, timestamp=1531745906320, value=chengbao
3 row(s) in 0.0130 seconds
This creates the HBase table whose data will be brought into Hive.
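The scan above shows two cells for row 1 (info:name=chb1 and info:sex=male) whose put commands do not appear in the transcript. The following is a minimal sketch of a script that would reproduce the whole table, assuming those two cells were written with ordinary put commands. The file name userinfo_setup.hbase is hypothetical.

```shell
# Write the HBase shell commands to a script file (the file name is illustrative).
cat > userinfo_setup.hbase <<'EOF'
create 'userinfo', 'info'
put 'userinfo', '1', 'info:age', '23'
put 'userinfo', '1', 'info:name', 'chb1'
put 'userinfo', '1', 'info:sex', 'male'
put 'userinfo', '2', 'info:name', 'wsx'
put 'userinfo', '3', 'info:name', 'chengbao'
scan 'userinfo'
exit
EOF

# Run the script only where the hbase CLI is installed.
if command -v hbase >/dev/null 2>&1; then
  hbase shell userinfo_setup.hbase
else
  echo "hbase not found; commands saved to userinfo_setup.hbase"
fi
```

The trailing exit keeps the shell from staying open in interactive mode after the script finishes.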
3. Create the mapped Hive table
Create a Hive table that maps onto the HBase table:
-- key is the HBase rowkey; the remaining fields are HBase column qualifiers
CREATE EXTERNAL TABLE hbase_table_1(key string, name string) -- the Hive table
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' -- storage handler class
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name") -- column mapping
TBLPROPERTIES ("hbase.table.name" = "userinfo"); -- the HBase table being mapped
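The mapping above exposes only info:name, so info:age and info:sex stay invisible to Hive. The following sketch maps all three qualifiers. The table name hbase_userinfo_full and the file name userinfo_full.hql are hypothetical, and every column is declared as string on the assumption that the cells were written as plain text, as in the put commands shown earlier.

```shell
# Write the HiveQL to a file (table and file names are illustrative).
cat > userinfo_full.hql <<'EOF'
-- One Hive column per entry in hbase.columns.mapping, in the same order.
-- The mapping string must not contain spaces between entries.
CREATE EXTERNAL TABLE hbase_userinfo_full(key string, name string, age string, sex string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:age,info:sex")
TBLPROPERTIES ("hbase.table.name" = "userinfo");

SELECT * FROM hbase_userinfo_full;
EOF

# Run the file only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f userinfo_full.hql
else
  echo "hive not found; statements saved to userinfo_full.hql"
fi
```

Rows that lack a mapped cell, such as info:age for rows 2 and 3, come back as NULL in that column.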
4. Check the values in the Hive table
Querying the Hive table now returns the same rowkeys and info:name values as the HBase table.
hive> select * from hbase_table_1;
OK
1 chb1
2 wsx
3 chengbao
Time taken: 0.085 seconds, Fetched: 3 row(s)
hive>
5. After updating HBase, query Hive again: the new value shows up automatically
hbase(main):001:0> put 'userinfo', '4', 'info:name', 'mike'
0 row(s) in 0.3210 seconds
hbase(main):002:0> scan 'userinfo'
ROW COLUMN+CELL
1 column=info:age, timestamp=1531783708749, value=23
1 column=info:name, timestamp=1531783863243, value=chb1
1 column=info:sex, timestamp=1531783905927, value=male
2 column=info:name, timestamp=1531783929350, value=wsx
3 column=info:name, timestamp=1531783948542, value=chengbao
4 column=info:name, timestamp=1531784664541, value=mike
4 row(s) in 0.0600 seconds
hive> select * from hbase_table_1;
OK
1 chb1
2 wsx
3 chengbao
4 mike
Time taken: 2.764 seconds, Fetched: 4 row(s)
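This makes the HBase data queryable from Hive. Strictly speaking nothing is copied: the external table is a live mapping, and the rows still live in HBase.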
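If an actual copy in Hive-managed storage is wanted, one option is a CREATE TABLE ... AS SELECT over the mapped table. This is a minimal sketch, not part of the original procedure. The target table name userinfo_snapshot and the file name userinfo_snapshot.hql are hypothetical.

```shell
# Write the HiveQL to a file (table and file names are illustrative).
cat > userinfo_snapshot.hql <<'EOF'
-- Copy the rows currently visible through the HBase mapping into a native Hive table.
-- Later puts to HBase will NOT appear in this copy.
CREATE TABLE userinfo_snapshot AS
SELECT key, name FROM hbase_table_1;
EOF

# Run the file only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f userinfo_snapshot.hql
else
  echo "hive not found; statements saved to userinfo_snapshot.hql"
fi
```

The trade-off is freshness against independence: the mapped table always reflects HBase, while the snapshot can be queried without going through HBase but has to be rebuilt to pick up changes.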
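Overall, HBase shell syntax differs from MySQL in the following ways:
1. HBase has only tables, with no concept of a database;
2. Table names in commands must be quoted, whereas MySQL and Hive do not need the quotes;
3. The shell is strictly case-sensitive: commands are lowercase, and table and column names must match exactly;
4. Commands do not need a trailing semicolon;
5. A table must be disabled before it can be dropped;
6. To delete a mistyped command in the HBase shell, use Ctrl+Backspace.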
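Points 2, 4 and 5 can be seen together in a short drop sequence. The sketch below uses a throwaway table so that userinfo is left untouched. The table name scratch_demo, the column family cf and the file name drop_demo.hbase are all hypothetical.

```shell
# Write the HBase shell commands to a script file (all names are illustrative).
cat > drop_demo.hbase <<'EOF'
create 'scratch_demo', 'cf'
disable 'scratch_demo'
drop 'scratch_demo'
exit
EOF

# Run the script only where the hbase CLI is installed.
if command -v hbase >/dev/null 2>&1; then
  hbase shell drop_demo.hbase
else
  echo "hbase not found; commands saved to drop_demo.hbase"
fi
```

Each command quotes the table name and ends without a semicolon, and disable comes before drop because the shell refuses to drop a table that is still enabled.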