61. Using Phoenix in CDH

61.1 Demo Environment

  • The CDH cluster, the HBase service, and the httpd service on Red Hat 7 are all running normally
  • The CSV test data has been prepared
  • CDH version: 5.11.2
  • Red Hat version: 7.2
  • Phoenix version: 4.7.0

61.2 Walkthrough

Installing Phoenix on the CDH Cluster

  • Download the Phoenix parcel from the Cloudera website
    • Be sure to pick the build that matches your operating system; since this test runs on Red Hat 7, choose the files whose names end in el7 (a wget sketch follows the list of addresses). The download location is:
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/
## The three files to download are:
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel.sha1
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/manifest.json
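A sketch of fetching the three files straight into the httpd document root shown in the listing below (the path /var/www/html/phoenix is taken from that listing; run as a user with write access there):

[ec2-user@ip-186-31-22-86 phoenix]$ cd /var/www/html/phoenix
[ec2-user@ip-186-31-22-86 phoenix]$ wget http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel
[ec2-user@ip-186-31-22-86 phoenix]$ wget http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel.sha1
[ec2-user@ip-186-31-22-86 phoenix]$ wget http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/manifest.json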
  • Publish the downloaded files through the httpd service; you can verify by opening the page in a browser, or with the quick curl check sketched after the listing.
[ec2-user@ip-186-31-22-86 phoenix]$ pwd
/var/www/html/phoenix
[ec2-user@ip-186-31-22-86 phoenix]$ ll
total 192852
-rw-r--r-- 1 root root        41 Jun 24  2016 CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel.sha1
-rw-r--r-- 1 root root 197466534 Jun 24  2016 CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel
-rw-r--r-- 1 root root      4687 Jun 24  2016 manifest.json
[ec2-user@ip-186-31-22-86 phoenix]$ 
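A browserless alternative check, assuming httpd serves on the default port of the host above (a 200 response confirms the file is reachable):

[ec2-user@ip-186-31-22-86 phoenix]$ curl -I http://ip-186-31-22-86/phoenix/manifest.json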
  • In Cloudera Manager, click “Parcel” to open the parcel management page
    • Click “Configuration” and add the HTTP address of the Phoenix parcel repository.
    • Save the change and return to the parcel page; CM should now have discovered the Phoenix parcel.
    • Download -> Distribute -> Activate
    • The HBase service then needs its client configuration redeployed and a restart

Using Phoenix on the CDH Cluster

  • Basic operations
    • Change into the directory holding Phoenix's command scripts
[ec2-user@ip-186-31-22-86 bin]$ cd /opt/cloudera/parcels/CLABS_PHOENIX/bin
[ec2-user@ip-186-31-22-86 bin]$ ll
total 16
-rwxr-xr-x 1 root root 672 Jun 24  2016 phoenix-performance.py
-rwxr-xr-x 1 root root 665 Jun 24  2016 phoenix-psql.py
-rwxr-xr-x 1 root root 668 Jun 24  2016 phoenix-sqlline.py
-rwxr-xr-x 1 root root 674 Jun 24  2016 phoenix-utils.py
  • Log in to HBase through Phoenix
[ec2-user@ip-186-31-22-86 bin]$ ./phoenix-sqlline.py
Zookeeper not specified. 
Usage: sqlline.py <zookeeper> <optional_sql_file> 
Example: 
 1. sqlline.py localhost:2181:/hbase 
 2. sqlline.py localhost:2181:/hbase ../examples/stock_symbol.sql
  • The ZooKeeper ensemble must be specified
[ec2-user@ip-186-31-22-86 bin]$ ./phoenix-sqlline.py ip-186-31-21-45:2181:/hbase
...
sqlline version 1.1.8
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> !tables
+------------+--------------+-------------+---------------+----------+------------+--------------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_C |
+------------+--------------+-------------+---------------+----------+------------+--------------------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                    |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                    |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                    |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                    |
|            |              | ITEM        | TABLE         |          |            |                    |
+------------+--------------+-------------+---------------+----------+------------+--------------------+
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> 
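Besides !tables, sqlline has other meta-commands that come in handy here; two standard ones, run against the ITEM table listed above (output omitted):

0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> !describe ITEM
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> !quit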
  • Create a test table
    • Every Phoenix table must declare a primary key; it maps to the HBase row key.
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> create table hbase_test
. . . . . . . . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . . . . . . . .>     s1 varchar not null primary key,
. . . . . . . . . . . . . . . . . . . . . .>     s2 varchar,
. . . . . . . . . . . . . . . . . . . . . .>     s3 varchar,
. . . . . . . . . . . . . . . . . . . . . .>     s4 varchar
. . . . . . . . . . . . . . . . . . . . . .> );
No rows affected (1.504 seconds)
  • Check the new table in hbase shell; a minimal sketch of the check follows.
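Assuming Phoenix's default mapping (the table name is upper-cased on the HBase side), the table can be confirmed with:

hbase(main):001:0> list 'HBASE_TEST'
hbase(main):002:0> describe 'HBASE_TEST'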
  • Insert a row of data
    • Phoenix has no INSERT statement; UPSERT takes its place.
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test values('1','testname','testname1','testname2');
1 row affected (0.088 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
1 row selected (0.049 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> 
  • Check the row in hbase shell; a scan sketch follows.
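A minimal verification, again assuming the default mapping: the row should appear under column family 0 with upper-cased qualifiers S2, S3, S4 plus Phoenix's internal _0 column, just like the ITEM scan later in this section:

hbase(main):003:0> scan 'HBASE_TEST'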
  • Delete the row to test DELETE
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> delete from hbase_test where s1='1';
1 row affected (0.018 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----+-----+-----+
| S1  | S2  | S3  | S4  |
+-----+-----+-----+-----+
+-----+-----+-----+-----+
No rows selected (0.045 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> 
  • Check again in hbase shell. Update test
    • Phoenix has no UPDATE statement either; UPSERT covers it. Inserting several rows takes one UPSERT per row; there is no way to put all of the rows after a single VALUES clause.
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test values('1','testname','testname1','testname2');
1 row affected (0.017 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test values('2','testname','testname1','testname2');
1 row affected (0.007 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test values('3','testname','testname1','testname2');
1 row affected (0.008 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | testname  | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
3 rows selected (0.067 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test values('1','fayson','testname1','testname2');
1 row affected (0.009 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | fayson    | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
3 rows selected (0.037 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase>
  • Check in hbase shell
    • Batch update test: create a second table, hbase_test1, with the same schema as hbase_test, and insert five rows: two whose primary keys (4 and 5) do not exist in hbase_test, one that differs from hbase_test (primary key 1), and two that are identical (primary keys 2 and 3).
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> create table hbase_test1
. . . . . . . . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . . . . . . . .>     s1 varchar not null primary key,
. . . . . . . . . . . . . . . . . . . . . .>     s2 varchar,
. . . . . . . . . . . . . . . . . . . . . .>     s3 varchar,
. . . . . . . . . . . . . . . . . . . . . .>     s4 varchar
. . . . . . . . . . . . . . . . . . . . . .> );
No rows affected (1.268 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> 
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test1 values('1','fayson','testname1','testname2');
1 row affected (0.031 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test1 values('2','testname','testname1','testname2');
1 row affected (0.006 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test1 values('3','testname','testname1','testname2');
1 row affected (0.005 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test1 values('4','testname','testname1','testname2');
1 row affected (0.005 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test1 values('5','testname','testname1','testname2');
1 row affected (0.007 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select * from hbase_test1;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | fayson    | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
| 4   | testname  | testname1  | testname2  |
| 5   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
5 rows selected (0.038 seconds)
  • Batch update: use the data in hbase_test1 to update hbase_test.
    • The batch upsert shows that rows whose values differ are overwritten, identical rows stay unchanged, and rows that do not yet exist are inserted as new data.
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> upsert into hbase_test select * from hbase_test1;
5 rows affected (0.03 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | fayson    | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
| 4   | testname  | testname1  | testname2  |
| 5   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
5 rows selected (0.039 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase>

Bulk-loading Data into HBase with Phoenix

  • Prepare the data to bulk load; here we use the item table from the TPC-DS dataset.
[ec2-user@ip-186-31-22-86 ~]$ ll item.dat
-rw-r--r-- 1 root root 28855325 Oct  3 10:23 item.dat
[ec2-user@ip-186-31-22-86 ~]$ head -1 item.dat
1|AAAAAAAABAAAAAAA|1997-10-27||Powers will not get influences. Electoral ports should show low, annual chains. N
  • Phoenix's bulkload tool expects CSV input, comma-delimited by default, so change the delimiter to commas and rename the file with a .csv suffix; see the note after the listing for a delimiter-preserving alternative.
[ec2-user@ip-186-31-22-86 ~]$ sed -i 's/|/,/g' item.dat
[ec2-user@ip-186-31-22-86 ~]$ mv item.dat item.csv
[ec2-user@ip-186-31-22-86 ~]$ ll item.csv 
-rw-r--r-- 1 ec2-user ec2-user 28855325 Oct  3 10:26 item.csv
[ec2-user@ip-186-31-22-86 ~]$ head -1 item.csv 
1,AAAAAAAABAAAAAAA,1997-10-27,,Powers will not get influences. Electoral ports should show low, annual chains
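Note that the sample row above contains a literal comma (“show low, annual chains”), which the blanket sed rewrite turns into a spurious field separator. CsvBulkLoadTool also accepts a custom delimiter through its -d option, so a sketch that keeps the original pipe-delimited file (assuming item.dat had been uploaded to /fayson instead, with the same jar and table as the command used below) would be:

[ec2-user@ip-186-31-22-86 ~]$ hadoop jar /opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -t item -i /fayson/item.dat -d '|'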
  • Upload the file to HDFS
[ec2-user@ip-186-31-22-86 ~]$ hadoop fs -mkdir /fayson
[ec2-user@ip-186-31-22-86 ~]$ hadoop fs -put item.csv /fayson
[ec2-user@ip-186-31-22-86 ~]$ hadoop fs -ls /fayson
Found 1 items
-rw-r--r--   3 ec2-user supergroup   28855325 2017-10-03 10:28 /fayson/item.csv
[ec2-user@ip-186-31-22-86 ~]$ 
  • Create the item table through Phoenix; to keep the demo short, only 4 of its columns are created
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> create table item
. . . . . . . . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . . . . . . . .>     i_item_sk varchar not null primary key,
. . . . . . . . . . . . . . . . . . . . . .>     i_item_id varchar,
. . . . . . . . . . . . . . . . . . . . . .>     i_rec_start_varchar varchar,
. . . . . . . . . . . . . . . . . . . . . .>     i_rec_end_date varchar
. . . . . . . . . . . . . . . . . . . . . .> );
No rows affected (1.268 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase>
  • Run the bulkload command to import the data
[ec2-user@ip-186-31-22-86 ~]$ HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-1.2.0-cdh5.12.1.jar:/opt/cloudera/parcels/CDH/lib/hbase/conf hadoop jar /opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -t item -i /fayson/item.csv
17/10/03 10:32:24 INFO util.QueryUtil: Creating connection with the jdbc url: jdbc:phoenix:ip-186-31-21-45.ap-southeast-1.compute.internal,ip-186-31-22-86.ap-southeast-1.compute.internal,ip-186-31-26-102.ap-southeast-1.compute.internal:2181:/hbase;
...
17/10/03 10:32:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-186-31-21-45.ap-southeast-1.compute.internal:2181,ip-186-31-22-86.ap-southeast-1.compute.internal:2181,ip-186-31-26-102.ap-southeast-1.compute.internal:2181 sessionTimeout=60000 watcher=hconnection-0x7a9c0c6b0x0, quorum=ip-186-31-21-45.ap-southeast-1.compute.internal:2181,ip-186-31-22-86.ap-southeast-1.compute.internal:2181,ip-186-31-26-102.ap-southeast-1.compute.internal:2181, baseZNode=/hbase
17/10/03 10:32:24 INFO zookeeper.ClientCnxn: Opening socket connection to server ip-186-31-21-45.ap-southeast-1.compute.internal/186.31.21.45:2181. Will not attempt to authenticate using SASL (unknown error)
...
17/10/03 10:32:30 INFO mapreduce.Job: Running job: job_1507035313248_0001
17/10/03 10:32:38 INFO mapreduce.Job: Job job_1507035313248_0001 running in uber mode : false
17/10/03 10:32:38 INFO mapreduce.Job:  map 0% reduce 0%
17/10/03 10:32:52 INFO mapreduce.Job:  map 100% reduce 0%
17/10/03 10:33:01 INFO mapreduce.Job:  map 100% reduce 100%
17/10/03 10:33:01 INFO mapreduce.Job: Job job_1507035313248_0001 completed successfully
17/10/03 10:33:01 INFO mapreduce.Job: Counters: 50
...
17/10/03 10:33:01 INFO mapreduce.AbstractBulkLoadTool: Loading HFiles from /tmp/fef0045b-8a31-4d95-985a-bee08edf2cf9
...
  • Query the table in Phoenix
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select * from item limit 10;
+------------+-------------------+----------------------+-----------------+
| I_ITEM_SK  |     I_ITEM_ID     | I_REC_START_VARCHAR  | I_REC_END_DATE  |
+------------+-------------------+----------------------+-----------------+
| 1          | AAAAAAAABAAAAAAA  | 1997-10-27           |                 |
| 10         | AAAAAAAAKAAAAAAA  | 1997-10-27           | 1999-10-27      |
| 100        | AAAAAAAAEGAAAAAA  | 1997-10-27           | 1999-10-27      |
| 1000       | AAAAAAAAIODAAAAA  | 1997-10-27           | 1999-10-27      |
| 10000      | AAAAAAAAABHCAAAA  | 1997-10-27           | 1999-10-27      |
| 100000     | AAAAAAAAAKGIBAAA  | 1997-10-27           | 1999-10-27      |
| 100001     | AAAAAAAAAKGIBAAA  | 1999-10-28           | 2001-10-26      |
| 100002     | AAAAAAAAAKGIBAAA  | 2001-10-27           |                 |
| 100003     | AAAAAAAADKGIBAAA  | 1997-10-27           |                 |
| 100004     | AAAAAAAAEKGIBAAA  | 1997-10-27           | 2000-10-26      |
+------------+-------------------+----------------------+-----------------+
10 rows selected (0.054 seconds)
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> 
  • Scan the table in hbase shell
hbase(main):002:0> scan 'ITEM', LIMIT => 10
ROW                         COLUMN+CELL                                                                 
 1                          column=0:I_ITEM_ID, timestamp=1507041176470, value=AAAAAAAABAAAAAAA         
 1                          column=0:I_REC_START_VARCHAR, timestamp=1507041176470, value=1997-10-27     
 1                          column=0:_0, timestamp=1507041176470, value=                                
 10                         column=0:I_ITEM_ID, timestamp=1507041176470, value=AAAAAAAAKAAAAAAA         
 10                         column=0:I_REC_END_DATE, timestamp=1507041176470, value=1999-10-27          
 10                         column=0:I_REC_START_VARCHAR, timestamp=1507041176470, value=1997-10-27     
 10                         column=0:_0, timestamp=1507041176470, value=                                
...
 100004                     column=0:I_REC_START_VARCHAR, timestamp=1507041176470, value=1997-10-27     
 100004                     column=0:_0, timestamp=1507041176470, value=                                
10 row(s) in 0.2360 seconds
  • Verify the loaded row count; if it matches the source, the load succeeded in full. A sketch of the check follows.
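A minimal count comparison, reusing the file and table from above (output omitted); the CSV has one record per line, so the two numbers should match:

[ec2-user@ip-186-31-22-86 ~]$ wc -l item.csv
0: jdbc:phoenix:ip-186-31-21-45:2181:/hbase> select count(*) from item;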

Exporting Data from HBase to HDFS with Phoenix

  • Phoenix can export data to HDFS with MapReduce, driven by a Pig script. First prepare the Pig script.
[ec2-user@ip-186-31-22-86 ~]$ cat export.pig 
REGISTER /opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar;
rows = load 'hbase://query/SELECT * FROM ITEM' USING org.apache.phoenix.pig.PhoenixHBaseLoader('ip-186-31-21-45:2181');
STORE rows INTO 'fayson1' USING PigStorage(',');
[ec2-user@ip-186-31-22-86 ~]$
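The 'hbase://query/...' location used above is one of the two forms PhoenixHBaseLoader accepts; loading the whole table by name (same ZooKeeper quorum) should be equivalent here:

rows = load 'hbase://table/ITEM' USING org.apache.phoenix.pig.PhoenixHBaseLoader('ip-186-31-21-45:2181');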
  • Run the script
[ec2-user@ip-186-31-22-86 ~]$ pig -x mapreduce export.pig 
...
Counters:
Total records written : 102000
Total bytes written : 4068465
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1507035313248_0002

2017-10-03 10:45:38,905 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLa
  • After the export succeeds, check the data in HDFS
[ec2-user@ip-186-31-22-86 ~]$ hadoop fs -ls /user/ec2-user/fayson1
Found 2 items
-rw-r--r--   3 ec2-user supergroup          0 2017-10-03 10:45 /user/ec2-user/fayson1/_SUCCESS
-rw-r--r--   3 ec2-user supergroup    4068465 2017-10-03 10:45 /user/ec2-user/fayson1/part-m-00000
[ec2-user@ip-186-31-22-86 ~]$ hadoop fs -cat /user/ec2-user/fayson1/part-m-00000 | head -2
1,AAAAAAAABAAAAAAA,1997-10-27,
10,AAAAAAAAKAAAAAAA,1997-10-27,1999-10-27
cat: Unable to write to output stream.
[ec2-user@ip-186-31-22-86 ~]$ 
  • If the exported count matches the source data, the export succeeded in full; a sketch of the check follows.
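The Pig counters above already report 102000 records written; a quick cross-check against the HDFS output (output omitted):

[ec2-user@ip-186-31-22-86 ~]$ hadoop fs -cat /user/ec2-user/fayson1/part-* | wc -l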

Conclusions

  • Cloudera's Phoenix parcel makes installing Phoenix straightforward.
  • Phoenix lets you create tables in HBase and delete and update data, all through familiar SQL.
  • Phoenix provides ways to bulk import and export data.
    • Bulk import expects CSV input, comma-delimited by default.
  • SQL operations in Phoenix are reflected in HBase immediately; every check through hbase shell succeeded.
  • The Phoenix version Cloudera currently ships is fairly old.
  • Phoenix's SQL dialect is fairly spartan: there is no INSERT or UPDATE; UPSERT replaces both.
  • When inserting with UPSERT, rows go in one statement at a time; there is no way to list all of the values after a single VALUES clause.
