Phoenix (SQL on HBase): Installation and Usage Report

＠假装很文艺的文艺青年

Tags: hbase sql big data

Article link: https://blog.csdn.net/weixin_43513980/article/details/105285519

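This article explains why Phoenix was chosen as the SQL engine for HBase and how to resolve its compatibility problems with CDH, including building a CDH version of Phoenix and installing it into a CDH environment. It then covers the four ways of invoking Phoenix, data operations and schema operations (creating tables; inserting, deleting, and updating data), and using Spark with Phoenix. Phoenix's SQL optimizations give efficient query performance and make working with data on HBase more convenient.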

1. Why use Phoenix

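Phoenix is an open-source SQL engine for HBase. Instead of the HBase client API, you can use the standard JDBC API to create tables, insert data, and query your HBase data.
Phoenix is built on top of HBase, so you may wonder "Will Phoenix slow HBase down?" or "Is Phoenix itself slow?" It does not. Phoenix achieves performance equal to, and possibly better than, hand-written HBase code (not to mention with far less code) by:

Compiling your SQL queries into native HBase scans
Determining the optimal start and stop keys for each scan
Orchestrating the scans so that they run in parallel
Bringing the computation to the data by:
    Pushing the predicates in your WHERE clause down to server-side filters
    Executing aggregate queries through server-side hooks (called coprocessors)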
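To make the "start and stop key" idea concrete, here is a minimal Python sketch of how a predicate on a row-key prefix can be turned into a scan range. It is only an illustrative model, not Phoenix's actual implementation: the function names, the sample row keys, and the key layout (state code followed by city) are all assumptions made for this example.

```python
from typing import List, Tuple


def prefix_scan_range(prefix: bytes) -> Tuple[bytes, bytes]:
    """Return (start_key, stop_key) covering every row key that begins with prefix.

    start_key is inclusive and stop_key is exclusive, as in an HBase scan.
    An empty stop_key means "no upper bound".
    """
    stop = bytearray(prefix)
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        # Empty or all-0xFF prefix: scan to the end of the table.
        return prefix, b""
    stop[-1] += 1
    return prefix, bytes(stop)


# Hypothetical, already sorted row keys (state code followed by city).
ROW_KEYS: List[bytes] = [
    b"AZPhoenix",
    b"CALos Angeles",
    b"CASan Diego",
    b"CASan Jose",
    b"ILChicago",
    b"NYNew York",
]


def scan(prefix: bytes) -> List[bytes]:
    """Return only the keys inside the range, mimicking a bounded scan."""
    start, stop = prefix_scan_range(prefix)
    return [k for k in ROW_KEYS if k >= start and (not stop or k < stop)]


if __name__ == "__main__":
    print(prefix_scan_range(b"CA"))
    print(scan(b"CA"))
```

With a bounded range like this, a query that filters on the leading primary-key column only has to touch the rows between the two keys instead of the whole table.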

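In addition, Phoenix adds some interesting enhancements to optimize performance further:

Secondary indexes to speed up queries on non-primary-key columns
Statistics gathering to improve parallelization and help choose the best optimization
A skip scan filter to optimize IN, LIKE, and OR queries
Optional salting of the primary key to distribute write load evenly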
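The salting idea in the last item can be sketched in a few lines of Python. This is a simplified model under stated assumptions: zlib.crc32 stands in for whatever hash Phoenix really uses, and the bucket count and the timestamp-like sample keys are made up for illustration.

```python
import zlib
from collections import Counter


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the row key with one salt byte derived from a hash of the key.

    zlib.crc32 is only a stand-in hash for this sketch.
    """
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


# Monotonically increasing keys (for example timestamps) would normally all
# land at the end of the key space; the salt byte spreads them over buckets.
keys = [f"2020-04-03T10:00:{i:02d}".encode() for i in range(60)]
distribution = Counter(salted_key(k, 4)[0] for k in keys)

if __name__ == "__main__":
    for bucket, count in sorted(distribution.items()):
        print(f"bucket {bucket}: {count} keys")
```

Because the salt is computed from the key itself, a reader can recompute it for a point lookup, while range scans have to fan out over every bucket.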

Below are benchmarks comparing Phoenix with (read: thrashing) Hive and Impala:

Phoenix vs Hive

Query: select count(1) from table over 10M and 100M rows. Data is 5 narrow columns. Number of Region Servers: 4 (HBase heap: 10GB, Processor: 6 cores @ 3.3GHz Xeon)

Phoenix vs Impala

Query: select count(1) from table over 1M and 5M rows. Data is 3 narrow columns. Number of Region Servers: 1 (Virtual Machine, HBase heap: 2GB, Processor: 2 cores @ 3.3GHz Xeon)

Which companies currently use Phoenix?

[Image: http://phoenix.apache.org/images/using/all.png]

2. Installing Phoenix

2.1 Compatibility issues

First, running the command hbase version shows that our HBase version is 1.2.0.
Next, download the matching package from http://archive.apache.org/dist/phoenix. Here we choose apache-phoenix-4.8.1-HBase-1.2-bin.tar.gz.
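As a small aid for picking the right archive, the sketch below derives the file name from a Phoenix version and an HBase version. The naming pattern is inferred only from the single file name quoted above, so treat it as an assumption and check the download directory before relying on it; the function name is hypothetical.

```python
def phoenix_package_name(phoenix_version: str, hbase_version: str) -> str:
    """Build the binary archive name for a Phoenix / HBase version pair.

    Only the major.minor part of the HBase version appears in the name,
    so "1.2.0" becomes "HBase-1.2". The pattern is inferred from
    apache-phoenix-4.8.1-HBase-1.2-bin.tar.gz and is an assumption.
    """
    major, minor = hbase_version.split(".")[:2]
    return f"apache-phoenix-{phoenix_version}-HBase-{major}.{minor}-bin.tar.gz"


if __name__ == "__main__":
    print(phoenix_package_name("4.8.1", "1.2.0"))
```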

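Upload the downloaded Phoenix archive to a Master node (any node will do) and extract it.

The archive contains jars for many components. We only need to copy phoenix-core-4.8.1-HBase-1.2.jar and phoenix-4.8.1-HBase-1.2-client.jar into HBase's lib directory, copy HBase's configuration file hbase-site.xml into the bin directory of the extracted Phoenix folder, and then restart HBase.
Then run bin/sqlline.py bqdpm1,bqdpm2,bqdps1:2181. It fails with an error.

The error clearly indicates a jar conflict. The package was downloaded from the official Apache site and bundles Apache's own Hadoop and HBase, which means it does not support CDH. So what should we do?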

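2.2 Building Phoenix for CDH

The Phoenix project depends on the stock Apache jars, so the dependencies have to be changed to their CDH counterparts. For reference, see the article "编译phoenix用于CDH平台" (Compiling Phoenix for the CDH platform).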

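Changing the dependencies is not enough; parts of the code also have to be modified, which is rather tedious. Fortunately there is GitHub, where someone has already made the changes for us, so we can use them directly. Link: phoenix-for-cloudera. Although the project is labelled for CDH 5.8 (Phoenix 4.8), it also works on CDH 5.7.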

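Clone the project or download it as an archive. After extracting it you will see the project's directory tree.

This is clearly a Maven project. Once you have confirmed that Maven is installed, run mvn clean package -DskipTests -Dcdh.flume.version=1.4.0. The Flume version must be set to the one you need; the author's CDH environment uses 1.4. Then comes a long wait...

When the build finishes, the compiled jars and project files are packaged into phoenix-assembly/target.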

2.3 Installing Phoenix into the CDH environment

Upload the packaged phoenix-4.8.0-cdh5.8.0.tar.gz to the CDH environment and extract it.

Then copy phoenix-4.8.0-cdh5.8.0-server.jar into the HBase lib path on every node, that is, /opt/cloudera/parcels/CDH/lib/hbase/lib/
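Copying the jar to every node by hand is easy to get wrong, so here is a hedged Python sketch that only builds the scp commands and prints them for review. The host list is an assumption: the article names bqdpm1, bqdpm2, and bqdps1 as the ZooKeeper quorum, and your actual HBase nodes may differ. It also assumes SSH access to each node; nothing is executed.

```python
from typing import List

SERVER_JAR = "phoenix-4.8.0-cdh5.8.0-server.jar"
HBASE_LIB = "/opt/cloudera/parcels/CDH/lib/hbase/lib/"


def copy_commands(hosts: List[str], jar: str = SERVER_JAR,
                  dest: str = HBASE_LIB) -> List[List[str]]:
    """Return one scp argument list per host; nothing is run here."""
    return [["scp", jar, f"{host}:{dest}"] for host in hosts]


if __name__ == "__main__":
    # Hypothetical node list; replace with every node that runs HBase.
    for cmd in copy_commands(["bqdpm1", "bqdpm2", "bqdps1"]):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or passed to subprocess.run once you are happy with the targets.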

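Then copy HBase's configuration file hbase-site.xml into the bin directory.

Enter the bin directory and run ./sqlline.py bqdpm1,bqdpm2,bqdps1:2181
If the sqlline prompt appears, the installation succeeded.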
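The argument passed to sqlline.py is the ZooKeeper quorum (comma-separated hosts followed by the port), and the sqlline prompt shows the same string behind a jdbc:phoenix: prefix. The helper names in this sketch are hypothetical; it simply shows how the two strings relate.

```python
from typing import List


def zk_quorum(hosts: List[str], port: int = 2181) -> str:
    """Join ZooKeeper hosts into the form sqlline.py expects."""
    return f"{','.join(hosts)}:{port}"


def jdbc_url(hosts: List[str], port: int = 2181) -> str:
    """Build the Phoenix JDBC URL for the same quorum."""
    return "jdbc:phoenix:" + zk_quorum(hosts, port)


if __name__ == "__main__":
    hosts = ["bqdpm1", "bqdpm2", "bqdps1"]
    print("./sqlline.py " + zk_quorum(hosts))
    print(jdbc_url(hosts))
```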

If sqlline.py fails with an error at this point,

check whether HDFS permission checking has been disabled, then run hbase clean --cleanZk
Finally, restart HBase.

3. Using Phoenix

3.1 Four ways to invoke Phoenix

3.1.1 Batch mode

First create the file us_population.sql, which creates a table named us_population:

CREATE TABLE IF NOT EXISTS us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT my_pk PRIMARY KEY (state, city));

Next, create the data file us_population.csv:

NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332

Finally, create the query file us_population_queries.sql:

SELECT state as "State", count(city) as "City Count", sum(population) as "Population Sum"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;

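Run ../bin/psql.py bqdpm1,bqdpm2,bqdps1:2181 us_population.sql us_population.csv us_population_queries.sql

In this command, us_population.sql and us_population.csv must share the same name, because by default psql.py loads a CSV file into the table named after the file (us_population here).
Once the table exists, running the queries on their own with ../bin/psql.py bqdpm1,bqdpm2,bqdps1:2181 us_population_queries.sql also works.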
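To sanity-check what the query file should return, the same aggregation can be reproduced in plain Python over the CSV rows from this section. This does not talk to Phoenix at all; it only mirrors the GROUP BY and ORDER BY logic, and the function name is made up for this sketch.

```python
import csv
import io
from collections import defaultdict
from typing import List, Tuple

CSV_DATA = """\
NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332
"""


def population_by_state(csv_text: str) -> List[Tuple[str, int, int]]:
    """Return (state, city count, population sum), largest sum first."""
    totals = defaultdict(lambda: [0, 0])  # state -> [city count, population sum]
    for state, city, population in csv.reader(io.StringIO(csv_text)):
        totals[state][0] += 1
        totals[state][1] += int(population)
    rows = [(state, count, total) for state, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for state, count, total in population_by_state(CSV_DATA):
        print(f"{state}  {count}  {total}")
```

The rows come out as NY (1 city, 8143197), CA (3, 6012701), TX (3, 4486916), IL (1, 2842518), PA (1, 1463281), and AZ (1, 1461575), which is what the Phoenix query should also report for this data.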

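Table names created through Phoenix are automatically converted to upper case. If you need a lower-case table name, quote it: create table "tablename"
After Phoenix is installed, four system tables exist.

Every table created in Phoenix also gets a corresponding table in HBase.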
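The upper-casing rule for table names mentioned above can be modelled in a few lines. This is a simplified sketch of the rule as described in this article (unquoted names are upper-cased, double-quoted names keep their case), not Phoenix's parser; the function name is hypothetical.

```python
def normalize_identifier(name: str) -> str:
    """Mimic how an identifier in a statement maps to the stored table name."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted identifier: strip the quotes and keep the case as written.
        return name[1:-1]
    # Unquoted identifier: folded to upper case.
    return name.upper()


if __name__ == "__main__":
    print(normalize_identifier("us_population"))
    print(normalize_identifier('"tablename"'))
```

This is why a table created as us_population shows up as US_POPULATION, and why a table created with quotes has to be quoted again in every later statement.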

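3.1.2 Command-line mode

Log in to the Phoenix shell with ./sqlline.py bqdpm1,bqdpm2,bqdps1:2181; from there you can run ordinary SQL statements.

Use !table to view table information
Use !describe tablename to view a table's columns
Use !history to view previously executed SQL
Use !dbinfo to view all of Phoenix's properties

Many other commands are available as well; run help to list them: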

0: jdbc:phoenix:bqdpm1,bqdpm2,bqdps1:2181> help
!all                Execute the specified SQL against all the current connections
!autocommit         Set autocommit mode on or off
!batch              Start or execute a batch of statements
!brief              Set verbose mode off
!call               Execute a callable statement
!close              Close the current connection to the database
!closeall           Close all current open connections
!columns            List all the columns for the specified table
!commit             Commit the current transaction (if autocommit is off)
!connect            Open a new connection to the database.
!dbinfo             Give metadata information about the database
!describe           Describe a table
!dropall            Drop all tables in the current database
!exportedkeys       List all the exported keys for the specified table
!go                 Select the current connection
!help               Print a summary of command usage
!history            Display the command history
!importedkeys       List all the imported keys for the specified table
!indexes            List all the indexes for the specified table
!isolation          Set the transaction isolation for this connection
!list               List the current connections
!manual             Display the SQLLine manual
!metadata           Obtain metadata information
!nativesql          Show the native SQL for the specified statement
!outputformat       Set the output format for displaying results
                    (table,vertical,csv,tsv,xmlattrs,xmlelements)
!primarykeys        List all the primary keys for the specified table
!procedures         List all the procedures
!properties         Connect to the database specified in the properties file(s)
!quit               Exits the program
!reconnect          Reconnect to the database
!record             Record all output to the specified file
!rehash             Fetch table and column names for command completion
!