- This article is based on CentOS 6.x + CDH 5.x
Installing Phoenix
All the components so far were installed through CDH, but this time Cloudera is not involved at all. Download the Phoenix package from Apache Phoenix Download; that mirror is a bit slow, so I also uploaded the package to CSDN. The download link is listed with the version mapping below.
Phoenix and HBase version mapping
I'm using 4.2.2 here. After downloading, unpack the archive and copy phoenix-4.2.2-server.jar into the lib directory of every RegionServer, /usr/lib/hbase/lib:
- cp phoenix-4.2.2-server.jar /usr/lib/hbase/lib
Then restart all RegionServers:
- service hbase-regionserver restart
Using Phoenix
Upload the phoenix-4.2.2-bin folder unpacked from phoenix-4.2.2-bin.tar.gz to host1 as well, then go to its bin directory and run the scripts from there (you could actually run them on your own machine too; mine just has no Python environment, while CentOS ships with Python).
If you downloaded the package on Windows, first give the .py files in the bin folder execute permission on CentOS:
- [root@host1 ~]# cd phoenix-4.2.2-bin/
- [root@host1 phoenix-4.2.2-bin]# cd bin
- [root@host1 bin]# chmod +x *.py
Phoenix can be invoked in four ways:
- Batch mode
- Command line
- GUI
- JDBC
Batch mode
Create a SQL file named us_population.sql with this content:
- CREATE TABLE IF NOT EXISTS us_population ( state CHAR(2) NOT NULL, city VARCHAR NOT NULL, population BIGINT CONSTRAINT my_pk PRIMARY KEY (state, city));
Create a file us_population.csv:
- NY,New York,8143197
- CA,Los Angeles,3844829
- IL,Chicago,2842518
- TX,Houston,2016582
- PA,Philadelphia,1463281
- AZ,Phoenix,1461575
- TX,San Antonio,1256509
- CA,San Diego,1255540
- TX,Dallas,1213825
- CA,San Jose,912332
Then create a file us_population_queries.sql:
- SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum" FROM us_population GROUP BY state ORDER BY sum(population) DESC;
Then run:
- phoenix-4.2.2-bin/bin/psql.py host1,host2:2181 us_population.sql us_population.csv us_population_queries.sql
Remember to replace host1 and host2 with your own ZooKeeper addresses.
This one command does three things at once: it creates a table, inserts the data, and queries the result.
- [root@host1 ~]# phoenix-4.2.2-bin/bin/psql.py host1,host2:2181 us_population.sql us_population.csv us_population_queries.sql
- 15/03/04 17:14:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- 15/03/04 17:14:24 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties
- no rows upserted
- Time: 0.726 sec(s)
-
- csv columns from database.
- CSV Upsert complete. 10 rows upserted
- Time: 0.103 sec(s)
-
- St City Count Population Sum
- -- ---------------------------------------- ----------------------------------------
- NY 1 8143197
- CA 3 6012701
- TX 3 4486916
- IL 1 2842518
- PA 1 1463281
- AZ 1 1461575
- Time: 0.048 sec(s)
Now look at the table from the HBase shell:
- hbase(main):002:0> scan 'US_POPULATION'
- ROW COLUMN+CELL
- AZPhoenix column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00\x16MG
- AZPhoenix column=0:_0, timestamp=1425460467206, value=
- CALos Angeles column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00:\xAA\xDD
- CALos Angeles column=0:_0, timestamp=1425460467206, value=
- CASan Diego column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00\x13(t
- CASan Diego column=0:_0, timestamp=1425460467206, value=
- CASan Jose column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00\x0D\xEB\xCC
- CASan Jose column=0:_0, timestamp=1425460467206, value=
- ILChicago column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00+_\x96
- ILChicago column=0:_0, timestamp=1425460467206, value=
- NYNew York column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00|A]
- NYNew York column=0:_0, timestamp=1425460467206, value=
- PAPhiladelphia column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00\x16S\xF1
- PAPhiladelphia column=0:_0, timestamp=1425460467206, value=
- TXDallas column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00\x12\x85\x81
- TXDallas column=0:_0, timestamp=1425460467206, value=
- TXHouston column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00\x1E\xC5F
- TXHouston column=0:_0, timestamp=1425460467206, value=
- TXSan Antonio column=0:POPULATION, timestamp=1425460467206, value=\x80\x00\x00\x00\x00\x13,=
- TXSan Antonio column=0:_0, timestamp=1425460467206, value=
- 10 row(s) in 0.2220 seconds
- The PRIMARY KEY we defined was (state, city), so Phoenix concatenates the state and city values you entered to form the rowkey
- The other fields are stored under their own column names, with 0 as the default column family
- There is also a 0:_0 column with no value; Phoenix adds this column for performance reasons, you can ignore it
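Because state and city together form the rowkey, a query that pins down both columns becomes a direct rowkey lookup instead of a full scan. A small query sketch you can try in the same session:
- SELECT population FROM us_population WHERE state = 'CA' AND city = 'San Diego';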
Command line
Next, run sqlline.py:
- $ ./sqlline.py localhost
- 0: jdbc:phoenix:localhost>
To exit the command line, run !quit
- 0: jdbc:phoenix:localhost>!quit
You can list all the available commands with !help:
- 0: jdbc:phoenix:localhost> !help
- !all Execute the specified SQL against all the current connections
- !autocommit Set autocommit mode on or off
- !batch Start or execute a batch of statements
- !brief Set verbose mode off
- !call Execute a callable statement
- !close Close the current connection to the database
- !closeall Close all current open connections
- !columns List all the columns for the specified table
- !commit Commit the current transaction (if autocommit is off)
- !connect Open a new connection to the database.
- !dbinfo Give metadata information about the database
- !describe Describe a table
- !dropall Drop all tables in the current database
- !exportedkeys List all the exported keys for the specified table
- !go Select the current connection
- !help Print a summary of command usage
- !history Display the command history
- !importedkeys List all the imported keys for the specified table
- !indexes List all the indexes for the specified table
- !isolation Set the transaction isolation for this connection
- !list List the current connections
- !manual Display the SQLLine manual
- !metadata Obtain metadata information
- !nativesql Show the native SQL for the specified statement
- !outputformat Set the output format for displaying results
- (table,vertical,csv,tsv,xmlattrs,xmlelements)
- !primarykeys List all the primary keys for the specified table
- !procedures List all the procedures
- !properties Connect to the database specified in the properties file(s)
- !quit Exits the program
- !reconnect Reconnect to the database
- !record Record all output to the specified file
- !rehash Fetch table and column names for command completion
- !rollback Roll back the current transaction (if autocommit is off)
- !run Run a script from the specified file
- !save Save the current variabes and aliases
- !scan Scan for installed JDBC drivers
- !script Start saving a script to a file
- !set Set a sqlline variable
- !sql Execute a SQL command
- !tables List all the tables in the database
- !typeinfo Display the type map for the current connection
- !verbose Set verbose mode on
Creating a mapping table for employee
Data preparation: now let's create a mapping table for an HBase table named employee that I created earlier:
- hbase(main):003:0> describe 'employee'
- DESCRIPTION                                                                          ENABLED
- 'employee', {NAME => 'company', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', true
-   REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
-   TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
-   IN_MEMORY => 'false', BLOCKCACHE => 'true'},
- {NAME => 'family', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
-   REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
-   TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
-   IN_MEMORY => 'false', BLOCKCACHE => 'true'}
- 1 row(s) in 0.1120 seconds
- hbase(main):016:0> scan 'employee'
- ROW COLUMN+CELL
- row1 column=company:name, timestamp=1425537923391, value=ted
- row1 column=company:position, timestamp=1425537950471, value=worker
- row1 column=family:tel, timestamp=1425537956413, value=13600912345
- row2 column=company:name, timestamp=1425537975610, value=michael
- row2 column=company:position, timestamp=1425537985594, value=manager
- row2 column=family:tel, timestamp=1425537994087, value=1894225698
- 2 row(s) in 0.0340 seconds
It holds two rows of data. If you don't have this data, you can create it with the following commands:
- create 'employee','company','family'
- put 'employee','row1','company:name','ted'
- put 'employee','row1','company:position','worker'
- put 'employee','row1','family:tel','13600912345'
- put 'employee','row2','company:name','michael'
- put 'employee','row2','company:position','manager'
- put 'employee','row2','family:tel','1894225698'
- scan 'employee'
You can create tables as read-write or read-only; the difference is:
- Read-write table: if a column family you define doesn't exist, it is created automatically and given an empty value
- Read-only table: every column family you define must already exist
- 0: jdbc:phoenix:localhost> CREATE TABLE IF NOT EXISTS "employee" ("no" CHAR(4) NOT NULL PRIMARY KEY, "company"."name" VARCHAR(30),"company"."position" VARCHAR(20), "family"."tel" CHAR(11), "family"."age" INTEGER);
- 2 rows affected (1.745 seconds)
- IF NOT EXISTS guarantees that if the table has already been created, its configuration will not be overwritten
- The field that serves as the rowkey is marked with PRIMARY KEY
- A column in a column family is written as columnFamily.columnName
- family.age is a newly added field. I didn't create it when preparing the test data because the hbase shell can't write numeric types directly; in a moment, when I insert data with the UPSERT command, you'll see how a real numeric type shows up in HBase
- 0: jdbc:phoenix:localhost> SELECT * FROM "employee";
- +------+--------------------------------+----------------------+-------------+------------------------------------------+
- | no | name | position | tel | age |
- +------+--------------------------------+----------------------+-------------+------------------------------------------+
- | row1 | ted | worker | 13600912345 | null |
- | row2 | michael | manager | 1894225698 | null |
- +------+--------------------------------+----------------------+-------------+------------------------------------------+
Inserting / updating data
In Phoenix, inserting and updating data are one command called UPSERT, meaning update + insert.
Let's try inserting a row:
- UPSERT INTO "employee" VALUES ('row3','billy','worker','16974681345',33);
- 0: jdbc:phoenix:localhost> SELECT * FROM "employee";
- +------+--------------------------------+----------------------+-------------+------------------------------------------+
- | no | name | position | tel | age |
- +------+--------------------------------+----------------------+-------------+------------------------------------------+
- | row1 | ted | worker | 13600912345 | null |
- | row2 | michael | manager | 1894225698 | null |
- | row3 | billy | worker | 16974681345 | 33 |
- +------+--------------------------------+----------------------+-------------+------------------------------------------+
- 3 rows selected (0.195 seconds)
Now let's look at the data in HBase:
- hbase(main):054:0> scan 'employee'
- ROW COLUMN+CELL
- row1 column=company:_0, timestamp=1425543735420, value=
- row1 column=company:name, timestamp=1425543735274, value=ted
- row1 column=company:position, timestamp=1425543735323, value=worker
- row1 column=family:tel, timestamp=1425543735420, value=13600912345
- row2 column=company:_0, timestamp=1425543735767, value=
- row2 column=company:name, timestamp=1425543735608, value=michael
- row2 column=company:position, timestamp=1425543735720, value=manager
- row2 column=family:tel, timestamp=1425543735767, value=1894225698
- row3 column=company:_0, timestamp=1425543857594, value=
- row3 column=company:name, timestamp=1425543857594, value=billy
- row3 column=company:position, timestamp=1425543857594, value=worker
- row3 column=family:age, timestamp=1425543857594, value=\x80\x00\x00!
- row3 column=family:tel, timestamp=1425543857594, value=16974681345
- 3 row(s) in 0.0650 seconds
That trailing \x80\x00\x00! is the numeric value serialized into bytes for HBase storage: Phoenix writes an INTEGER as 4 big-endian bytes with the sign bit flipped, so 33 becomes 0x80000021, and the last byte 0x21 is just the ASCII character '!'.
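If you want to see this byte layout for yourself, here is a minimal Java sketch of the idea (an illustration only, not Phoenix's actual implementation):
- import java.nio.ByteBuffer;
-
- public class IntEncodingDemo {
-     public static void main(String[] args) {
-         // Phoenix-style INTEGER encoding: 4 big-endian bytes with the sign bit
-         // flipped, so unsigned byte order matches signed numeric order
-         byte[] bytes = ByteBuffer.allocate(4).putInt(33 ^ 0x80000000).array();
-         for (byte b : bytes) {
-             System.out.printf("\\x%02X", b); // prints \x80\x00\x00\x21
-         }
-         System.out.println(); // ...and 0x21 is the ASCII character '!'
-     }
- }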
TABLE vs VIEW
The statement for creating a VIEW is the same as for a TABLE, just with the TABLE keyword replaced by VIEW, but they differ fundamentally:
- If you have an existing HBase table that already contains data, it's best to create a VIEW, because once you create a TABLE, Phoenix considers that table to belong entirely to Phoenix: even if you only drop the table inside Phoenix, the corresponding HBase table is deleted along with it
- If you didn't have the HBase table before and want a brand-new Phoenix table, use TABLE, but remember that all operations must then go through Phoenix. The HBase table created alongside it is just an auxiliary table; don't try to maintain its data by hand, forget that it exists
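For example, had employee not already been mapped as a TABLE above, the read-only mapping would use the same column definitions with the VIEW keyword (a sketch):
- CREATE VIEW IF NOT EXISTS "employee" ("no" CHAR(4) NOT NULL PRIMARY KEY, "company"."name" VARCHAR(30), "company"."position" VARCHAR(20), "family"."tel" CHAR(11), "family"."age" INTEGER);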
Using the GUI
How to set up the GUI is described at http://phoenix.apache.org/installation.html. I won't cover it here, because I never got it working myself, and honestly the interface is too ugly to bring myself to use.
JDBC
Open Eclipse and create a simple Maven project called play-phoenix.
The content of pom.xml is:
- <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
-   <modelVersion>4.0.0</modelVersion>
-
-   <groupId>org.crazycake</groupId>
-   <artifactId>play-phoenix</artifactId>
-   <version>0.0.1-SNAPSHOT</version>
-   <packaging>jar</packaging>
-
-   <name>play-phoenix</name>
-   <url>http://maven.apache.org</url>
-
-   <properties>
-     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
-   </properties>
-
-   <repositories>
-     <repository>
-       <id>apache-releases</id>
-       <url>https://repository.apache.org/content/repositories/releases/</url>
-     </repository>
-   </repositories>
-
-   <build>
-     <plugins>
-       <plugin>
-         <artifactId>maven-compiler-plugin</artifactId>
-         <version>2.0.2</version>
-         <configuration>
-           <source>1.7</source>
-           <target>1.7</target>
-           <encoding>UTF-8</encoding>
-           <optimize>true</optimize>
-           <compilerArgument>-nowarn</compilerArgument>
-         </configuration>
-       </plugin>
-       <plugin>
-         <groupId>org.apache.maven.plugins</groupId>
-         <artifactId>maven-shade-plugin</artifactId>
-         <version>2.3</version>
-         <configuration>
-           <transformers>
-             <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer">
-             </transformer>
-           </transformers>
-         </configuration>
-         <executions>
-           <execution>
-             <phase>package</phase>
-             <goals>
-               <goal>shade</goal>
-             </goals>
-           </execution>
-         </executions>
-       </plugin>
-     </plugins>
-   </build>
-   <dependencies>
-     <dependency>
-       <groupId>junit</groupId>
-       <artifactId>junit</artifactId>
-       <version>3.8.1</version>
-       <scope>test</scope>
-     </dependency>
-     <dependency>
-       <groupId>org.apache.phoenix</groupId>
-       <artifactId>phoenix-core</artifactId>
-       <version>4.2.2</version>
-     </dependency>
-   </dependencies>
- </project>
The key dependency is phoenix-core; make sure its version matches the phoenix-4.2.2-server.jar you deployed to the RegionServers. Now create the test class:
- package org.crazycake.play_phoenix;
-
- import java.sql.Connection;
- import java.sql.DriverManager;
- import java.sql.ResultSet;
- import java.sql.SQLException;
- import java.sql.Statement;
-
- public class PhoenixManager {
-     public static void main(String[] args) throws SQLException {
-         Connection conn = null;
-         Statement stat = null;
-         ResultSet rs = null;
-         try {
-             // load the Phoenix JDBC driver
-             Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
-             // the URL is jdbc:phoenix: followed by your ZooKeeper quorum
-             conn = DriverManager.getConnection("jdbc:phoenix:host1,host2:2181");
-             stat = conn.createStatement();
-             rs = stat.executeQuery("select * from \"employee\"");
-             while (rs.next()) {
-                 System.out.println("no: " + rs.getString("no"));
-                 System.out.println("name: " + rs.getString("name"));
-                 System.out.println("position: " + rs.getString("position"));
-                 System.out.println("age: " + rs.getInt("age"));
-             }
-         } catch (Throwable e) {
-             e.printStackTrace();
-         } finally {
-             if (rs != null) {
-                 rs.close();
-             }
-             if (stat != null) {
-                 stat.close();
-             }
-             if (conn != null) {
-                 conn.close();
-             }
-         }
-     }
- }
Run it, and the output is:
- no: row1
- name: ted
- position: worker
- age: 0
- no: row2
- name: michael
- position: manager
- age: 0
- no: row3
- name: billy
- position: worker
- age: 33
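By the way, rs.getInt returns 0 for a SQL NULL, which is why row1 and row2 print age: 0. And if you want to write data over JDBC, remember that Phoenix connections have auto-commit off by default, so an UPSERT only takes effect after conn.commit(). A minimal sketch (the row values here are made up for illustration):
- import java.sql.Connection;
- import java.sql.DriverManager;
- import java.sql.PreparedStatement;
-
- public class PhoenixUpsert {
-     public static void main(String[] args) throws Exception {
-         Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
-         try (Connection conn = DriverManager.getConnection("jdbc:phoenix:host1,host2:2181");
-              PreparedStatement ps = conn.prepareStatement(
-                      "UPSERT INTO \"employee\" VALUES (?, ?, ?, ?, ?)")) {
-             ps.setString(1, "row4");
-             ps.setString(2, "alice");
-             ps.setString(3, "worker");
-             ps.setString(4, "13811112222");
-             ps.setInt(5, 28);
-             ps.executeUpdate();
-             conn.commit(); // Phoenix auto-commit is off by default
-         }
-     }
- }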
Done!
Wrap-up
With that, all of the must-learn Hadoop components are covered, and this beginner series is complete! Whether I write up the optional components later depends on my mood! :-)