最近需要搞这方面的工作,首先百度了一下,发现重复率太高,于是自己去官网查看了,附上官网地址
这个人应该最开始发表文章的,写的不错
http://lxw1234.com/archives/category/hive
这段话翻译一下 注意时间是2016年2月就有了
1、hive2.0.0 实现了hpl/sql功能,实际上就是类似于mysql、oracle的存储过程
2、hpl/sql很牛b
一、commandline
hplsql -e 'query' | -f file
[-main procname]
[-d | --define | -hiveconf | -hivevar var=value ...]
[-version | --version]
[-trace | --trace]
[-H | --help]
Parameters:
Parameter | Description | HPL/SQL Version | |
-e 'query' | SQL statements to execute | 0.1 | |
-f file | Execute SQL statements from file | 0.1 | |
-main procname | Entry point (procedure or function name) | 0.3.7 | |
-d var=value | Variable definition | 0.1 | |
--define var=value | 0.1 | ||
-hiveconf var=value | 0.1 | ||
-hivevar var=value | 0.1 | ||
-version | --version | Print HPL/SQL version | 0.1 |
-trace | --trace | Print trace information | 0.1 |
-H | --help | Print help information and exit | 0.1 |
进入hive的bin安装目录就可以看到hplsql这个脚本,版本是cdh6.3.0,HPL/SQL 0.3.31
Notes:
-
-e and -f cannot be specified together
-
if -main option is not specified, HPL/SQL starts executing all statements from the beginning of the script
-
You can use single 'query' and double “query” quotes with -e option
-
Currently -d, --define, -hivevar and -hiveconf are equivalent and allow you to define input variables.
翻译下
1、-e 和-f 不能同时使用
2、如果-main选项没有指定,hplsql也就是你写的存储过程将执行所有的内容
3、-e的时候,你可以使用单引号和双引号
4、-d, --define, -hivevar 和-hiveconf都可以用来传递参数
Example 1:
Executing HPL/SQL statements from a script:
hplsql -f script.sql
Example 2:
Executing HPL/SQL statements from command line:
hplsql -e "NVL(MAX_PARTITION_DATE(db.sales, local_dt, code='A'), CURRENT_DATE)"
这句话暂时还不太理解,这个函数目前没看到,而且里面的参数也不明白,这个官方示例还不太懂
Example 3:
Using variables:
hplsql -e "PRINT a || ', ' || b" -d a=Hello -d b=world
Result:
Hello, world
就目前来说./hplsql 和./hive的功能基本一样
——————————————————————————————————————————————————————
下面主要学习存储过程基础、写法和使用
Operators
操作符
Expression | Result | Result Type |
3 + 1 | 4 | Integer |
'Ab' + 'c' | 'Abc' | String |
DATE '2014-12-31' + 1 | DATE '2015-01-01' | DATE |
3 = 3 | True | Boolean |
3 = 1 | False | Boolean |
'CA' = NULL | NULL | Boolean |
3 == 3 | True | Boolean |
3 <> 3 | False | Boolean |
3 != 3 | False | Boolean |
3 > 1 | True | Boolean |
3 < 1 | False | Boolean |
3 >= 1 | True | Boolean |
3 <= 1 | False | Boolean |
'a' || 'b' || 'c' | 'abc' | String |
'a' || 1 || 'c' | 'a1c' | String |
'a' || NULL || 'c' | 'ac' | String |
NULL || NULL | NULL | String |
许多示例使用
./hplsql -e "PRINT a || ', ' || b" --hivevar a=Hello --hivevar b=lxw1234.com
也没人说使用原因,hplsql中 a b是字符串,然后用||连接符连接 ', '注意这个,是字符串,所以一定要引起来
使用位置:
比如我现在用hplsql写个脚本,每天把a表的数据插入到b表
vim test.sh
hive -e 'insert into tableA as select * from impala.'||tableB||' limit 10';
—————————————————————————————————————————————————————————
Attributes and Built-in Variables
HPL/SQL attributes and built-in variables:
ACTIVITY_COUNT | Number of rows affected by last SQL statement //是否有数据,有为1没有为0 |
ERRORCODE | Return code of the last SQL statement //上句sql写的对不对 0正确,正数warning 负数error |
HOSTCODE | Return code of the last OS command // |
SQLCODE | Return code of the last SQL statement //0正确,正数warning 负数error |
SQLSTATE | Return status of the last SQL statement //算了感觉都差不多 |
这个的具体作用主要就是判断上一句sql 的查询插入删除等情况是否正常运行,从而来输出日志,同时判断下一步走向
————————————————————————————————————————————————————————
Statements
HPL/SQL statements:
ALLOCATE CURSOR | Allocate cursor for procedure result set | ||
ASSOCIATE RESULT SET LOCATOR | Define locators for procedure result sets | ||
BREAK | Exit a loop | ||
CALL | Execute a stored procedure | ||
CLOSE | Close a cursor | ||
CMP | Compare data in tables | ||
COPY | Copy data between tables and files | ||
COPY FROM FTP | Copy FTP files to Hadoop compatible file system | ||
COPY FROM LOCAL | Copy local files to Hadoop compatible file system | ||
CREATE DATABASE | Create a database | ||
CREATE FUNCTION | Create a user-defined SQL function | ||
CREATE LOCAL TEMPORARY TABLE | Create a session-level temporary table | ||
CREATE PACKAGE | Create a program package | ||
CREATE PROCEDURE | Create a user-defined SQL procedure | ||
CREATE TABLE | Create a table | ||
CREATE VOLATILE TABLE | Create a session-level temporary table | ||
DECLARE | Declare a variable | ||
DECLARE CONDITION | Declare a condition | ||
DECLARE CURSOR | Declare a cursor | ||
DECLARE HANDLER | Declare a condition handler | ||
DECLARE TEMPORARY TABLE | Declare a temporary table | ||
DESCRIBE | Describe a database object | ||
DROP DATABASE | Drop a database | ||
DROP TABLE | Drop a table | ||
EXEC | EXECUTE | EXECUTE IMMEDIATE | Execute a dynamic SQL statement or procedure |
EXIT WHEN | Exit a loop | ||
FETCH | Fetch the next row from a cursor | ||
FOR cursor | FOR statement (Cursor loop) | ||
FOR range | FOR statement (Integer range) | ||
GET DIAGNOSTICS | Get execution information | ||
HOST | Execute an OS command or run an external process | ||
IF | IF statement | ||
INCLUDE | Include statements from another script | ||
INSERT | INSERT statement | ||
INSERT DIRECTORY | Write query results to a file | ||
LEAVE | Exit a loop | ||
OPEN | Open a cursor | ||
LOOP | Unconditional loop | ||
MAP OBJECT | Map object name to a connection profile | ||
NULL | No operation (no-op) statement | ||
Print a line | |||
RESIGNAL | Resignal the exception | ||
RETURN | Return from a routine | ||
SELECT | SELECT statement | ||
SELECT INTO | Assign values from a query | ||
SIGNAL | Raise a condition or exception | ||
SUMMARY | Summary stats for a table or result set | ||
SET | Assign a value to a variable | ||
SET Session Option | Set a session option | ||
TRUNCATE | Truncate a table | ||
UPDATE | UPDATE statement | ||
USE | Change the default database | ||
VALUES INTO | Assign a value to a variable | ||
WHILE | While loop |
这个是存储过程中的语法,以前看过oracle的存储过程感觉挺像的。
CREATE PROCEDURE set_message(IN arg STRING)
BEGIN
SET result = 'Hello, lxw的大数据田地!';
print arg;
PRINT result || ', ' || arg;
END;
PRINT 'Hello, lxw1234.com! at beginning .. ';
CALL set_message(' by CALL ..');
借用文章开头那位大神的例子说明下(大神觉得简单,基本没介绍)
首先CREATE PROCEDURE 声明了一个存储过程,其实类似与java里的方法一样,调用的方式是call procedure,这个就相当于main方法。(中文可能有乱码,无所谓)
第一种 hplsql -f xxx.sql
该方法会执行print 和call set_message(' by CALL ..');
输出Hello, lxw1234.com! at beginning ..
by CALL ..
Hello, lxw的大数据田地! , by CALL ..
第二种 hplsql -f xxx.sql -main set_message
直接调用存储过程 call set_message(null)
输出 null
Hello, lxw的大数据田地! ,
这里调用的时候是没有参数的,这个时候我就在想肯定不行吧,一般作为脚本肯定需要时间传递
那么肯定有解决办法:内置的时间参数类似与${YYYY-mm-dd} 或者可以传入时间参数类似 call set_message(2020-04-13)
1、使用-d参数传递
./hplsql -f cc_test/hplsql_test -d yyyy=2020
脚本里的print yyyy 就会在屏幕上输出 2020
2、内置时间参数暂时还没找到。。。。。。。。。。。。。。
一些简单的使用demo
hplsql的使用demo
查询数据库里的内容
[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat select.sql [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f select.sql Starting query |
---|
使用简单的函数
[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat function_hello.sql [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f function_hello.sql Hello, world! |
---|
创建存储过程
[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat proc_print.sql [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f proc_print.sql Hello, chenchi! at beginning .. [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f proc_print.sql -main set_message null |
---|
使用include引用其他文件里的存储过程
[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat set_message.sql [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat include.sql [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f include.sql name=world |
---|
使用package
[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat package_users.sql create or replace package body users as [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f package_users.sql Number of users: 3 |
---|
使用游标
[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat cursor.sql DECLARE id INT; [devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f cursor.sql - col2_2 |
---|
更多demo自己看http://www.hplsql.org/doc