hive之hplsql学习

最新推荐文章于 2024-04-29 15:17:58 发布

cclovezbf

最新推荐文章于 2024-04-29 15:17:58 发布

阅读量3.2k

点赞数 3

分类专栏： hive 文章标签： hplsql

本文链接：https://blog.csdn.net/cclovezbf/article/details/105414903

版权

hive 专栏收录该内容

49 篇文章 12 订阅

订阅专栏

最近需要搞这方面的工作，首先百度了一下，发现重复率太高，于是自己去官网查看了，附上官网地址

http://www.hplsql.org/doc

这个人应该最开始发表文章的，写的不错

http://lxw1234.com/archives/category/hive

这段话翻译一下注意时间是2016年2月就有了

1、hive2.0.0 实现了hpl/sql功能，实际上就是类似于mysql、oracle的存储过程

2、hpl/sql很牛b

一、commandline

hplsql -e 'query' | -f file 
      [-main procname]
      [-d | --define | -hiveconf | -hivevar var=value ...] 
      [-version | --version]
      [-trace | --trace]
      [-H | --help]

Parameters:

Parameter		Description	HPL/SQL Version
-e 'query'		SQL statements to execute	0.1
-f file		Execute SQL statements from file	0.1
-main procname		Entry point (procedure or function name)	0.3.7
-d var=value		Variable definition	0.1
--define var=value			0.1
-hiveconf var=value			0.1
-hivevar var=value			0.1
-version	--version	Print HPL/SQL version	0.1
-trace	--trace	Print trace information	0.1
-H	--help	Print help information and exit	0.1

进入hive的bin安装目录就可以看到hplsql这个脚本，版本是cdh6.3.0，HPL/SQL 0.3.31

Notes:

-e and -f cannot be specified together
if -main option is not specified, HPL/SQL starts executing all statements from the beginning of the script
You can use single 'query' and double “query” quotes with -e option
Currently -d, --define, -hivevar and -hiveconf are equivalent and allow you to define input variables.

翻译下

1、-e 和-f 不能同时使用

2、如果-main选项没有指定，hplsql也就是你写的存储过程将执行所有的内容

3、-e的时候，你可以使用单引号和双引号

4、-d, --define, -hivevar 和-hiveconf都可以用来传递参数

Example 1:

Executing HPL/SQL statements from a script:

hplsql -f script.sql

Example 2:

Executing HPL/SQL statements from command line:

hplsql -e "NVL(MAX_PARTITION_DATE(db.sales, local_dt, code='A'), CURRENT_DATE)"

这句话暂时还不太理解，这个函数目前没看到，而且里面的参数也不明白，这个官方示例还不太懂

Example 3:

Using variables:

hplsql -e "PRINT a || ', ' || b" -d a=Hello -d b=world

Result:

Hello, world

就目前来说./hplsql 和./hive的功能基本一样

——————————————————————————————————————————————————————

下面主要学习存储过程基础、写法和使用

Operators

操作符

Expression	Result	Result Type
3 + 1	4	Integer
'Ab' + 'c'	'Abc'	String
DATE '2014-12-31' + 1	DATE '2015-01-01'	DATE
3 = 3	True	Boolean
3 = 1	False	Boolean
'CA' = NULL	NULL	Boolean
3 == 3	True	Boolean
3 <> 3	False	Boolean
3 != 3	False	Boolean
3 > 1	True	Boolean
3 < 1	False	Boolean
3 >= 1	True	Boolean
3 <= 1	False	Boolean
'a' \|\| 'b' \|\| 'c'	'abc'	String
'a' \|\| 1 \|\| 'c'	'a1c'	String
'a' \|\| NULL \|\| 'c'	'ac'	String
NULL \|\| NULL	NULL	String

许多示例使用

./hplsql -e "PRINT a || ', ' || b" --hivevar a=Hello --hivevar b=lxw1234.com

也没人说使用原因，hplsql中 a b是字符串，然后用||连接符连接 ', '注意这个,是字符串，所以一定要引起来

使用位置：

比如我现在用hplsql写个脚本，每天把a表的数据插入到b表

vim test.sh

hive -e 'insert into tableA as select * from impala.'||tableB||' limit 10';

—————————————————————————————————————————————————————————

Attributes and Built-in Variables

HPL/SQL attributes and built-in variables:

ACTIVITY_COUNT	Number of rows affected by last SQL statement //是否有数据，有为1没有为0
ERRORCODE	Return code of the last SQL statement //上句sql写的对不对 0正确，正数warning 负数error
HOSTCODE	Return code of the last OS command //
SQLCODE	Return code of the last SQL statement //0正确，正数warning 负数error
SQLSTATE	Return status of the last SQL statement //算了感觉都差不多

这个的具体作用主要就是判断上一句sql 的查询插入删除等情况是否正常运行，从而来输出日志，同时判断下一步走向

————————————————————————————————————————————————————————

Statements

HPL/SQL statements:

ALLOCATE CURSOR			Allocate cursor for procedure result set
ASSOCIATE RESULT SET LOCATOR			Define locators for procedure result sets
BREAK			Exit a loop
CALL			Execute a stored procedure
CLOSE			Close a cursor
CMP			Compare data in tables
COPY			Copy data between tables and files
COPY FROM FTP			Copy FTP files to Hadoop compatible file system
COPY FROM LOCAL			Copy local files to Hadoop compatible file system
CREATE DATABASE			Create a database
CREATE FUNCTION			Create a user-defined SQL function
CREATE LOCAL TEMPORARY TABLE			Create a session-level temporary table
CREATE PACKAGE			Create a program package
CREATE PROCEDURE			Create a user-defined SQL procedure
CREATE TABLE			Create a table
CREATE VOLATILE TABLE			Create a session-level temporary table
DECLARE			Declare a variable
DECLARE CONDITION			Declare a condition
DECLARE CURSOR			Declare a cursor
DECLARE HANDLER			Declare a condition handler
DECLARE TEMPORARY TABLE			Declare a temporary table
DESCRIBE			Describe a database object
DROP DATABASE			Drop a database
DROP TABLE			Drop a table
EXEC	EXECUTE	EXECUTE IMMEDIATE	Execute a dynamic SQL statement or procedure
EXIT WHEN			Exit a loop
FETCH			Fetch the next row from a cursor
FOR cursor			FOR statement (Cursor loop)
FOR range			FOR statement (Integer range)
GET DIAGNOSTICS			Get execution information
HOST			Execute an OS command or run an external process
IF			IF statement
INCLUDE			Include statements from another script
INSERT			INSERT statement
INSERT DIRECTORY			Write query results to a file
LEAVE			Exit a loop
OPEN			Open a cursor
LOOP			Unconditional loop
MAP OBJECT			Map object name to a connection profile
NULL			No operation (no-op) statement
PRINT			Print a line
RESIGNAL			Resignal the exception
RETURN			Return from a routine
SELECT			SELECT statement
SELECT INTO			Assign values from a query
SIGNAL			Raise a condition or exception
SUMMARY			Summary stats for a table or result set
SET			Assign a value to a variable
SET Session Option			Set a session option
TRUNCATE			Truncate a table
UPDATE			UPDATE statement
USE			Change the default database
VALUES INTO			Assign a value to a variable
WHILE			While loop

这个是存储过程中的语法，以前看过oracle的存储过程感觉挺像的。

CREATE PROCEDURE set_message(IN arg STRING)
BEGIN
 SET result = 'Hello, lxw的大数据田地!';
 print arg;
 PRINT result || ', ' || arg;
END;
 
PRINT 'Hello, lxw1234.com! at beginning .. ';
CALL set_message(' by CALL ..');

借用文章开头那位大神的例子说明下（大神觉得简单，基本没介绍）

首先CREATE PROCEDURE 声明了一个存储过程，其实类似与java里的方法一样，调用的方式是call procedure，这个就相当于main方法。(中文可能有乱码，无所谓)

第一种 hplsql -f xxx.sql

该方法会执行print 和call set_message(' by CALL ..');

输出Hello, lxw1234.com! at beginning ..

by CALL ..

Hello, lxw的大数据田地! , by CALL ..

第二种 hplsql -f xxx.sql -main set_message

直接调用存储过程 call set_message(null)

输出 null

Hello, lxw的大数据田地! ,

这里调用的时候是没有参数的，这个时候我就在想肯定不行吧，一般作为脚本肯定需要时间传递

那么肯定有解决办法：内置的时间参数类似与${YYYY-mm-dd} 或者可以传入时间参数类似 call set_message(2020-04-13)

1、使用-d参数传递

./hplsql -f cc_test/hplsql_test -d yyyy=2020

脚本里的print yyyy 就会在屏幕上输出 2020

2、内置时间参数暂时还没找到。。。。。。。。。。。。。。

一些简单的使用demo

hplsql的使用demo

查询数据库里的内容

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat select.sql
select * from impala.impala_test_middle limit 3 --hive的库名和表名

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f select.sql

Starting query
Query executed successfully (893 ms)
col1_2 col2_2 col3_2 col4_2 col5_2 col6_2 col7_2 col8_2 col9_2 part10
col1_2 col2_2 col3_2 col4_2 col5_2 col6_2 col7_2 col8_2 col9_2 part1
col1_4 col2_4 col3_4 col4_4 col5_4 col6_4 col7_4 col8_4 col9_4 part10

使用简单的函数

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat function_hello.sql
CREATE FUNCTION hello2(text STRING)
RETURNS STRING
BEGIN
RETURN 'Hello, ' || text || '!';
END;
-- Call the function
PRINT hello2('world');

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f function_hello.sql

Hello, world!

创建存储过程

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat proc_print.sql
CREATE PROCEDURE set_message(IN arg STRING)
BEGIN
SET result = 'Hello chenchi!';
print arg;
PRINT result || ', ' || arg;
END;

PRINT 'Hello, chenchi! at beginning .. ';
CALL set_message(' by CALL ..');

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f proc_print.sql

Hello, chenchi! at beginning ..
by CALL ..
Hello chenchi!, by CALL ..

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f proc_print.sql -main set_message

null
Hello chenchi!,

使用include引用其他文件里的存储过程

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat set_message.sql
CREATE PROCEDURE set_message(IN name STRING, OUT result STRING)
BEGIN
print 'name='||name
SET result = 'Hello, ' || name || '!';
END;

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat include.sql
INCLUDE set_message.sql
DECLARE str STRING;
CALL set_message('world', str);
PRINT str;

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f include.sql

name=world
Hello, world!

使用package

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat package_users.sql
create or replace package users as
session_count int := 0;
function get_count() return int;
procedure add(name varchar(100));
end;

create or replace package body users as
function get_count() return int
is
begin
return session_count;
end;
procedure add(name varchar(100))
is
begin
-- ...
session_count = session_count + 1;end;
end;
users.add('John');
users.add('Sarah');
users.add('Paul');
print 'Number of users: ' || users.get_count();

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f package_users.sql

Number of users: 3

使用游标

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat cursor.sql
CREATE PROCEDURE spOpenIssues
DYNAMIC RESULT SETS 1
BEGIN
DECLARE cur CURSOR WITH RETURN FOR
SELECT col1,col2 FROM impala.impala_test_middle;
OPEN cur;
END;

DECLARE id INT;
DECLARE name VARCHAR(30);

CALL spOpenIssues;
ALLOCATE c1 CURSOR FOR PROCEDURE spOpenIssues;

FETCH c1 INTO id, name;
WHILE (SQLCODE = 0)
DO
PRINT id || ' - ' || name;
FETCH c1 INTO id, name;
END WHILE;
CLOSE c1;

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f cursor.sql

- col2_2
- col2_4
- col2_4

更多demo自己看http://www.hplsql.org/doc