Hive's new client: Beeline

继春

Article link: https://blog.csdn.net/u012581453/article/details/89389643

Today I am summarizing what I learned about Beeline, Hive's new client, which connects to HiveServer2.

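I. A brief introduction to Beeline

Question: what is Beeline, and why do people like using it?

01. First, understand that Beeline is a command-line shell, comparable to a Linux shell. In my view it exists to make interacting with Hive easier. [Some documentation states explicitly that it is meant to replace the earlier Hive CLI.]

02. Every tool appears and disappears against a particular historical background, so what advantages lead so many users to replace Hive CLI with Beeline? Start with how Beeline works: it supports an embedded mode and a remote mode.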

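Embedded mode: Beeline runs an embedded Hive, much like Hive CLI, so whatever Hive CLI can do, Beeline can do too.

Remote mode: Beeline connects over the network to a separate HiveServer2 process. This is more secure, because users do not need direct access to HDFS or the metastore, and it uses fewer resources on the client.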

As the summary above shows, tools are like people: if others have what you lack, you may be the one that gets replaced!
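To make the two modes concrete, here is a minimal sketch of how the JDBC URL passed to Beeline differs between them. The helper name `hive2_url` is hypothetical, and the default port 10000 and database `default` are assumptions taken from the examples in this article. A URL with no host (`jdbc:hive2://`) selects embedded mode; a URL with a host and port selects remote mode.

```shell
#!/bin/sh
# hive2_url [HOST] [PORT] [DB]
# Hypothetical helper that prints a HiveServer2 JDBC URL for beeline -u.
#   no HOST  -> embedded mode (Beeline starts an in-process HiveServer2)
#   HOST set -> remote mode (Beeline connects to a running HiveServer2)
hive2_url() {
    host="$1"
    port="${2:-10000}"    # assumed default port
    db="${3:-default}"    # assumed default database
    if [ -z "$host" ]; then
        printf 'jdbc:hive2://\n'
    else
        printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
    fi
}

hive2_url           # prints jdbc:hive2://
hive2_url master    # prints jdbc:hive2://master:10000/default
```

The result would be used as `beeline -u "$(hive2_url master)"`; the double quotes keep the shell from interpreting any `;` that a longer URL might contain.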

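II. How to use it

01. Start HiveServer2

Run HiveServer2 in the background with nohup so that it keeps running after you log out. Note that there must be no space between 2 and >; otherwise the shell passes 2 to hiveserver2 as an argument and stderr is not redirected.

[hadoop@master ~]$ nohup hiveserver2 1>/home/hadoop/soft/hive-2.3.4/logs/hiveserver.log 2>/home/hadoop/soft/hive-2.3.4/logs/hiveserver.err &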

Check whether HiveServer2 started successfully:

[hadoop@master ~]$ netstat -lnpt | grep 10000

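If port 10000 shows up as listening, HiveServer2 has started. [Note: startup sometimes fails. HiveServer2 may also need a little time before it opens the port, so check again after a few seconds and look at hiveserver.err; if it really did fail, rerun the start command.]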
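The manual check above can be scripted. The following is only a sketch: the function names `port_is_listening` and `wait_for_hiveserver2`, the 30 attempts and the 2-second pause are all assumptions, and it relies on the column layout that `netstat -lnt` prints on Linux (local address in column 4, state in column 6).

```shell
#!/bin/sh
# port_is_listening PORT
# Reads netstat -lnt style text on stdin and succeeds if some line is in
# state LISTEN with a local address ending in :PORT.
port_is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# wait_for_hiveserver2 [PORT] [TRIES]
# Polls netstat until PORT is listening; gives up after TRIES attempts.
wait_for_hiveserver2() {
    hs2_port="${1:-10000}"
    hs2_tries="${2:-30}"     # assumed retry budget
    hs2_i=0
    while [ "$hs2_i" -lt "$hs2_tries" ]; do
        if netstat -lnt 2>/dev/null | port_is_listening "$hs2_port"; then
            return 0
        fi
        hs2_i=$((hs2_i + 1))
        sleep 2              # assumed pause between attempts
    done
    return 1
}

# Example (not run here):
#   wait_for_hiveserver2 10000 && echo "HiveServer2 is up"
```

Matching on `:PORT` at the end of the address avoids false positives such as port 10000 matching a search for 1000, which a plain `grep 10000` cannot rule out in the other direction (it also matches 10000 inside PIDs or longer port numbers).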

02. View the Beeline help

[hadoop@master ~]$ beeline --help

Usage: java org.apache.hive.cli.beeline.BeeLine
   -u <database url>               the JDBC URL to connect to
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file> the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showDbInPrompt=[true/false]   display the current database name in the prompt
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv] format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --incremental=[true/false]      Defaults to false. When set to false, the entire result set
                                   is fetched and buffered before being displayed, yielding optimal
                                   display column sizing. When set to true, result rows are displayed
                                   immediately as they are fetched, yielding lower latency and
                                   memory usage at the price of extra display column padding.
                                   Setting --incremental=true is recommended if you encounter an OutOfMemory
                                   on the client side (due to the fetched result set size being large).
                                   Only applicable if --outputformat=table.
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false] set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --help                          display this message

   Example:
    1. Connect using simple authentication to HiveServer2 on localhost:10000
    $ beeline -u jdbc:hive2://localhost:10000 username password

    2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password
    $ beeline -n username -p password -u jdbc:hive2://hs2.local:10012

    3. Connect using Kerberos authentication with hive/localhost@mydomain.com as HiveServer2 principal
    $ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/localhost@mydomain.com"

    4. Connect using SSL connection to HiveServer2 on localhost at 10000
    $ beeline "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"

    5. Connect using LDAP authentication
    $ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>

The output above lists Beeline's commonly used options together with a few usage examples.
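As a rough illustration of how the options in the help text combine, the sketch below wraps a non-interactive query. Everything named here is an assumption for illustration: the function `run_query`, the variables `HS2_URL`, `HS2_USER`, `HS2_PWFILE` and `DRY_RUN`, and their default values. Only the Beeline options themselves (`-u`, `-n`, `-w`, `--silent`, `-e`) come from the help output. When `DRY_RUN=1` or `beeline` is not installed, it prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical defaults; override them through the environment.
HS2_URL="${HS2_URL:-jdbc:hive2://master:10000/default}"
HS2_USER="${HS2_USER:-hive}"
HS2_PWFILE="${HS2_PWFILE:-$HOME/.hs2_password}"

# run_query "SQL"
# Runs one statement through beeline -e. In dry-run mode, or when beeline
# is not on PATH, it only prints the command line it would have used.
run_query() {
    if [ "${DRY_RUN:-0}" = 1 ] || ! command -v beeline >/dev/null 2>&1; then
        printf 'beeline -u "%s" -n %s -w %s --silent=true -e "%s"\n' \
            "$HS2_URL" "$HS2_USER" "$HS2_PWFILE" "$1"
    else
        # -w reads the password from a file, so it does not appear in
        # the process list the way a -p argument would.
        beeline -u "$HS2_URL" -n "$HS2_USER" -w "$HS2_PWFILE" \
            --silent=true -e "$1"
    fi
}

DRY_RUN=1
run_query 'show databases;'
```

Quoting the URL matters as soon as it contains `;`, as in the Kerberos and SSL examples, because an unquoted `;` ends the shell command.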

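Beeline command-line options explained
Option	Description
-u	The JDBC URL to connect to. [Note: the URL may be wrapped in double quotes or left unquoted, but quotes are required once the URL contains a semicolon, because the shell otherwise treats ; as a command separator.] Example: [hadoop@master ~]$ beeline -u "jdbc:hive2://master:10000 hive 123456" The user name and password can also be passed separately with -n and -p.
-r	Reconnects to the last saved URL. [This works only for a URL you have already connected to and saved to the beeline.properties file with the !save command.] Steps: (01) connect and save the connection information to beeline.properties with !save; (02) exit, then reconnect with beeline -r.
-n	The user name to connect to Hive as.
-p	The password to connect with. [Note: if the password is not specified, Beeline prompts for it when connecting.]
-e	A query to execute, given directly on the command line.
-f	A script file to execute.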
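--isolation	Sets the transaction isolation level. Anyone with relational database experience knows that transactions must satisfy ACID [atomicity, consistency, isolation, durability]. There are usually four isolation levels, and Hive's client accepts all four [the reason is simple: Hive is written in Java, and the java.sql.Connection interface defines them]: TRANSACTION_READ_UNCOMMITTED, TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ and TRANSACTION_SERIALIZABLE. The levels get stricter in that order. The first is the lowest: it only guarantees that invalid data is never read, so dirty reads [transaction A reads data that a concurrent transaction B has not yet committed], non-repeatable reads [two reads of the same row within one transaction return different results] and phantom reads [rerunning the same query within one transaction returns rows that another transaction inserted and committed in between] can all occur. The second prevents dirty reads. The third prevents dirty reads and non-repeatable reads [this is the default level; Beeline prints Transaction isolation: TRANSACTION_REPEATABLE_READ when it connects to Hive]. The fourth is the highest level and prevents dirty reads, non-repeatable reads and phantom reads. [Note: java.sql.Connection also defines TRANSACTION_NONE, which means transactions are not supported.]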
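--outputformat	Sets the display format of query results. The choices are table, vertical, csv, tsv, dsv, csv2 and tsv2. (01) table is the default: results are drawn as a bordered grid, much like the result pane of an IDE. (02) vertical prints each row as a block with one column per line, and the blocks for different rows are separated from each other. (03) csv separates fields with commas, like the CSV files usually opened in Excel. (04) tsv separates fields with tab characters. (05) dsv separates fields with a configurable delimiter, by default a vertical bar (|), which --delimiterForDSV changes. csv2 and tsv2 use the same separators as csv and tsv but quote a value only when necessary, whereas the deprecated csv and tsv always wrap every value in single quotes; as the help text notes, prefer csv2 and tsv2.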
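--delimiter	Sets the delimiter that ends each statement written in Beeline; the default is a semicolon (;). It does not change the separator used in query results (for dsv output that is --delimiterForDSV). This option is not listed in the Hive 2.3.4 help output above; according to the Hive documentation it was added in Hive 3.0.0.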
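To show how the separated-value formats differ without needing a running Hive, here is a small emulation. It is not Beeline itself: the function name `render` is hypothetical, and it only mimics the field separator and quoting of each format. It also assumes values contain no special characters, so the csv2, tsv2 and dsv branches never need to quote anything.

```shell
#!/bin/sh
# render FORMAT
# Reads tab-separated rows on stdin and prints them with the separator
# and quoting style of the named output format (csv2, tsv2, dsv, csv, tsv).
render() {
    awk -F '\t' -v fmt="$1" '
        BEGIN {
            q = ""
            if (fmt == "csv2")      sep = ","
            else if (fmt == "tsv2") sep = "\t"
            else if (fmt == "dsv")  sep = "|"
            else if (fmt == "csv")  { sep = ",";  q = "\047" }  # deprecated, always single-quoted
            else if (fmt == "tsv")  { sep = "\t"; q = "\047" }  # deprecated, always single-quoted
            else {
                print "unsupported format: " fmt > "/dev/stderr"
                bad = 1
                exit 2
            }
        }
        {
            line = ""
            for (i = 1; i <= NF; i++)
                line = line (i > 1 ? sep : "") q $i q
            print line
        }
        END { if (bad) exit 2 }
    '
}

# Two rows with two fields each, rendered three ways.
printf '1\tx\n2\ty\n' | render csv2
printf '1\tx\n2\ty\n' | render dsv
printf '1\tx\n2\ty\n' | render csv
```

With a real server the equivalent would be something like `beeline -u <url> --outputformat=csv2 -e "<query>"`, where Beeline also prints a header row unless `--showHeader=false` is given.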

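Disclaimer: this document is only my own study summary and some points may be wrong. If you find it through a search and use it as a reference, please weigh it carefully so it does not mislead you, and if you spot an error I would be grateful if you pointed it out. If anything here borrows from someone else's work and you feel it infringes your intellectual property, please let me know and I will correct it as soon as possible. Email: 390835164@qq.com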