Common Hive Shell Operations

Latest recommended article published 2024-03-26 14:37:37

首席撩妹指导官

Category: Big Data. Tags: hive

Article link: https://blog.csdn.net/qq_36864672/article/details/78624060

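HQL commands can be executed in Hive in three ways:

1. Directly in the CLI.
2. As a string passed from the shell to hive -e (add -S for silent mode, which suppresses "OK" and "Time taken").
3. As a standalone file, run from the shell with hive -f or hive -i.
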
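Method 1

Type "hive" to start the Hive CLI in interactive mode. The set command shows all environment settings and can also change them. Other commands include:
    use <database>;        select a database
    quit / exit   leave the Hive interactive mode
    set -v  show all Hive variables
    set <key>=<value>       set a parameter
    !<cmd>       run a local shell command from interactive mode, for example "!ls -l /;" lists the Linux root directory
    dfs <command>        run a Hadoop filesystem command from interactive mode, for example "dfs -ls /;"
    <HQL statement>       run a query and print the result to standard output
    add [FILE|JAR|ARCHIVE] <value> [<value>]*       add one or more files to the resource list
    list FILE       list all resources that have been added
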
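Method 2

Run HQL as a string from a shell script, for example:
    hive -e "use ${database};select * from tb"
The query result can be redirected straight to a local file (columns are separated by \t by default):
    hive -e "select * from tb" > tb.txt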
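
Because the redirected output is tab-separated, a small filter can sit between hive and the file when a comma-separated file is wanted. The sketch below is only a naive illustration: tsv_to_csv is a made-up helper name, and it does not escape fields that themselves contain commas or quotes.

```shell
# Convert tab-separated text on stdin (the shape of `hive -e` output) to CSV.
# Naive sketch: fields containing commas, quotes, or newlines are NOT escaped.
tsv_to_csv() {
    tr '\t' ','
}

# Intended use (needs a working hive installation; tb is the table from the text):
#   hive -S -e "select * from tb" | tsv_to_csv > tb.csv

# Self-contained demonstration with two fake result rows:
printf '1\talice\n2\tbob\n' | tsv_to_csv
```

For data that may contain commas, writing the result with insert overwrite ... directory and an explicit row format is usually the safer route.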

 

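To see each step as it runs, put the following before the command in the shell script (this is the shell's own tracing option, not a Hive command):

    set -x
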
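In a shell script, a string can be defined in two ways:

1) Assign it directly: sql="string"

2) Capture it from a command: sql=$(cat <<endtag ... endtag), which assigns the here-document text to sql. A shell script that runs HQL this way looks like this:
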
####### execute hive ######
# cur_path must be set before this point; the unquoted here-document lets the shell expand it.
# The closing !EOF has to stand alone on its line, so the ")" goes on the next line.
sql=$(cat <<!EOF

USE pmp;
set mapred.queue.names=queue3;

drop table if exists people_targeted_delivery;
create table people_targeted_delivery
( special_tag_id int,
  cnt bigint
);

INSERT OVERWRITE LOCAL DIRECTORY '$cur_path/people_targeted_delivery'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select special_tag_id,count(1)
from t_pmp_special_user_tags
group by special_tag_id;

!EOF
)
############  execute begin   ###########
# Quote the variable so the statements keep their line breaks when echoed.
echo "$sql"
"$HIVE_HOME/bin/hive" -e "$sql"

exitCode=$?
if [ $exitCode -ne 0 ];then
         echo "[ERROR] hive execute failed!"
         exit $exitCode
fi
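
Whether the shell expands variables such as $cur_path inside the HQL text depends on how the here-document delimiter is written. The following sketch contrasts the two forms; the value given to cur_path is a hypothetical placeholder, since the script above does not show where it is set.

```shell
# Hypothetical output directory (an assumption for this example only).
cur_path=/tmp/hive_out

# Unquoted delimiter: the shell expands $cur_path before Hive ever sees the text.
expanded=$(cat <<EOF
INSERT OVERWRITE LOCAL DIRECTORY '$cur_path/people_targeted_delivery'
select special_tag_id, count(1) from t_pmp_special_user_tags group by special_tag_id;
EOF
)

# Quoted delimiter: the text is taken literally, so $cur_path is left untouched.
literal=$(cat <<'EOF'
INSERT OVERWRITE LOCAL DIRECTORY '$cur_path/people_targeted_delivery'
EOF
)

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

The unquoted form is what a script needs when it injects shell variables; the quoted form is useful when the HQL itself contains $ or backticks that the shell should leave alone.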
 

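Method 3

Save the HQL statements in a standalone file. Any extension works; .q or .hql are common markers:
    A. In CLI mode, run the file with the source command, for example: source ./mytest.hql
    B. From the shell, run it with hive -f, for example: hive -f mytest.sql
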
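Running an initialization file first: "hive -i"

Command: hive -i hive-script.sql
Hive executes the commands in the given file (hive-script.sql) before the CLI starts, so frequently used environment settings and commands can be placed in an initialization file, for example:
    Parameter settings
        set mapred.queue.names=queue3;
        SET mapred.reduce.tasks=14;
    Adding a UDF jar
        add JAR ./playdata-hive-udf.jar;
The Hive log level, by contrast, is a command-line option given when starting the CLI rather than a statement in the file:
        hive --hiveconf hive.root.logger=INFO,console
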
  #hive     start Hive
  hive>quit;     --leave Hive
  hive> exit;    --also leaves Hive; a job still running on the cluster may have to be killed separately with the next command
  >hadoop job -kill jobid

  hive>create database database_name;     --create a database
  If the database already exists, an error is raised. The following statement avoids the error:
  hive>create database if not exists database_name;

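hive> show databases;   --list databases
With many databases, a pattern can be used to filter the list:
hive> show databases like 'h.*';

hive> use default;    --choose the database to use
hive>show tables;  --list all tables in the current database
hive>show tables '*t*';    --pattern matching is supported
hive> describe tab_name;    --show the table structure (describe formatted tab_name also shows the table's path)
hive> describe database database_name;  --show the database's description and path
The database path is specified when the database is created:
hive> create database database_name location 'path';
hive> drop database if exists database_name; --drop an empty database
hive> drop database if exists database_name cascade; --drop the tables in the database first, then the database

hive>show partitions t1;   --list the partitions of a table
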
Altering tables:
hive>alter table table_name rename to another_name;   --rename a table

hive>drop table t1 ;      --drop table t1
  or: hive> drop table if exists t1;

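  Hive does not support modifying the data in a table, but the table structure can be changed without affecting the data.

  Loading with local is noticeably slower than loading without it, because a local file has to be uploaded while an HDFS file is only moved:
hive>load data inpath '/root/inner_table.dat' into table t1;   --move data already in HDFS into table t1
hive>load data local inpath '/root/inner_table.dat' into table t1;  --upload local data into HDFS
hive> !ls;  --list the files in the current Linux directory
hive> dfs -ls /; --list the files under '/' in HDFS
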
  Hive notes:

    1. Run Hive queries from a file: $ hive -f <path to .sql file>
         e.g.  $ hive -f /path/to/file/xxxx.hql
         Inside the Hive shell, the source command runs a script file: hive> source <path to .sql file>;
         e.g.  hive> source /path/to/file/test.sql;
         Run a one-off statement: $ hive -e "SQL statement"
         e.g.  $ hive -e "select * from mytable limit 3"

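    2. Older Hive versions have no command that reports the current database. Running set hive.cli.print.current.db=true; makes the CLI show it in the prompt, and newer versions also provide select current_database();

    3. Run bash shell commands inside Hive by prefixing the command with ! and ending it with ;

    4. Run Hadoop dfs commands inside Hive by dropping the leading hadoop and ending the command with ;
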
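    5. Comments in Hive scripts:
          Lines that begin with -- are treated as comments.

    6. Unlike MySQL, Hive (before ACID tables were introduced in version 0.14) does not support row-level insert, update, or delete, and it does not support transactions.
         Hive adds extensions that give better performance in a Hadoop environment.

    7. Loading data into managed tables:
          Because Hive has no row-level insert, delete, or update, the only way to put data into a table is a bulk load operation, or simply writing files into the correct directory.
          The overwrite keyword:
                  load data local inpath '${env:HOME}/directory'
                  overwrite into table table_name
                  partition (partition_spec);
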
    8. Exporting data from a table:
        hadoop fs -cp source_path target_path
    Alternatively, use insert ... directory ...:
         insert overwrite local directory '/tmp/directory'     (the path given here can also be a full URL)

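    9. Regular expressions in Hive
      (1)   hive> select `price.*` from table_name;
                   Selects every column whose name starts with price. The pattern goes in backticks; in single quotes it would be returned as a literal string. From Hive 0.13 on this also requires set hive.support.quoted.identifiers=none;
      (2)   Use LIKE or RLIKE

    10. Aggregate functions
         Setting the property hive.map.aggr to true can improve aggregation performance:
         hive> set hive.map.aggr=true;
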
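    11. When can Hive avoid MapReduce?
          In local mode Hive can avoid launching an MR job. In addition, when the property hive.exec.mode.local.auto is true, Hive also tries to run other operations in local mode.
           set hive.exec.mode.local.auto=true;
           Note: it is best to add set hive.exec.mode.local.auto=true; to your $HOME/.hiverc file.
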
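    12. JOIN statements
           Hive supports the usual SQL JOIN statements, but older versions only support equi-joins. They also do not support the OR predicate in the ON clause.

    13. union all
           Combines two or more tables. Every union all subquery must return the same columns, and the type of each corresponding field must match.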
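
The .hiverc advice in note 11 can be scripted so that a setting is added only once, however often a setup script runs. This is a minimal sketch under stated assumptions: add_hiverc_setting is a made-up helper name, and the demonstration writes to a scratch file instead of the real $HOME/.hiverc.

```shell
# Append a line to a Hive init file only if that exact line is not already present.
# $1 = init file path, $2 = full statement including the trailing semicolon.
add_hiverc_setting() {
    touch "$1"
    if ! grep -qxF -- "$2" "$1"; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Demonstration on a scratch file, so the real $HOME/.hiverc is left untouched.
rc_file=$(mktemp)
add_hiverc_setting "$rc_file" "set hive.exec.mode.local.auto=true;"
add_hiverc_setting "$rc_file" "set hive.exec.mode.local.auto=true;"
cat "$rc_file"
```

The second call is a no-op, so the file ends up with a single copy of the setting. Pointing the first argument at "$HOME/.hiverc" applies the same logic to the real file.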

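1. Common commands in Hive non-interactive mode:

　　1) hive -e: run the given HQL from the command line; the trailing semicolon is not required:

% hive -e 'select * from dummy' > a.txt

　　2) hive -f: run an HQL script

% hive -f /home/my/hive-script.sql   # hive-script.sql is the HQL script file

　　3) hive -i: run the HQL statements in a script first, then enter the Hive interactive shell

% hive -i /home/my/hive-init.sql

　　4) hive -v: verbose mode, which additionally prints the HQL statements being executed

　　5) hive -S: silent mode, which hides the MapReduce job progress and shows only the final result

% hive -S -e 'select * from student'

　　6) hive --hiveconf <property=value>: use the given value for a property

$HIVE_HOME/bin/hive --hiveconf mapred.reduce.tasks=2   # set the number of reducers to 2 at startup (valid for this session only)

　　7) hive --service serviceName: start a service

　　8) hive [--database test]: enter the CLI. The default database is used unless the bracketed option is given, in which case the CLI starts in the test database.

% hive --database test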
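
The -S, -e, and -f options combine naturally in a small wrapper. The sketch below is hypothetical throughout: run_hql, HIVE_BIN, and fake_hive are names invented for this example, and the stub only prints the arguments so the wrapper can be tried without a Hive installation.

```shell
# HIVE_BIN is an assumption of this sketch: it defaults to `hive` on the PATH
# and can be pointed at a stub for a dry run.
HIVE_BIN=${HIVE_BIN:-hive}

# Run HQL silently: if $1 names an existing file use -f, otherwise pass it to -e.
run_hql() {
    if [ -f "$1" ]; then
        "$HIVE_BIN" -S -f "$1"
    else
        "$HIVE_BIN" -S -e "$1"
    fi
}

# Stub that prints each argument on its own line instead of starting Hive.
fake_hive() {
    printf '%s\n' "$@"
}

# Dry run: shows the arguments that would be handed to hive.
HIVE_BIN=fake_hive
run_hql 'select * from dummy'
```

Removing the HIVE_BIN=fake_hive line makes the wrapper call the real hive binary.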

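3. Commands in Hive interactive mode:

quit / exit: leave the CLI

reset: reset all configuration parameters to the values in hive-site.xml, for example after the number of reducers was changed earlier with set.

set <key>=<value>: set a Hive runtime parameter. This has the highest priority; for the same key, a later setting overrides an earlier one.

set -v: print all Hive and Hadoop configuration parameters.

# find the settings related to "mapred.reduce.tasks"
hive -e 'set -v;' | grep mapred.reduce.tasks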
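
A plain grep also matches longer property names that merely contain the search text. When only one exact key is wanted, the key=value lines can be filtered with awk instead. This is a sketch; get_property is a made-up helper name, and the demonstration feeds it fake output rather than calling hive.

```shell
# Print the value of one property from `set -v`-style output (key=value lines) on stdin.
# $1 = exact property name. Prints nothing if the key is absent.
get_property() {
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Intended use (needs hive):
#   hive -S -e 'set -v;' | get_property mapred.reduce.tasks

# Demonstration with fake output; a plain grep would match both lines.
printf 'mapred.reduce.tasks=-1\nmapred.reduce.tasks.speculative.execution=true\n' \
    | get_property mapred.reduce.tasks
```

Only the text after the first = is printed, so values that themselves contain = are kept whole.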

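add: add File[S]/Jar[S]/Archive[S] <filepath>* adds one or more files, jars, or archives to the distributed cache, after which they can be used in map and reduce tasks. For example, after packaging a custom UDF as a jar, run add jar on it before creating the function; otherwise Hive reports that the class cannot be found.

list: list File[S]/Jar[S]/Archive[S] lists the files, jars, or archives currently in the distributed cache.

delete: delete File[S]/Jar[S]/Archive[S] <filepath>* removes files from the distributed cache.

--add a file to the cache
add file /root/test/sql;

--list the files currently in the cache
list file;

--delete the given file from the cache
delete file /root/test/sql;

create: create a user-defined function: hive> create temporary function udfTest as 'com.cstore.udfExample';

source <filepath>: run a script file inside the CLI.

--equivalent to [root@ncst test]# hive -S -f /root/test/sql
hive> source /root/test/sql;

! <command>: run a Linux command inside the CLI.

dfs <dfs command>: run an HDFS command inside the CLI.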

4. Three ways to save query results:

% hive -S -e 'select * from dummy' > a.txt   # columns in the output are separated by tabs

[root@hadoop01 ~]# hive -S -e "insert overwrite local directory '/root/hive/a'
>  row format delimited fields terminated by '\t'
>  select * from logs sort by te"
(the row format clause sets the field separator to \t)

# export the whole table's data with an hdfs command
hdfs dfs -get /hive/warehouse/hive01 /root/test/hive01

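5. Importing and exporting between Hive clusters

The EXPORT command exports a Hive table's data together with the table's metadata.

--export command
EXPORT TABLE test TO '/hive/test_export';

--inspect with a dfs command
hdfs dfs -ls /hive/test_export

--result
/hive/test_export/_metadata
/hive/test_export/data

The IMPORT command loads the exported data back into Hive (the imported table must be given a new name here).

--import into a managed table
IMPORT TABLE data_managed FROM '/hive/test_export';

--import into an external table
IMPORT EXTERNAL TABLE data_external FROM '/hive/test_export' LOCATION '/hive/external/data';

--check that it is an external table
desc formatted data_external;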

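6. Hive - JDBC/ODBC

In Hive's jars, "org.apache.hadoop.hive.jdbc.HiveDriver" provides the JDBC interface. With this package a client program can use Hive like a database, and most operations are the same as with a traditional database. Hive accepts connections from applications that speak JDBC. Once Hive has started the hiveserver service on a given port, the client communicates with the Hive server through Java Thrift. The steps are:

　　1. Start the hiveserver service: $ hive --service hiveserver 50000 (port 50000)

　　2. Open a connection to Hive: Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

　　　　 Connection con = DriverManager.getConnection("jdbc:hive://ip:50000/default", "hive", "hadoop");

　　　　By default only the default database can be reached. After the two lines above have established the connection, the remaining operations differ little from those on a traditional database.

　　3. Hive's JDBC driver is not yet mature and does not support the full JDBC API.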

7. Hive Web Interface

　　1. Configure hive-site.xml

        <property>
        <name>hive.hwi.war.file</name>
        <value>lib/hive-hwi-0.8.1.war</value>
        <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>    
        </property>
        
        <property>
        <name>hive.hwi.listen.host</name>
        <value>0.0.0.0</value>
        <description>This is the host address the Hive Web Interface will listen on</description>
        </property>
        
        <property>
        <name>hive.hwi.listen.port</name>
        <value>9999</value>
        <description>This is the port the Hive Web Interface will listen on</description>
        </property>

　　2. Start Hive's web service: hive --service hwi

　　3. Open http://host_name:9999/hwi in a browser

　　4. Click "Create Session" to create a session, then type a query into Query

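8. Creating databases in Hive

After Hive starts there is a default database named default. New databases can also be created by hand:

--specify the storage location explicitly
create database hive02 location '/hive/hive02';

--add other information (creation date and a database comment)
create database hive03 comment 'it is my first database' with dbproperties('creator'='kafka','date'='2015-08-08');

--view the database details
describe database hive03;
--view more details
describe database extended hive03; 
--the best command for viewing the database structure
describe database formatted hive03;

--only the contents of dbproperties can be altered for a database
alter database hive03 set dbproperties('edited-by'='hanmeimei');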
