Hive Basic Operations

imxintian

Article link: https://blog.csdn.net/tian_xin_/article/details/79752942


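Creating databases and tables

    -- create a database and switch to it
    create database mydb;
    use mydb;

    -- create a table
    create table stu(
      id int,
      name string,
      sex string,
      address string,
      poneNum string
    )row format delimited fields terminated by '\t';
    -- If no delimiter is specified, Hive uses its default field delimiter '\001'.
    -- The delimiter you specify must match the one actually used in the data file.

The like keyword: copies only the structure of the source table (no data)

    create table emp_like like emp;

The as keyword: creates a table from a query result, for example to extract part of a table's data

    create table dept_like as select_statement;

```
-- Example:
create table dept_like as select * from dept limit 2;
```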
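Importing data
1. Load a local file into a Hive table
  
  load data local inpath 'path' into table table_name;
2. Load a file on HDFS into a table
  
  load data inpath 'HDFS path' into table table_name;
3. Load data and overwrite what the table already contains
```
load data local inpath 'path' overwrite into table table_name;
load data inpath 'HDFS path' overwrite into table table_name;
```
4. Load data through a select statement when creating the table
  
  create table dept_as as select_statement;
5. Load through insert
```
insert into table table_name select_statement;       -- appends
insert overwrite table table_name select_statement;  -- replaces the existing data
-- e.g. the total salary of each department in emp; the select statement is:
select deptno, sum(sal) from emp group by deptno;
```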
6. Point the table at existing data with location

    create table emp_location(
      deptno int,
      dname string,
      loc string
    )row format delimited fields terminated by '\t'
    location '/user/hive/warehouse/mydb.db/emp';

  Note: location takes the path of the directory that holds the data files, not the path of a file.
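
Methods 4 and 5 can be sketched together as follows. This is only an illustration: it assumes emp has deptno and sal columns (as in the emp_ext definition later in this post), and the table names dept_sal and dept_sal_as are hypothetical.

```sql
-- Hypothetical target table for the per-department totals.
create table dept_sal(
  deptno int,
  total_sal double
)row format delimited fields terminated by '\t';

-- Method 5: replace the table's contents with the query result.
insert overwrite table dept_sal
select deptno, sum(sal) from emp group by deptno;

-- Method 4: create and fill a table in one step (CTAS).
-- The new table takes its column names and types from the select list.
create table dept_sal_as as
select deptno, sum(sal) as total_sal from emp group by deptno;
```

Using insert into instead of insert overwrite in the second statement would append the totals rather than replace them.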
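Exporting data:
1. Export to the local Linux file system
  
  insert overwrite local directory 'linux path' [row format ...] select_statement;
2. Export to the HDFS file system
  
  insert overwrite directory 'HDFS path' [row format ...] select_statement;
3. Export to the local file system by redirecting the CLI output
```
  bin/hive -e 'HQL statement' > 'linux path'     # > overwrites the file
  bin/hive -e 'HQL statement' >> 'linux path'    # >> appends to the file
```
4. The Sqoop framework
  
  Exports data from Hive to a relational database (such as MySQL)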
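
As a rough sketch of methods 1 and 2, the statements below export the stu table created earlier as tab-separated text. The target paths are placeholders, and the row format clause assumes a reasonably recent Hive version (0.11 or later, as far as I know).

```sql
-- Export to a local directory (the path is a placeholder).
-- Hive writes one or more files such as 000000_0 into this directory,
-- replacing whatever the directory contained before.
insert overwrite local directory '/tmp/stu_export'
row format delimited fields terminated by '\t'
select * from stu;

-- Export to an HDFS directory (the path is a placeholder).
insert overwrite directory '/tmp/stu_export_hdfs'
row format delimited fields terminated by '\t'
select * from stu;
```

Because the target is overwritten as a whole directory, it is safer to point these statements at a dedicated, empty location.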

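Inserting data


    load data local inpath '/opt/hivedata/stu.txt' into table stu;
    -- running the load several times appends the data each time

    -- with local: the path is on the local (Linux) file system
    -- without local: the path is on the HDFS file system

Overwriting the existing data

    load data local inpath '/opt/hivedata/stu.txt' overwrite into table stu;

Table data is stored on HDFS:
- Creating a database creates a directory named after it under /user/hive/warehouse (for example mydb.db).
- Creating a table creates a subdirectory with the table's name under its database directory.
- Loading data creates a file with the same name as the source file under the table directory.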
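
To see this layout for yourself, the Hive client can run HDFS commands directly through dfs. The paths below assume the default warehouse directory and the mydb database and stu table from earlier; adjust them if hive.metastore.warehouse.dir points elsewhere.

```sql
-- Database directories under the warehouse root (mydb shows up as mydb.db).
dfs -ls /user/hive/warehouse;

-- Table directories inside the database.
dfs -ls /user/hive/warehouse/mydb.db;

-- Data files of the stu table; each load adds a file here.
dfs -ls /user/hive/warehouse/mydb.db/stu;
```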
Table types

View a table's details
    desc table_name;
    desc formatted table_name;

Creating an external table:
create EXTERNAL table emp_ext(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)row format delimited fields terminated by '\t';
Load data
load data local inpath '/opt/hivedata/emp.txt' into table emp_ext;

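Differences between managed tables and external tables

Dropping a managed table deletes both its metadata and its data.
Dropping an external table deletes only the metadata; the data files remain on HDFS.

External tables are generally preferred, because the data survives an accidental drop.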
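
The difference can be checked with a small experiment. This is a sketch, assuming emp_ext lives in mydb under the default warehouse path; emp_ext2 is a hypothetical table name.

```sql
-- The "Table Type" row shows MANAGED_TABLE or EXTERNAL_TABLE.
desc formatted emp_ext;

-- Dropping the external table removes only its metadata.
drop table emp_ext;

-- The data files should still be listed here.
dfs -ls /user/hive/warehouse/mydb.db/emp_ext;

-- A new external table pointed at the same directory sees the old data again.
create external table emp_ext2(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)row format delimited fields terminated by '\t'
location '/user/hive/warehouse/mydb.db/emp_ext';

select * from emp_ext2 limit 5;
```

Running the same drop against a managed table would leave nothing behind for the dfs -ls step to list.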

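Important options for running hive

-e 'SQL statement'

Example: bin/hive -e 'select * from mydb.stu'
Suited to short one-off statements (inserts, deletes, updates, queries).

-f HQL file

        In the hive directory, create a directory test, and in test create the file testF.hql
        Write select * from mydb.stu into testF.hql
        Example: bin/hive -f test/testF.hql 
        Use case: SQL statements that are complex or need to run periodically

-S silent mode (prints only the results, without the progress logs)

Example: bin/hive -S -f test/testF.hql

-i initialization file: runs the file first, returns its output, then drops straight into the interactive client

Example: bin/hive -i  test/testF.hql 
All the examples above are run from the hive directory.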

source HQL file (run from inside the Hive client)
```
-- Example:
source test/testF.hql;
```

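Hive partitions

Partitioned tables

A typical use is analysing user-behaviour logs: every visit to a website and every click on it produces a log record, and the logs are stored by day and then by hour:

    20171014
    20171015
    20171016        (split by hour)
        2017101610.log
        2017101611.log
        2017101612.log
        2017101613.log
        2017101613...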

Partitioned table syntax

Static partitions (one-level partition)
- ```
create table emp_part(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)partitioned by(date string)
row format delimited fields terminated by '\t';
```
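  - Insert data:

        -- syntax
        load data [local] inpath 'path' [overwrite] into table table_name partition (date='20171016');
        -- example
        load data local inpath '/opt/hivedata/emp.txt' into table emp_part partition (date='20171016');

  - Notes:
    - To create a partitioned table, append PARTITIONED BY (PARTITION_COLUMN string, ...) to an ordinary create table statement.
    - When loading (inserting) data you must name the target partition, otherwise the statement fails.
    - On HDFS, each partition is stored as a subdirectory under the table directory; with several partition columns the directories are nested.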
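
A short sketch of how to inspect the result, using the emp_part table above. The HDFS path assumes the default warehouse directory and the mydb database. One caveat that is not in the original: newer Hive versions (1.2 and later, to my knowledge) treat date as a reserved keyword, so the column name may need backticks there, as written below.

```sql
-- List the partitions that exist for the table.
show partitions emp_part;

-- Each partition is a subdirectory named column=value,
-- e.g. .../emp_part/date=20171016
dfs -ls /user/hive/warehouse/mydb.db/emp_part;

-- A partition filter lets Hive read only the matching directory.
select ename, sal from emp_part where `date`='20171016';

-- An empty partition can also be added by hand before any data arrives.
alter table emp_part add partition (`date`='20171017');
```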

Two-level partitions (date and hour)

The partition values are still given explicitly in each load statement, so this is static partitioning with two partition columns.

    create table emp_part2(
      empno int,
      ename string,
      job string,
      mgr int,
      hiredate string,
      sal double,
      comm double,
      deptno int
    )partitioned by(date string,hour string)
    row format delimited fields terminated by '\t';

Insert data:

    -- syntax
    load data [local] inpath 'path' [overwrite] into table table_name partition (date='20171016',hour='18');
    -- examples
    load data local inpath '/opt/hivedata/emp.txt' into table emp_part2 partition (date='20171016',hour='18');
    load data local inpath '/opt/hivedata/emp.txt' into table emp_part2 partition (date='20171016',hour='19');
    load data local inpath '/opt/hivedata/emp.txt' into table emp_part2 partition (date='20171016',hour='20');

Query data:

    select ename from emp_part where date='20171016';
    select ename from emp_part2 where date='20171016' and hour='19';

To query a partitioned table, just add the partition conditions to the where clause.

Partitioned table example (two-level partitions)

Table: track_log
Data: 2015082818, 2015082819
create table track_log(
id                                 string,
url                                string,
referer                            string,
keyword                            string,
type                               string,
guid                               string,
pageId                             string,
moduleId                           string,
linkId                             string,
attachedInfo                       string,
sessionId                          string,
trackerU                           string,
trackerType                        string,
ip                                 string,
trackerSrc                         string,
cookie                             string,
orderCode                          string,
trackTime                          string,
endUserId                          string,
firstLink                          string,
sessionViewNo                      string,
productId                          string,
curMerchantId                      string,
provinceId                         string,
cityId                             string,
fee                                string,
edmActivity                        string,
edmEmail                           string,
edmJobId                           string,
ieVersion                          string,
platform                           string,
internalKeyword                    string,
resultSum                          string,
currentPage                        string,
linkPosition                       string,
buttonPosition                     string
)PARTITIONED BY (date string,hour string)
row format delimited fields terminated by '\t';

Load data:

        load data local inpath '/opt/hivedata/2015082818' into table track_log partition (date='20150828',hour='18');
        load data local inpath '/opt/hivedata/2015082819' into table track_log partition (date='20150828',hour='19');

Note:

Dynamic partitions:
    -- these settings are required before using dynamic partitioning
    set hive.exec.dynamic.partition=true;  
    set hive.exec.dynamic.partition.mode=nonstrict;
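
With these settings in place, Hive derives the partition values from the query result instead of from literals in the partition clause. The sketch below copies emp_part into the two-level emp_part2 table; the constant hour value '00' is made up for illustration, and the backticks around date are only needed on Hive versions that reserve that word.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Fully dynamic: the last two select columns feed (date, hour), in that order.
insert overwrite table emp_part2 partition (`date`, hour)
select empno, ename, job, mgr, hiredate, sal, comm, deptno,
       `date`, '00' as hour
from emp_part;

-- Mixed: date is fixed (static), hour comes from the query (dynamic).
-- Static partition columns must be listed before dynamic ones.
-- This form also works in the default strict mode.
insert overwrite table emp_part2 partition (`date`='20171016', hour)
select empno, ename, job, mgr, hiredate, sal, comm, deptno,
       '00' as hour
from emp_part
where `date`='20171016';
```

Hive matches dynamic partition columns by position, not by name, so the order at the end of the select list matters.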

    LOL.LOG
        2017101418.LOG
        2017101419.LOG

Export the data to a local directory

    insert overwrite local directory '/opt/hivedata/LOL.log'
    row format delimited fields terminated by '\t'
    select * from track_log;

Create a partitioned table and fill it with dynamic-partition inserts

create table LOL_log like track_log;

insert  overwrite table LoL_log partition (date,hour) select * from track_log;

insert  overwrite table LoL_log partition (date,hour) select * from track_log where date='20150828' and hour='19';
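
To confirm what the dynamic-partition inserts produced, the partitions and row counts can be compared with the source table. A minimal check, again quoting date for newer Hive versions:

```sql
-- Partitions created by the inserts, one line per date/hour combination.
show partitions LOL_log;

-- Row counts per partition; these should match the same query on track_log
-- for every partition that was copied.
select `date`, hour, count(*) as cnt
from LOL_log
group by `date`, hour;
```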