Hive: Hive Syntax (Part 2)

快跑呀长颈鹿

Last modified 2023-02-23 17:06:54

Category: Hive. Tags: hive, hadoop, big data

First published 2023-02-23 15:16:41

Article link: https://blog.csdn.net/weixin_43240150/article/details/129168118

Hive Syntax (Part 2)

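Contents

Hive Syntax (Part 2)
Loading Data with LOAD
- The LOAD statement
  - With LOCAL
  - Without LOCAL
Inserting Data with INSERT
Querying Data with SELECT
Creating Partitioned Tables
Tables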

Loading Data with LOAD

Default warehouse path in this setup: /opt/soft/hive312/warehouse

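A data file can also be uploaded straight into a table's directory with hdfs dfs -put.

The LOAD statement

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, ...)];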

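With LOCAL

The file path is looked up in the local file system.
A relative path is interpreted relative to the user's current working directory.
A full URI can also be given for a local file, for example file:///opt/file.txt

Without LOCAL

If filepath is a full URI, Hive uses that URI as is.
If no scheme or authority is given, Hive uses the ones from the Hadoop configuration parameter fs.default.name (named fs.defaultFS in current Hadoop versions), which holds the NameNode URI.

In this cluster, "local" means the file system of node1. When connecting through HiveServer2 (for example with Beeline), LOCAL paths are resolved on the server host, not on the client machine.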

create table if not exists employee(
    name string,
    workplace array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
-- Load from the local file system (essentially an hdfs dfs -put: the file is copied)
load data local inpath '/opt/stufile/emp.txt' into table employee;
-- Load from HDFS (essentially an hdfs dfs -mv: the file is moved)
load data inpath "hdfspath" into table employee;
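
The optional OVERWRITE keyword in the syntax is not used in the statements above. A minimal sketch of the difference it makes, assuming the employee table and the /opt/stufile/emp.txt file from this section:

```sql
-- Without OVERWRITE the file is added to whatever the table already holds,
-- so loading the same file twice duplicates its rows.
LOAD DATA LOCAL INPATH '/opt/stufile/emp.txt' INTO TABLE employee;

-- With OVERWRITE the existing contents of the table are replaced by the file.
LOAD DATA LOCAL INPATH '/opt/stufile/emp.txt' OVERWRITE INTO TABLE employee;

-- Check what ended up in the table.
SELECT * FROM employee LIMIT 5;
```

LOAD only copies or moves the file; it does not parse or validate it. Rows that do not match the table's delimiters show up as NULL values at query time, not as an error during the load.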

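Inserting Data with INSERT

Loading files with LOAD, as described above, is the way Hive recommends for getting data into a table.
INSERT can also be used to put data into a table, but because an INSERT runs as a MapReduce job underneath, it is slow.
The most common pattern is to insert the result of a query into another table (INSERT + SELECT).

INSERT INTO TABLE table_name SELECT select_expr, ... FROM table2_name;

Note: the columns returned by the query must match the columns of the target table in number and order, with compatible types.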
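
As a rough illustration of INSERT + SELECT, the sketch below copies two fields out of the employee table defined earlier. The target table employee_gender is a hypothetical name introduced only for this example.

```sql
-- Hypothetical target table: two columns, so the SELECT below must return
-- two columns in the same order with compatible types.
CREATE TABLE IF NOT EXISTS employee_gender (
    name   string,
    gender string
);

-- Append the query result to the target table.
INSERT INTO TABLE employee_gender
SELECT name, gender_age.gender
FROM employee;

-- INSERT OVERWRITE replaces the existing contents instead of appending.
INSERT OVERWRITE TABLE employee_gender
SELECT name, gender_age.gender
FROM employee;
```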

Querying Data with SELECT

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[LIMIT [offset,] rows];

SELECT current_database(); -- show the current database
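
To make the clauses concrete, here is a small sketch against the employee table created in the LOAD section. The map key 'Python' is an assumption about the data, not something the data file is known to contain.

```sql
-- Array elements are read by index, struct fields with a dot,
-- and map values by key.
SELECT name,
       workplace[0]           AS first_workplace,
       gender_age.gender      AS gender,
       skills_score['Python'] AS python_score   -- 'Python' is an assumed key
FROM employee
WHERE gender_age.age > 25
ORDER BY name
LIMIT 5;

-- DISTINCT inside an aggregate: count the different gender values.
SELECT COUNT(DISTINCT gender_age.gender) FROM employee;

-- LIMIT offset, rows: skip the first 2 rows, then return the next 3.
SELECT name FROM employee ORDER BY name LIMIT 2, 3;
```

A map key that is not present in a row yields NULL for that row, so an assumed key does not make the query fail.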

Creating Partitioned Tables

create table employee2(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
partitioned by (age int)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

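partitioned by (age int) means the table is partitioned by age. The partition column is not listed among the regular columns; its value is supplied when the data is loaded.

Loading data into a partitioned table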

0: jdbc:hive2://192.168.95.150:10000> load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);
0: jdbc:hive2://192.168.95.150:10000> load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);

Viewing partition information

show partitions employee2;
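
Each partition is stored as its own subdirectory of the table directory (for example age=20), which is why filtering on the partition column is cheap. A short sketch, assuming the two partitions loaded above:

```sql
-- The partition column can be selected and filtered like an ordinary column.
-- Filtering on it lets Hive read only the matching partition directory.
SELECT name, age
FROM employee2
WHERE age = 20;

-- Drop a single partition; for a managed table this also deletes its data.
ALTER TABLE employee2 DROP IF EXISTS PARTITION (age = 30);

-- List the partitions that remain.
SHOW PARTITIONS employee2;
```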

Partitioning by multiple columns

create table employee3(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
partitioned by (age int , gender string)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

Loading data

0: jdbc:hive2://192.168.95.150:10000> load data local inpath '/opt/employee.txt' into table employee3 partition(age=20,gender='0');
0: jdbc:hive2://192.168.95.150:10000> load data local inpath '/opt/employee.txt' into table employee3 partition(age=20,gender='1');
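
With several partition columns the directories nest in the declared order, for example age=20/gender=0. The following sketch, which assumes the two partitions loaded above, shows how they can be listed and queried:

```sql
-- List every partition, then only those under age=20.
SHOW PARTITIONS employee3;
SHOW PARTITIONS employee3 PARTITION (age = 20);

-- Filter on both partition columns so only one directory is scanned.
SELECT name
FROM employee3
WHERE age = 20 AND gender = '1';
```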

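Tables

Tables are either internal tables or external tables.

Internal tables (managed tables)

In HDFS, the table is a subdirectory under the directory of the database it belongs to.
The data is fully managed by Hive; dropping the table deletes both the metadata and the data.

External tables

The data is stored at a specified HDFS location.
Hive does not fully manage the data; dropping the table deletes only the metadata, not the data.

Creating an external table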

create external table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
location '/tmp/hivedata/employee';
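
One caveat: if the managed employee table from the LOAD section still exists, IF NOT EXISTS turns the statement above into a silent no-op. In that case, drop the old table first or pick a different name. The sketch below is one way to check which kind of table you have and to see the drop behaviour described earlier:

```sql
-- "Table Type" in the output is MANAGED_TABLE or EXTERNAL_TABLE,
-- and "Location" shows where the data lives.
DESCRIBE FORMATTED employee;

-- For an external table this removes only the metadata;
-- the files under /tmp/hivedata/employee stay in HDFS.
DROP TABLE IF EXISTS employee;
```

Re-running the CREATE EXTERNAL TABLE statement with the same location makes the existing files queryable again.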