Hive: Importing and Exporting Table Data, Table Storage Formats, and Column and Table Comments (3)

算啦粉

Article link: https://blog.csdn.net/cch19930303/article/details/108478264

I. Importing and Exporting Table Data

1 Ways to import table data (run in the Hive interactive shell)

1.1 Load data from the local file system

load data local inpath "/root/hive/.txt" into table tb_name;  -- the file may be a .txt or .log file
load data local inpath "/root/hive/.txt" overwrite into table tb_name;  -- load and overwrite the existing data
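To make the difference between into and overwrite concrete, here is a toy Python model of a table's data directory. TableDir and its load method are hypothetical names for illustration, not a Hive API; the model only tracks file names and ignores details such as how Hive renames a loaded file whose name already exists.

```python
class TableDir:
    """Hypothetical in-memory model of a Hive table's data directory (not a Hive API)."""

    def __init__(self) -> None:
        self.files: list = []  # names of the data files currently in the directory

    def load(self, file_name: str, overwrite: bool = False) -> None:
        """Mimic LOAD DATA ... [OVERWRITE] INTO TABLE for a single file."""
        if overwrite:
            # OVERWRITE: the files already in the table directory are removed first
            self.files.clear()
        # The loaded file is then placed in the table directory
        self.files.append(file_name)


t = TableDir()
t.load("a.txt")
t.load("b.txt")                  # INTO: added next to a.txt
print(t.files)                   # ['a.txt', 'b.txt']
t.load("c.txt", overwrite=True)  # OVERWRITE: previous contents replaced
print(t.files)                   # ['c.txt']
```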

1.2 Load data that is already on HDFS

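load data inpath "/hive/(path to a file or directory)" into table tb_name;
Without local, the path refers to HDFS, and the files are moved (not copied) into the table's directory.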
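The copy-versus-move behavior can be sketched in Python with two local directories standing in for the local disk and HDFS. load_data is a hypothetical helper, a simplified model of what LOAD DATA does to the source file, not Hive's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def load_data(src: Path, table_dir: Path, local: bool) -> Path:
    """Simplified model of LOAD DATA [LOCAL] INPATH ... INTO TABLE.

    local=True  -> copy: the source file stays where it was
    local=False -> move: the source file disappears from its original path
    Both directories are on one local disk here; real Hive targets HDFS.
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name
    if local:
        shutil.copy2(src, dest)
    else:
        shutil.move(str(src), str(dest))
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "user.txt"
    src.write_text("zss,18,M\n")
    load_data(src, root / "tb_user", local=False)
    print(src.exists())  # False: the file was moved, as with LOAD DATA INPATH
```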

1.3 External table with location (reads data that is already on HDFS)

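create external table tb_name(
name string,
age int,
gender string)
row format delimited fields terminated by "\t" location "/hive/";
(The columns above are only an example; define the columns that match the files under /hive/.)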
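An external table does not own its data: Hive reads whatever files are in the location directory, and dropping an external table removes only its metadata, leaving the files in place. The Python sketch below models the read side under that assumption; read_external_table is a hypothetical helper, and a local temporary directory stands in for /hive/ on HDFS.

```python
import tempfile
from pathlib import Path


def read_external_table(location: Path, delimiter: str = "\t") -> list:
    """Sketch: an external table is a schema laid over the files in its location.

    Every regular file in the directory is read; names starting with "_" or "."
    are skipped, following Hadoop's default hidden-file convention.
    """
    rows = []
    for f in sorted(location.iterdir()):
        if f.is_file() and not f.name.startswith(("_", ".")):
            for line in f.read_text().splitlines():
                rows.append(line.split(delimiter))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    loc = Path(tmp)  # stands in for the HDFS directory /hive/
    (loc / "part1.txt").write_text("zss\t18\tM\n")
    (loc / "part2.txt").write_text("lii\t20\tF\n")  # a new file is visible at once
    (loc / "_SUCCESS").write_text("")               # marker file, ignored
    print(read_external_table(loc))  # [['zss', '18', 'M'], ['lii', '20', 'F']]
```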

1.4 insert into (avoid inserting data into Hive one statement or one row at a time)

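1) Save the query result into the table tb_name. Prerequisite: tb_name must already exist, and its columns must correspond one-to-one with the columns of the query result.
insert into tb_name select * from tb_user;
2) insert into tb_name values (v1, v2, ...); is inefficient and leaves a large number of small files on HDFS (this import method is not recommended).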
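The small-file problem follows from each insert statement running its own job that writes its own output. The toy Python model below assumes exactly one new file per statement (a simplification; the real number depends on the job) and compares row-by-row inserts with a single batched insert. SimTable is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class SimTable:
    """Hypothetical model of a table directory: each data file is a list of rows."""
    files: list = field(default_factory=list)

    def insert(self, rows: list) -> None:
        # Simplifying assumption: one INSERT statement -> one new data file
        self.files.append(list(rows))


rows = [("zss", 18, "M"), ("lii", 20, "F"), ("YUU", 9, "F"),
        ("HOO", 10, "F"), ("BII", 25, "m")]

row_by_row = SimTable()
for r in rows:
    row_by_row.insert([r])  # like running INSERT ... VALUES once per row

batched = SimTable()
batched.insert(rows)        # like a single INSERT ... SELECT

print(len(row_by_row.files), len(batched.files))  # 5 1
```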

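1.5 Create a table from a query result (create table as select): the table is created directly, and its columns are derived automatically from the columns of the query result, which makes it more convenient than insert into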

create table tb_name as select * from tb_user;
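The difference between insert into and create table as select is whether the target table must exist beforehand. The Python sketch models this with a hypothetical Catalog class standing in for the metastore; it checks column types strictly, whereas real Hive also accepts types it can convert implicitly.

```python
class Catalog:
    """Hypothetical stand-in for the metastore: table name -> (schema, rows)."""

    def __init__(self) -> None:
        self.tables: dict = {}

    def insert_into(self, table: str, result_schema: list, rows: list) -> None:
        """Like INSERT INTO table SELECT ...: the table must already exist."""
        if table not in self.tables:
            raise KeyError(f"table {table} not found; create it first")
        schema, data = self.tables[table]
        # Simplification: types must match exactly (Hive also allows implicit casts)
        if [t for _, t in schema] != [t for _, t in result_schema]:
            raise ValueError("table columns do not match the query result")
        data.extend(rows)

    def create_table_as(self, table: str, result_schema: list, rows: list) -> None:
        """Like CREATE TABLE table AS SELECT ...: columns are taken from the result."""
        if table in self.tables:
            raise ValueError(f"table {table} already exists")
        self.tables[table] = (list(result_schema), list(rows))


# Schema and rows of a query result such as "select * from tb_user"
result_schema = [("name", "string"), ("age", "int"), ("gender", "string")]
result_rows = [("zss", 18, "M"), ("lii", 20, "F")]

catalog = Catalog()
catalog.create_table_as("tb_name", result_schema, result_rows)  # table created from the result
catalog.insert_into("tb_name", result_schema, result_rows)      # now allowed: the table exists
print(len(catalog.tables["tb_name"][1]))  # 4
```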

1.6 Import a table from an export directory

import table tb_orders from "/doit17/export/orders";

2 Exporting table data

2.1 Export a table from the Hive interactive shell to HDFS; the exported files can be found in the HDFS web UI

export table tb_orders to "/user/hive/export/order/data/";
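export and import (1.6) work as a pair: export writes the table's metadata (a _metadata file) together with its data files (under data/) to the target directory, and import reads both back to recreate the table. The Python sketch below imitates that round trip; export_table and import_table are hypothetical helpers, and writing the metadata as JSON and the rows as tab-separated text are assumptions made only for this illustration (real Hive copies the table's own data files and uses its own metadata format).

```python
import json
import tempfile
from pathlib import Path


def export_table(columns: list, rows: list, target: Path) -> None:
    """Simplified EXPORT TABLE: a metadata file plus a data/ directory."""
    (target / "data").mkdir(parents=True, exist_ok=True)
    (target / "_metadata").write_text(json.dumps({"columns": columns}))
    with open(target / "data" / "000000_0", "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(str(v) for v in row) + "\n")


def import_table(source: Path) -> tuple:
    """Simplified IMPORT TABLE: rebuild schema and rows from an export directory.

    Values come back as strings; a real import keeps the table's column types.
    """
    meta = json.loads((source / "_metadata").read_text())
    rows = []
    for f in sorted((source / "data").iterdir()):
        rows += [line.split("\t") for line in f.read_text(encoding="utf-8").splitlines()]
    return meta["columns"], rows


with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp) / "export" / "orders"
    export_table([["id", "int"], ["name", "string"]], [(1, "apple")], export_dir)
    print(import_table(export_dir))  # ([['id', 'int'], ['name', 'string']], [['1', 'apple']])
```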

2.2 Export the result of a computation to a local directory

insert overwrite local directory "/root/hive/data/" row format delimited fields terminated by "\t"
select id,name,cost+100 from tb_order;
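The shape of the exported files can be sketched in Python. to_delimited_lines is a hypothetical helper that formats rows the way a delimited text export would, writing \N for NULL (Hive's default NULL marker in text files); the cost+100 step mirrors the query, where NULL + 100 stays NULL. The sample tb_order rows are made up. Without the row format clause, Hive would use its default field delimiter \001 (Ctrl-A) instead of a tab.

```python
def to_delimited_lines(rows, delimiter="\t", null="\\N"):
    """Format result rows as delimited text lines (sketch of a text export).

    None is written as \\N, the default NULL marker in Hive text files.
    """
    return [delimiter.join(null if v is None else str(v) for v in row) for row in rows]


# Hypothetical tb_order rows: (id, name, cost)
tb_order = [(1, "apple", 5), (2, "pear", None)]
# Equivalent of: select id, name, cost+100 from tb_order  (NULL + 100 stays NULL)
result = [(i, n, None if c is None else c + 100) for i, n, c in tb_order]
print(to_delimited_lines(result))  # ['1\tapple\t105', '2\tpear\t\\N']
```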

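2.3 hdfs dfs -get: download the table's data files from HDFS to the local file system with the HDFS shell (hdfs dfs -get <HDFS path> <local path>)

2.4 dfs -get: the same download, run from inside the Hive interactive shell

2.5 hive -e "select * from tb_order;" >> res.log: run a query from the Linux shell and append its output to res.log (the query is an example)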
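When the same query is run repeatedly, the hive -e ... >> res.log pattern can be wrapped in a script. The Python sketch below is one way to do it, assuming a hive executable on PATH; append_query_result and fake_hive are hypothetical names, and the runner is injectable so the example runs without a Hive installation.

```python
import os
import subprocess
import tempfile
from types import SimpleNamespace


def append_query_result(query: str, log_path: str, run=subprocess.run) -> None:
    """Run `hive -e <query>` and append its stdout to log_path, like `>> res.log`.

    `run` defaults to subprocess.run; passing a stub lets this run without Hive.
    """
    proc = run(["hive", "-e", query], capture_output=True, text=True, check=True)
    with open(log_path, "a", encoding="utf-8") as f:  # mode "a" appends, like >>
        f.write(proc.stdout)


def fake_hive(cmd, **kwargs):
    # Stand-in for the hive CLI: returns a canned, tab-separated result line
    return SimpleNamespace(stdout="1\tapple\t105\n")


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "res.log")
    append_query_result("select * from tb_order;", log, run=fake_hive)
    append_query_result("select * from tb_order;", log, run=fake_hive)
    with open(log, encoding="utf-8") as f:
        print(f.read().count("\n"))  # 2: the second run appended instead of overwriting
```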

II. Table Storage Formats

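1 Save the data to the local file /root/hive/user.txt to create a static file. In the Hive interactive shell, create a managed (internal) table, load the static file into it, and check the result.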

1 Data source:
zss,18,M
lii,20,F
YUU,9,F
HOO,10,F
BII,25,m

2 Save the data to the local file /root/hive/user.txt, then create the table in the Hive interactive shell
create table tb_user(
name string,
age int,
gender string)
row format delimited fields terminated by ",";

3 Load the static file into the table just created
load data local inpath "/root/hive/user.txt" into table tb_user;

4 Check the loaded data (in the Hive interactive shell)
select * from tb_user;
+---------------+--------------+-----------------+
| tb_user.name  | tb_user.age  | tb_user.gender  |
+---------------+--------------+-----------------+
| zss           | 18           | M               |
| lii           | 20           | F               |
| YUU           | 9            | F               |
| HOO           | 10           | F               |
| BII           | 25           | m               |
+---------------+--------------+-----------------+
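How each line of user.txt becomes a row can be modeled with a small Python function. parse_row is a hypothetical helper; it follows the commonly observed behavior of Hive's delimited text tables (missing trailing fields become NULL, extra fields are ignored, values that cannot be converted to int become NULL), simplified to the string and int types used here.

```python
def parse_row(line, schema, delimiter=","):
    """Map one delimited text line onto table columns (simplified sketch)."""
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for i, (_, col_type) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None  # missing field -> NULL
        if raw is None:
            row.append(None)
        elif col_type == "int":
            try:
                row.append(int(raw))
            except ValueError:
                row.append(None)  # not a number -> NULL
        else:
            row.append(raw)
    return tuple(row)  # fields beyond the schema are simply ignored


TB_USER = [("name", "string"), ("age", "int"), ("gender", "string")]
data = """zss,18,M
lii,20,F
YUU,9,F
HOO,10,F
BII,25,m"""
rows = [parse_row(line, TB_USER) for line in data.splitlines()]
print(rows[0])                        # ('zss', 18, 'M')
print(parse_row("abc,x,F", TB_USER))  # ('abc', None, 'F')
```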

2 Storing query results in ORC format

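2.1 Create a table to hold the query result and set its storage format to ORC (the table's columns must correspond one-to-one with the fields of the query result)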

1 Create the table in the Hive interactive shell to store the query result
create table tb_store_orc(
name string,
age int,
gender string)
stored as orc;

2.2 Save the query result into the ORC table just created. The table must exist beforehand, and its column types must match the queried columns one-to-one

1 Save the query result into the table with the specified storage format
insert into tb_store_orc 
select
* 
from
tb_user
order by age;

2 Check the inserted data in the Hive interactive shell; the query result was saved successfully into the table with the specified storage format
select * from tb_store_orc;
+--------------------+-------------------+----------------------+
| tb_store_orc.name  | tb_store_orc.age  | tb_store_orc.gender  |
+--------------------+-------------------+----------------------+
| YUU                | 9                 | F                    |
| HOO                | 10                | F                    |
| zss                | 18                | M                    |
| lii                | 20                | F                    |
| BII                | 25                | m                    |
+--------------------+-------------------+----------------------+

2.3 View the file /user/hive/warehouse/tb_store_orc/000000_0 (the path where Hive stores the table's data, also visible in the HDFS web UI) from the HDFS shell. Data stored in ORC format is unreadable when viewed this way

[root@doit03 ~]# hdfs dfs -cat /user/hive/warehouse/tb_store_orc/000000_0

3 Storing query results in Parquet format

3.1 First create the table that will hold the query result, specifying Parquet as its storage format

create table tb_store_parquet(
name string,
age int,
gender string)
stored as parquet;

3.2 Save the query result into the table. The table must be created beforehand, and its column types must match the queried columns one-to-one

1 Insert the query result into the Parquet table (in the Hive interactive shell)
insert into tb_store_parquet 
select
* 
from
tb_user
order by age;

2 Check the stored query result (in the Hive interactive shell)
select * from tb_store_parquet; 
+------------------------+-----------------------+--------------------------+
| tb_store_parquet.name  | tb_store_parquet.age  | tb_store_parquet.gender  |
+------------------------+-----------------------+--------------------------+
| YUU                    | 9                     | F                        |
| HOO                    | 10                    | F                        |
| zss                    | 18                    | M                        |
| lii                    | 20                    | F                        |
| BII                    | 25                    | m                        |
+------------------------+-----------------------+--------------------------+

3.3 View the file /user/hive/warehouse/tb_store_parquet/000000_0 from the HDFS shell; again the output is unreadable

1 View the query result stored in the Parquet table from the HDFS shell
[root@linux03 ~]# hdfs dfs -cat /user/hive/warehouse/tb_store_parquet/000000_0
PAR1RR,
6(zssBIIYUUHOOzssliiBII44,
  (
LFMm,
6(mFLH
      hive_schema
                 %name%%age
                           %gender%
<&
 name
<6(zssBII&age
&      (       &gender
Asia/ShanghaiJparquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)<zPAR1
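The unreadable output is binary file content, and both formats mark their files with magic bytes: ORC files start with the bytes ORC, and Parquet files start and end with PAR1 (visible at both ends of the output above). The Python sketch below uses this to guess a file's format; sniff_format and sniff_file are hypothetical helpers. To inspect ORC content in readable form, Hive provides hive --orcfiledump <path>.

```python
def sniff_format(head: bytes, tail: bytes = b"") -> str:
    """Guess a data file's format from its magic bytes (sketch).

    ORC files start with b"ORC"; Parquet files start and end with b"PAR1".
    Anything else is treated as (possibly) plain text.
    """
    if head.startswith(b"ORC"):
        return "orc"
    if head.startswith(b"PAR1") and (not tail or tail.endswith(b"PAR1")):
        return "parquet"
    return "text"


def sniff_file(path) -> str:
    """Read the first and last 4 bytes of a local file and classify it."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, 2)  # jump to the end to find the file size
        size = f.tell()
        f.seek(max(size - 4, 0))
        tail = f.read(4)
    return sniff_format(head, tail)
```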

4 Without a specified storage format, query results are stored as plain text (TEXTFILE) by default

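4.1 In the Hive interactive shell, create a table to hold the query result without specifying a storage format. If none is specified, the files in the table's data directory use the default text format.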

1 Create the table to hold the query result; no storage format is specified, so the default text format is used
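-- columns match tb_user so that "select * from tb_user" can be inserted
-- no "stored as" clause, so the default text format is used
create table tb_store(
name string,
age int,
gender string)
row format delimited fields terminated by ",";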

2 Insert the query result into the table just created
insert into tb_store 
select
* 
from
tb_user
order by age;

3 Check how the query result was stored
select * from tb_store;
+----------------+---------------+------------------+
| tb_store.name  | tb_store.age  | tb_store.gender  |
+----------------+---------------+------------------+
| YUU            | 9             | F                |
| HOO            | 10            | F                |
| zss            | 18            | M                |
| lii            | 20            | F                |
| BII            | 25            | m                |
+----------------+---------------+------------------+

4.2 View the file /user/hive/warehouse/tb_store/000000_0 from the HDFS shell; unlike ORC and Parquet, the content is readable

Running cat in the HDFS shell on the file in tb_store's table directory under the warehouse shows the data as plain text:
[root@doit03 ~]# hdfs dfs -cat /user/hive/warehouse/tb_store/000000_0
YUU,9,F
HOO,10,F
zss,18,M
lii,20,F
BII,25,m

5 Summary of storage formats for query results

5.1 The columnar storage formats are ORC and Parquet. Their advantage is a high compression ratio, which gives the best storage efficiency.
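Why columnar storage compresses well can be illustrated in Python: storing each column's values together puts similar values next to each other, which simple encodings such as run-length encoding exploit. This is only a sketch; real ORC and Parquet combine several encodings (for example dictionary and run-length encoding) with general-purpose compression. to_columns and run_length_encode are hypothetical helpers, and the rows are the tb_user data sorted by age.

```python
from itertools import groupby


def to_columns(rows):
    """Row layout -> column layout: one list per column."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


# tb_user rows sorted by age, as stored in tb_store_orc / tb_store_parquet
rows = [("YUU", 9, "F"), ("HOO", 10, "F"), ("zss", 18, "M"),
        ("lii", 20, "F"), ("BII", 25, "m")]
names, ages, genders = to_columns(rows)
print(genders)                     # ['F', 'F', 'M', 'F', 'm']
print(run_length_encode(genders))  # [('F', 2), ('M', 1), ('F', 1), ('m', 1)]
```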

5.2 Saving a query result with insert into tb_name select * from tb_user; requires that tb_name be created beforehand and that its column types correspond one-to-one with the query result.

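5.3 When a query result is stored in ORC format, the data can be viewed in the Hive interactive shell. However, when the corresponding file in the table's directory under /user/hive/warehouse (visible in the HDFS web UI) is viewed from the HDFS shell, only unreadable bytes appear. This comes from ORC's binary encoding, not from encryption: any ORC-aware tool can still read the file, so it is not a data-security mechanism.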

ORC columnar storage format

Advantage: the highest compression ratio

Disadvantage: parsing is relatively slow and less efficient

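5.4 The same applies to Parquet: the data can be viewed in the Hive interactive shell, but the file in the table's directory under /user/hive/warehouse is unreadable when viewed from the HDFS shell. Again, this is binary encoding rather than protection of the data.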

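Parquet columnar storage format: its compression ratio is slightly lower than ORC's but higher than plain text, and it parses faster than ORC, which makes it the most commonly used columnar format.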

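5.5 If no storage format is specified, query results are stored as plain text by default. The data can be viewed in the Hive interactive shell, and the file in the table's directory under /user/hive/warehouse can also be read directly from the HDFS shell, so anyone with access to the files can read the data.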

III. Column Comments and Table Comments

1 Column comments

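1 Since we don't know whether the table already exists, drop it first and then create it (in the Hive interactive shell). The Chinese comment "名字" below means "name".
drop table if exists tb_user1;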
create table tb_user1(
name string comment "名字",
age int,
gender string)comment "名字"
row format delimited fields terminated by ",";
load data local inpath "/root/hive/user.txt" into table tb_user1;

2 Check the column comments (in the Hive interactive shell)
desc tb_user1;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
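| name      | string     | ??       |
| age       | int        |          |
| gender    | string     |          |
+-----------+------------+----------+

The Chinese comment is not displayed correctly (it shows as ??), so Hive's handling of Chinese characters needs to be fixed. A common cause is that the metastore database stores comments in a non-UTF-8 character set.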
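The ?? can be reproduced in Python, assuming the metastore stores comments in a latin1 column (a common setup with MySQL, but an assumption here): characters that latin1 cannot represent are replaced with ?. store_in_latin1 is a hypothetical helper.

```python
def store_in_latin1(text: str) -> str:
    """Simulate saving a comment into a latin1 column: unencodable chars become '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


print(store_in_latin1("名字"))      # ??
print(store_in_latin1("username"))  # username
```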

3 Drop the table, recreate it, and change the comments to English
drop table tb_user1;
create table tb_user1(
name string comment "username",
age int,
gender string)comment "username"
row format delimited fields terminated by ",";
load data local inpath "/root/hive/user.txt" into table tb_user1;

4 Check the column comments again
desc tb_user1;
+-----------+------------+-----------+
| col_name  | data_type  |  comment  |
+-----------+------------+-----------+
| name      | string     | username  |
| age       | int        |           |
| gender    | string     |           |
+-----------+------------+-----------+

The English comment of the name column is now displayed correctly.

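2 Table comments

The table comment goes after the column list (the Chinese comment "示例表" below means "sample table"):
drop table if exists tb_user2;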
create table tb_user2(
name string,
age int,
gender string)comment "示例表"
row format delimited fields terminated by ",";
load data local inpath "/root/hive/user.txt" into table tb_user2;
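The clause order used above (column comment after the column type, table comment after the column list and before row format) can be captured in a small Python helper that generates the DDL text. build_create_table is a hypothetical name; it only assembles the statement and does not talk to Hive.

```python
def build_create_table(name, columns, table_comment=None, delimiter=","):
    """Assemble a CREATE TABLE statement with optional column/table comments.

    columns: list of (column_name, hive_type, comment_or_None).
    Clause order: column list, table COMMENT, then ROW FORMAT.
    """
    def quote(s):
        # Wrap in double quotes, escaping embedded quotes with a backslash
        return '"' + s.replace('"', '\\"') + '"'

    col_lines = []
    for col, typ, comment in columns:
        line = f"  {col} {typ}"
        if comment:
            line += f" comment {quote(comment)}"  # column comment follows the type
        col_lines.append(line)
    ddl = f"create table {name}(\n" + ",\n".join(col_lines) + ")"
    if table_comment:
        ddl += f"\ncomment {quote(table_comment)}"  # table comment after the column list
    ddl += f"\nrow format delimited fields terminated by {quote(delimiter)};"
    return ddl


print(build_create_table(
    "tb_user2",
    [("name", "string", None), ("age", "int", None), ("gender", "string", None)],
    table_comment="sample table",
))
```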
