Hive in Detail (Operations)

brz_em

Category: Cloud Computing, Big Data. Tags: hive operations

Article link: https://blog.csdn.net/qq_35180983/article/details/82902685

Hive operations (supplement)

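What creating a table really does:
It creates a table directory under the corresponding database directory in /user/hive/warehouse on HDFS.
What dropping a table really does:
It deletes the directory that holds the table's data (for a managed table; dropping an external table removes only the metadata).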

Loading data:
(1) values (not recommended; it takes too long)

insert into t_2 values('1','zhangsan');

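The elapsed time for this statement depends partly on how fast my cluster is, but it is still far too long for inserting a single row.
(2) put (i.e. uploading a file)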

hdfs dfs -put /xxx /user/hive/warehouse/brz.db/t_3

Here I upload the file stu:

[root@hadoop01 test]# hdfs dfs -put ./stu /user/hive/warehouse/brz.db/t_2
[root@hadoop01 test]# hdfs dfs -cat /user/hive/warehouse/brz.db/t_2/stu
1liming
2daming

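The upload succeeded, and the data can now be queried from Hive.

Note: Hive's default field delimiter is ^A (\001, typed in an editor as Ctrl+V followed by Ctrl+A). Hive is strictly schema-on-read: if a field does not match the expected format, NULL is shown in its place.
(3) load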
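The schema-on-read behaviour in the note above can be sketched as follows. The table t_delim_demo and its columns are hypothetical and do not appear in the original post; the delimiter is declared explicitly as '\001' (Ctrl+A), which is also Hive's default.

```sql
-- Hypothetical table; '\001' is the Ctrl+A character, Hive's default field delimiter.
create table if not exists t_delim_demo (
uid int,
uname string
)
row format delimited fields terminated by '\001'
stored as textfile;

-- A file put into the table directory is not validated at upload time.
-- Hive parses it only when the table is queried: a field that cannot be
-- read as the declared type, or a line that uses a different delimiter,
-- comes back with NULL in the affected columns instead of raising an error.
select uid, uname from t_delim_demo;
```

Declaring the delimiter explicitly makes it easier to see why a file produced with another separator shows up as rows of NULL.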

load data [local] inpath '/usr/local/hive/xxxx' into table tableName;

**Note:** with local, the path refers to the local Linux filesystem; without it, the path is on HDFS.

Tip: to run a Linux shell command (including the hdfs command) from the Hive client, prefix it with ! and end it with a semicolon:

!hdfs dfs -ls /;

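What loading data really does:
The data file is placed in the table's directory (not always by copying).
If the data is loaded from the local machine, it is copied into the table directory;
if the data is loaded from HDFS, it is moved (cut) into the table directory.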
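The copy-versus-move difference described above can be seen with a sketch like the following. The file paths are hypothetical placeholders; t_2 is the table used earlier in this post.

```sql
-- Local file: copied into the table directory; the source file stays where it was.
load data local inpath '/usr/local/hive/test/stu' into table t_2;

-- HDFS file: moved into the table directory; the source path is gone afterwards.
load data inpath '/tmp/stu' into table t_2;

-- overwrite replaces the table's current files instead of adding to them.
load data local inpath '/usr/local/hive/test/stu' overwrite into table t_2;
```

Listing the source location before and after each statement (ls for the local file, hdfs dfs -ls for the HDFS one) is a simple way to confirm which case applied.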

(4) Loading data with insert into

insert into t_4
select * from t_2
where uid < 7
;

Clone a table without data: like

create table if not exists t_5 like t_4;

Clone a table with data:

create table if not exists t_6 like t_2 location '/user/hive/warehouse/brz.db/t_2';

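Note:
The path after location must be an HDFS directory, not a file. Both tables then read the same directory, so no data is copied.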
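Because a clone created with like ... location points at the source table's directory, dropping it as a managed table would delete the files that t_2 also reads. One possible way to avoid this, sketched here under the assumption that a read-only view of the same files is all that is needed, is to declare the clone as external. The name t_6_ext is hypothetical.

```sql
-- Hypothetical external clone: same schema as t_2, same HDFS directory.
create external table if not exists t_6_ext like t_2
location '/user/hive/warehouse/brz.db/t_2';

-- Dropping an external table removes only its metadata;
-- the files under the location are left in place.
drop table if exists t_6_ext;
```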

Clone a table with data:

A more flexible way.
As with an ordinary create table, both the metadata and the table directory are created.

create table if not exists t_7
as
select * from t_2
where uid < 3;

Enable Hive's local execution mode:

set hive.exec.mode.local.auto=true;

create table if not exists t_8
as
select uname from t_2
where 1=0
;

With local mode enabled, this runs noticeably faster.

View a database's description:

desc database [extended] brz;
describe database [extended] brz;

View a table:

desc [extended] t_8;
describe [extended] t_2;
-- adding extended shows more detail
show create table t_2; -- shows the most complete result

Example:

CREATE TABLE log(
id string COMMENT 'this is id column',
phonenumber bigint,
mac string,
ip string,
url string,
stat01 string,
stat02 string,
upflow int,
downflow int,
status string,
dt string
)
COMMENT 'this is log table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
stored as textfile;

Load the data:

load data local inpath '/usr/local/hive/test/data.log.txt' into table log;

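Requirements:
1. For each user, compute the upstream traffic, downstream traffic, and total traffic (expressed in human-readable units, rounded to 2 decimal places)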

select
l.phonenumber,
sum(l.upflow) as upflow,
sum(l.downflow) as downflow,
sum(l.upflow + l.downflow) as sumflow
from log l
group by l.phonenumber
;
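The query above returns raw sums, while the requirement asks for readable units with 2 decimal places. A possible variant is sketched below. It assumes that upflow and downflow are stored in bytes, which the original post does not state; adjust the divisor if the log uses another unit.

```sql
-- Assumption: upflow and downflow are in bytes, so dividing twice by 1024 gives MB.
select
l.phonenumber,
concat(cast(round(sum(l.upflow) / 1024.0 / 1024.0, 2) as string), 'MB') as upflow,
concat(cast(round(sum(l.downflow) / 1024.0 / 1024.0, 2) as string), 'MB') as downflow,
concat(cast(round(sum(l.upflow + l.downflow) / 1024.0 / 1024.0, 2) as string), 'MB') as sumflow
from log l
group by l.phonenumber
;
```

round limits the value to at most 2 decimal places; it does not pad trailing zeros.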

2. Find the three most-visited URLs:

select
l.url,
count(l.url) as urlcount 
from log l
group by l.url
order by urlcount desc 
limit 3
;

3. Simulated billing (total traffic × price)
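The original post gives no query for this requirement. A minimal sketch might look like the following; the price of 0.3 per MB is a made-up value for illustration, and the traffic columns are again assumed to be in bytes.

```sql
-- Hypothetical price: 0.3 currency units per MB; traffic assumed to be in bytes.
select
l.phonenumber,
round(sum(l.upflow + l.downflow) / 1024.0 / 1024.0, 2) as total_mb,
round(sum(l.upflow + l.downflow) / 1024.0 / 1024.0 * 0.3, 2) as fee
from log l
group by l.phonenumber
order by fee desc
;
```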

Altering tables:
1. Rename a table: rename to

alter table t_2 rename to t_user_info;

2. Rename a column: change column

alter table t_9 change column uname name string;

3. Change a column's position:

alter table t_9 change column name name string after uage;
alter table t_9 change column uage uage string after uname;

alter table t_9 change column uage uage string first;

4. Change a column's type

alter table t_9 change column uid uid string;

5. Add columns: add columns

alter table t_9 add columns (
usex int,
addr string
)
;

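6. Drop columns: replace columns (in effect, the whole column list in the metadata is replaced with the one given; the data files are not touched)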

alter table t_9 replace columns(
uid string,
uname string,
addr string
)
;

7. Converting between managed (internal) and external tables:

alter table t_9 set tblproperties("EXTERNAL"="TRUE");   -- TRUE must be upper case
alter table t_9 set tblproperties("EXTERNAL"="false");  -- false works in either case
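To check whether a conversion took effect, the table type can be inspected after each alter. This is a small sketch using the t_9 table from above.

```sql
alter table t_9 set tblproperties("EXTERNAL"="TRUE");

-- In the output, look for the "Table Type:" row;
-- it should read EXTERNAL_TABLE after the statement above
-- and MANAGED_TABLE after converting back.
desc formatted t_9;
```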

Show the current database in the CLI prompt:

set hive.cli.print.current.db=true;

Drop a database:

drop database if exists gp;  -- drops an empty database
drop database if exists test cascade; -- cascade forces the drop even if the database still contains tables
