Hive 1.2.2 + Hadoop 2.7.3: Importing the Miqi Test Logs and Optimizing the Data (Part 5)

Article link: https://blog.csdn.net/cafebar123/article/details/74371463

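Hive is a data warehouse component built on top of Hadoop. It provides SQL-like statements for creating, deleting, modifying, and querying data stored in Hadoop.

A Hive table generally corresponds to the files under an HDFS path. Commonly used Hive commands are listed below.

Start the CLI: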

./bin/hive shell

List all tables:

show tables;

Create a table:

create table t_1(a int, b int, c int) row format delimited fields terminated by '\t';

Alter a table:

alter table t_1 add columns(d String);

Load data from the local filesystem:

load data local inpath '/testdata/words.txt' overwrite into table t_1;

Load a file that is already in HDFS:

load data inpath 'hdfs://master:9000/testdata/words.txt' overwrite into table t_1;
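One detail worth keeping in mind about the two load forms: with `local`, Hive copies the file from the local filesystem, whereas without it Hive moves the HDFS file into the table's directory, so the source path no longer holds the file afterwards. A minimal sketch for checking a load, reusing the t_1 table and the paths from the examples above:

```sql
-- Inspect the table definition (columns a, b, c and the added d).
describe t_1;

-- Peek at a few rows to confirm that the delimiter matched the file.
-- Fields that cannot be parsed as int show up as NULL.
select * from t_1 limit 10;

-- From inside the Hive CLI, list the HDFS source directory;
-- after a non-local load the file has been moved out of it.
dfs -ls hdfs://master:9000/testdata/;
```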

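And so on.

Next, we import the KPI figures and other statistics computed from the Miqi test server's access logs into Hive tables.

(1) Download link for the program that computes the KPIs from the Miqi access logs: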

http://download.csdn.net/detail/cafebar123/9889939

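(2) Create the Hive tables

First create two tables: t_ip, which holds the number of visits per IP address, and t_remote_user, which holds the number of visits per referring link (the page the visitor came from).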
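The article does not show the DDL for these two tables. A minimal sketch, assuming each line of the MapReduce output is a key and a count separated by a tab (the default separator of Hadoop's text output); the column names ip, remote_user, and visit_count are hypothetical:

```sql
-- Hypothetical schema: one row per IP address with its visit count.
create table t_ip(ip string, visit_count int)
row format delimited fields terminated by '\t';

-- Hypothetical schema: one row per key emitted by the job with its count.
create table t_remote_user(remote_user string, visit_count int)
row format delimited fields terminated by '\t';
```

If the job writes a different separator, the `fields terminated by` clause has to match it, otherwise the second column is read as NULL.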

Then load the data produced by the Hadoop job:

load data inpath 'hdfs://master:9000/user/hadoop/ipCountOutput/part-r-00000' overwrite into table t_ip;

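[Figure: result of loading the data into t_ip]

At this point, the table t_ip corresponds directly to the t_ip directory on HDFS. t_remote_user is handled in the same way.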
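To see this correspondence, one can ask Hive where the table lives and then list that directory. A sketch, assuming the default warehouse location /user/hive/warehouse (the real path depends on hive.metastore.warehouse.dir and is reported in the Location field):

```sql
-- The Location row shows the HDFS directory that backs the table.
describe formatted t_ip;

-- Assumed default warehouse path; adjust it to the Location printed above.
dfs -ls /user/hive/warehouse/t_ip;

-- The rows Hive returns are simply the lines of the files in that directory.
select * from t_ip limit 10;
```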

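(3) Optimizing the tables

1) Next, try a partitioned table, and try loading all of the Miqi test server logs into it.

Create a new table and declare a partition column: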

create table t_log(ip String,remote_user String,block1 String,local_time String,time_field String,time_zone String,request_type String,request String,req_status String,resp_status int,body_bytes_sent String,http_referer String,user_agent String,req_language String) partitioned by(req_month String) row format delimited fields terminated by ' ';

The table has 14 regular columns, and req_month is the partition column.

Load the log data:

load data inpath 'hdfs://master:9000/user/hadoop/miqiLog10000Input/miqizuche10000.log' overwrite into table t_log partition(req_month='0709');
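If the load succeeds, each partition becomes a subdirectory such as req_month=0709 under the table's directory, and a query that filters on req_month only reads that subdirectory. A minimal sketch of checking and using the partition (the aggregate query is only an illustration, not part of the original walkthrough):

```sql
-- List the partitions Hive knows about for the table.
show partitions t_log;

-- Filtering on the partition column limits the scan to that partition.
select ip, count(*) as visits
from t_log
where req_month = '0709'
group by ip
order by visits desc
limit 10;
```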

Result: the load fails with the following error:

ValidationFailureSemanticException table is not partitioned but partition spec exists

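This happens because the target table has no such partition column. If the table was created without a partition column whose name matches the one in the partition spec, adding or loading into a partition raises this error.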
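One way to confirm and work around this is sketched below. It assumes that an older, unpartitioned t_log already exists in the metastore and can safely be dropped; the two-column list is an abbreviated placeholder, not the full schema:

```sql
-- Check whether the existing table really has partition columns;
-- a partitioned table shows a "# Partition Information" section.
describe formatted t_log;

-- Assumption: the existing table can be discarded. For a managed table
-- this also deletes its data directory.
drop table if exists t_log;

-- Re-create the table with the partition column declared
-- (column list abbreviated here).
create table t_log(ip string, remote_user string)
partitioned by (req_month string)
row format delimited fields terminated by ' ';

-- The partition spec now matches a declared partition column.
load data inpath 'hdfs://master:9000/user/hadoop/miqiLog10000Input/miqizuche10000.log'
overwrite into table t_log partition (req_month='0709');
```

Quoting the partition value keeps it a string, so the leading zero in 0709 is preserved.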
