Hive Tutorial
Note: Hive does not store data itself; it can treat MySQL, HDFS, HBase, and so on as data sources and then process that data. On the surface Hive uses SQL, but underneath it runs MapReduce: Hive is essentially a tool that translates SQL into MapReduce jobs. As an optimization, some queries with no filter conditions are answered by a simple data mapping and do not execute MapReduce at all.
1. Creating tables
By default a table is internal (managed):
create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
Creating a partitioned table:
create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';
Creating an external table:
create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';
2. Regular vs. partitioned tables: when large amounts of data keep arriving, use a partitioned table
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';
3. Internal vs. external tables
a. An internal table usually starts with no data; you load the data yourself, and when the table is dropped its data is deleted along with it.
b. An external table usually points at existing business data that other systems are already using; dropping the table does not delete the data.
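The difference in drop behavior can be checked directly in the Hive CLI; a minimal sketch, with hypothetical table names and an example HDFS path:

```sql
-- Internal (managed) table: Hive owns the data directory.
create table t_managed(id bigint) row format delimited fields terminated by '\t';
-- Dropping it removes both the metadata and the files under the warehouse directory.
drop table t_managed;

-- External table: Hive only records the location; the files belong to another system.
create external table t_demo_ext(id bigint)
row format delimited fields terminated by '\t'
location '/data/t_demo_ext';
-- Dropping it removes only the metadata; the files under /data/t_demo_ext stay on HDFS.
drop table t_demo_ext;
```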
4. Loading data
a. Loading data into a regular table
load data local inpath '/home/hadoop/ip.txt' into table tab_ext;
b. Loading data into a partitioned table
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
load data local inpath '/root/data.am' into table beauty partition (nation="USA");
5. Querying data
a. Query all rows of a partitioned table
select * from tab_ip_part;
b. Query only the rows in partition part2
select * from tab_ip_part where part_flag='part2';
6. Altering tables
alter table tab_ip change id id_alter string;
alter table tab_cts add partition (partCol = 'dt') location '/external/hive/dt';
show partitions tab_ip_part;
7. Array type (array)
create table tab_array(a array<int>,b array<string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';
Sample data:
tobenbrone,laihama,woshishui 13866987898,13287654321
abc,iloveyou,itcast 13866987898,13287654321
select a[0] from tab_array;
select * from tab_array where array_contains(b,'word');
insert into table tab_array select array(0),array(name,ip) from tab_ext t;
8. Map type (map)
create table tab_map(name string,info map<string,string>)
row format delimited
fields terminated by '\t'
collection items terminated by ';'
map keys terminated by ':';
Sample data:
fengjie age:18;size:36A;addr:usa
furong age:28;size:39C;addr:beijing;weight:180KG
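Once that data is loaded, individual map values are read by key; a sketch of typical queries (map_keys, map_values, and size are standard Hive built-ins):

```sql
-- Look up a single value by key; rows missing the key return NULL.
select name, info['age'] from tab_map;
-- Get all keys or all values of the map column as arrays.
select map_keys(info), map_values(info) from tab_map;
-- Number of key-value pairs per row.
select name, size(info) from tab_map;
```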
9. insert ... select: used to append intermediate results to a staging table
create table tab_ip_like like tab_ip;
insert overwrite table tab_ip_like select * from tab_ip;
10. Writing query results to a local directory or to HDFS
insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select * from tab_ip_part where part_flag='part1';
insert overwrite directory '/hiveout.txt' select * from tab_ip_part where part_flag='part1';
11. CLUSTER (buckets: spreads rows evenly across files). This is a somewhat advanced feature you can pick up when you have time; it is used for things like sampling analysis. Note that load data only moves files and does not redistribute rows into buckets; only insert ... select with hive.enforce.bucketing=true produces properly bucketed files.
create table tab_ip_cluster(id int,name string,ip string,country string) clustered by(id) into 3 buckets;
load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_cluster;
set hive.enforce.bucketing=true;
insert into table tab_ip_cluster select * from tab_ip;
select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id);
12. User-defined functions (UDF)
a. Write a Java class that extends org.apache.hadoop.hive.ql.exec.UDF and implements an evaluate method
b. Add the jar (build it, upload it to the Hive machine, then run this in the Hive CLI):
hive> add jar /root/NUDF.jar;
c. Create a temporary function
hive> create temporary function getNation as 'cn.itcast.hive.udf.NationUDF';
d. Call it
hive> select id, name, getNation(nation) from beauty;
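Step a above can be sketched as a minimal Java class. The package and class names match the create temporary function statement; the nation mapping itself is hypothetical, and compiling it requires the hive-exec jar on the classpath:

```java
package cn.itcast.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Maps a nation code to a display name (mapping values here are made up).
public class NationUDF extends UDF {
    // Hive locates evaluate() by reflection and calls it once per row.
    public String evaluate(String nation) {
        if ("USA".equals(nation)) {
            return "America";
        }
        return nation; // pass unknown codes through unchanged
    }
}
```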
Note:
1. If MapReduce jobs on the cluster are launched as the hdfs user, switch to the hdfs user before starting Hive as well; otherwise you will run into permission errors.
